A Class of Generalized Approximate Inverse Solvers for Unsymmetric Linear Systems of Irregular Structure Based on Adaptive Algorithmic Modelling for Solving Complex Computational Problems in Three Space Dimensions

A class of general inverse matrix techniques based on adaptive algorithmic modelling methodologies is derived yielding iterative methods for solving unsymmetric linear systems of irregular structure arising in complex computational problems in three space dimensions. The proposed class of approximate inverse is chosen as the basis to yield systems on which classic and preconditioned iterative methods are explicitly applied. Optimized versions of the proposed approximate inverse are presented using special storage (k-sweep) techniques leading to economical forms of the approximate inverses. Application of the adaptive algorithmic methodologies on a characteristic nonlinear boundary value problem is discussed and numerical results are given.


Introduction
In recent years, extensive research work has been focused on the computation of exact and approximate inverse matrices for solving efficiently complex computational problems particularly on parallel computer systems [1]- [20].In this article, a new class of Sparse Approximate Inverses matrices based on adaptive algorithmic modelling methods is presented.These adaptive algorithmic solution methods can be used for solving large sparse linear finite difference (FD) and finite element (FE) systems of irregular structures derived mainly from the discretization of parabolic and elliptic PDE's in both two and three space dimensions [21]- [31].
Let us consider a class of boundary value problems defined by the equation , subject to the general boundary conditions where D is a closed bounded domain in R N , with N ≤ 3 and ∂D the boundary of D, p ε a predetermined singular perturbation parameter, where the coefficient matrix A is a large sparse real (n × n) matrix of irregular structure.The structure of A is shown in the following Figure 1.For solving the system (1.2), there is a choice between direct and iterative, assuming that there are no barriers due to memory requirements for the former or excessive runtimes (e.g.time dependent problems) for the latter.Note that for generality purposes the coefficient matrix is assumed to be unsymmetric (case occurring in the discretization of flow equations that arise in certain Hydrology studies [32] and of irregular non-zero structure (case resulting from the triangulation of irregular or regular domains into irregular elements in pipeline networks [33].Algorithmic solution methods for the linear systems (1.2) applicable to both two and three space dimensions can be applied [22], where the unsymmetric coefficient matrix, in which all the off-centre terms are grouped into regular bands, can be factorized exactly to yield direct algorithmic procedures for the FD or FE solution.
It should be noted that in the case of very large sparse linear and nonlinear systems with coefficients of irregular structure, the memory requirements and the corresponding computational work are prohibitively high and the use of exact inverse solvers is usually not recommended.In such cases, preconditioned iterative techniques for solving numerically the FD or FE linear systems (1.2) can be used by deriving semi-direct solution methods following the principle [27] [34] that implicit procedures based on approximately decomposing discrete operators into easily invertible factors facilitating the solution of (1.2).Sparse factorization procedures yield efficient procedures for the FE or FD solution by manipulating the problem of the fill-in terms, which occur during the factorization [22] [35].Note that simple compact storage schemes for the considered data can be used and preconditioned algorithmic solvers do not require any searching operations.An important feature of the proposed adaptive algorithmic methods is the provision of a class of iterative methods for solving large sparse unsymmetric systems of irregular non-zero structure, with additional computational facilities, i.e. the choice of fill-in parameters, rejection parameters, entropy-adaptivity-uncertainty (EAU) parameters [36], by which the best method for a given problem can be selected.The proposed methods have a universal scope of application for numerically solving of elliptic and parabolic boundary value problems by either FD or FE discretization methods in both two and three space dimensions with the only restriction being that the coefficient matrix should be diagonally dominant.

Approximate LU Decomposition and Approximate Inverse Methods
The approximate factorization techniques and approximate inverse methodologies have been widely used for solving a large class of linear and nonlinear systems resulting in complex computational problems [12] [29] [31] [37]- [49].The LU factorization of a given matrix is characterized as a high level algebraic description of Gaussian elimination and by expressing the outcomes of matrix algorithms in the language of matrix factorizations facilitates generalization and certain connections between algorithms that may appear different at scalar level [47].The solution of linear system can be computed by a two-step triangular solving process, i.e.

Ly s Ux y Ax LUx Ly s
For solving the symmetric problem Ax s = , a variant of the LU factorization in which A is decomposed into three-matrix product, i.e.In the general case, such as the case of three space dimensions and the finite element discretization, the coefficient matrix has an irregular structure of the nonzero elements, where the non-diagonal elements can be grouped in regular bands of width 1 l and 2 l (width parameters) in distances m and p (semi-bandwidths) re- spectively.
The linear system (1.2) can be solved by direct (explicitly) or iterative (implicitly) methods depending on the availability of memory requirements.Several factorization/decomposition techniques can be used for facilitating the numerical solution of linear system (1.2), i.e. two, three, four term factorization schemes of the coefficient matrix A. Following an explicit solution of system (1.2) this system can equivalently written as where M is the inverse matrix of A, i.e.
Since the computation of the exact inverse is a difficult computational problem particularly in the case of complex 3D problems, the approximate inverse matrix approach can be alternatively used.
Let us consider an approximate factorization of the coefficient matrix A, where S L and S U , are lower and upper sparse triangular matrices of irregular structures of semi-bandwidths m and p retaining 1 r and 2 r fill-in terms respectively.The decomposition factors S L and S U are banded matrices with l 1 and l 2 the numbers of diagonals retained in semi-bandwidths m and p respectively (Figure 2 and Figure 3), of the following form.
The computation of the elements of the sparse decomposition factors has been presented in [23] [24] [27].An analogue scheme can be obtained for matrix W, while the relationships of the elements of H matrix and the corresponding conventional (for 1 5 n m − + = and 1 1 6 l r l + − = ) is shown in the following Figure 5 (the same holds for the matrix F).
The (near) optimum values of fill-in parameters are mainly depended on the nature of the problem and structure of the coefficient matrix A [22] [47]- [49].

Introduction
An exact inverse algorithm based on adaptive algorithmic methodologies for solving linear unsymmetric systems of irregular structure arising in FD/FE discretization of boundary-value problems in three space dimensions has been recently presented [36].This algorithm computes the elements of an exact inverse of a given unsymmetric matrix of irregular structure using an exact LU factorization [22].The computational work of the EBAIM-1 algorithm is ( )( ) O n l l m p l l  multiplications, while the memory requirements are (n × n) words.In the case of very large systems the memory requirements could be prohibitively high and the usage of approximate inverse iterative techniques is desirable.
It should be also noted that in the case that only 1 1 r m < − and 2 1 r p < − fill-in terms are retained in semi-bandwidths m and p respectively, then a class of approximate inverse matrix algorithms for solving large sparse unsymmetric linear systems of irregular structure [50] arising in the FD/FE discretization of elliptic and  A class of optimized approximate inverse variants can be obtained by considering a (near) optimized choice of the approximate inverse M depends on the selection of related parameters, i.e. fill-in parameters r 1 , r 2 , retention parameters δl 1 , δl 2 and entropy-adaptivity-uncertainty (EAU) parameters [36] [51].Note that the selection of retention parameter values as multiples of the corresponding semi-bandwidths of the original matrix leads to improved numerical results [22].Then, the following sub-classes of approximate inverses, depending on the accuracy, storage and computational work requirements, can be derived as indicated in the following Figure 6.where and column during the computational procedure of the approximate inverse and under certain hypotheses can be considered as a good approximation of the original inverse, while the entries of the approximate inverse in sub-class III have been retained after computing M * ( ) 1 and 1 r m r p < − < − and are less accurate than the corresponding entries of

Approximate Inverse Algorithmic Methodologies
Algorithmic solution methods for the linear systems (1.2) applicable to both two and three space dimension can be applied [23], where the unsymmetric coefficient matrix, in which all the off-centre terms are grouped into regular bands, can be factorized exactly to yield direct algorithmic procedures for the FD or FE solution.Alternatively, preconditioned iterative techniques for solving numerically the FD or FE linear systems (1.2) can be used by deriving semi-direct solution methods following the principle [28] that implicit procedures based on approximately decomposing discrete operators into easily invertible factors facilitating the solution of (1.2).Sparse factorization procedures yield efficient procedures for the FE or FD solution by manipulating the problem of the fill-in terms, which occur during the factorization.Note that simple compact storage schemes for the considered data can be used and preconditioned algorithmic solvers do not require any searching operations.An important feature of the proposed adaptive algorithmic methods is the provision of both direct and iterative methods for solving large sparse unsymmetric systems of irregular non-zero structure, with additional computational facilities, i.e. the choice of fill-in parameters, rejection parameters, entropy-adaptivity-uncertainty (EAU) parameters [36] [51], by which the best method for a given problem can be selected.The proposed methods have a universal scope of application for numerically solving of elliptic and parabolic boundary value problems by either FD or FE discretization methods in both two and three space dimensions with the only restriction being that the coefficient matrix be diagonally dominant.
Let us assume that , where M is the exact inverse of A. The elements of M can be determined by solving recursively the systems where 1 2 , r r are the fill-in parameters, i.e. the number of outermost off-diagonal entries retained in semi-band- widths m and p respectively.
Then, by post-multiplying Equation (3.2) by ( ) − , where r is the fill-in parameter, i.e. the number of outer-most off diagonal entries retained in semi-bandwidth of the tridiagonal factor L r. , by considering the equations in the analytical form for i-row with 1 j r = + and the j-column with where , i j δ is the Kronecker delta [21] [27] [52].The elements of the approximate inverse for i n = can be determined successively as µ .In the following Figure 7 the form of an (8 × 8)  approximate inverse matrix is indicatively demonstrated.where

Optimized Approximate Inverse Matrices and Storage Techniques
Let us consider the exact inverse M of the original coefficient matrix A in equation (2.1).Note that the computation of the inverse is indicated in the following characteristic diagram (Figure 8).
It should be noted that the diagonal elements (in bold) are firstly computed (starting from the last element of the inverse, i.e. , n n µ ) and then computing upwards/column-wise and from right to left/row-wise.Note also that  µ , all the elements of n-row (from right to left) and n-column (upwards).Then, only the diagonal elements 1, 1 µ − − − − is computed and all the elements of (n-k-2)-row (from right to left) and the elements of (n-k-2)-column (upwards).Continuing in this way the rest elements of this approximate inverse are computed.It should be noted that the computational work of the resulting inverse M * of this sub-class by using the KS-storage technique is considerably smaller than that required by the approximate inverse resulting from the application of the DS storage technique.In the case of k = 2 the KS-storage technique reduces to the example shown in Figure 4.

An optimized explicit banded approximate inverse by minimizing the memory requirements of EBAIM-1 algorithm
In order to minimize the memory requirements of EBAIM-1 algorithm, which in particular in the case of very large matrices of irregular structure can be prohibitively high, we consider the inverse M of equation (5.4) revolving its elements by 180˚ about the anti-diagonal removing the diagonal and the (δl-1) super diagonals in the first δl columns, while the rest δl sub-diagonals in the rest δl columns, then results the following form of the inverse (Figure 9).

The Optimized Approximate Inverse Algorithm
The application of this storage scheme on the approximate inverse leads to the following optimized approximate inverse algorithm.Note that the computation of the approximate inverse algorithm pre-assumes the approximate factorization of the coefficient matrix A, i.e.  ( ) , , , 1 , , call mw n l i j mr k x y δ + + step 23: , , , 2 , , call mw n l i j pr k x y δ + + step 24: ( ) , , , 1 , , call mw n l i j mr k x y δ + + step 26: ( ) , , , 1 , , call mw n l i j mr k x y δ + + step 29: , , , 2 , , call mw n l i j pr k x y δ + + step 30: , , , 1 , , call mw n l i j mr k x y δ + + step 32: ( ) , , , 2 , , call mw n l i j pr k x y 1 , , , 1, , call mw n l l p k x y δ + − step 37: , , , 1, , call mw n l l m k x y δ + − step 38: ( ) ( ) , , , 1, , call mw n l i p k x y δ + − step 46: ( ) ( ) , , , 1, , call mw n l i p k x y δ + − step 56: , , , 1, , call mw n l i p k x y δ + − step 61: for ( ) ( ) The subroutine mw (n, δl, s, q, x, y) performs the transformation in the indexes of the explicit approximate inverse matrix from its banded form to the optimized form.This routine has the following form: Subroutine mw (n, δl, s, q, x, y) The computational work of the optimized OBAIM-1 algorithm is ( )  multiplica- tions, while the memory requirements have been reduced down to ( ) It should be also noted that a class of approximate inverse matrix can be considered containing several sub-classes of approximate inverses according to memory requirements, computational work, accuracy, as indicated in the diagrammatic schemes (Figure 3 and Figure 7).

Explicit Adaptive Iterative Methods
A class of Adaptive Iterative Schemes for solving large sparse linear systems includes the following adaptive preconditioned iterative methods: * 1 , 0 where i i r s Au = − , α and β are predetermined acceleration parameters, i α and i β are sequences of precondi- tioned acceleration parameters and 1 1 -

The Explicit Preconditioned Iterative Method
During the last decades extensive research work has been focused in the preconditioned approach and preconditioned iterative methods for solving large linear and nonlinear problems in sequential and parallel environments [5] [58].A predominant role in the usage of the preconditioned iterative schemes possess the explicit preconditioned Conjugate Gradient (EPCG) method and its variants using the sparse approximate inverse M * due to its superior convergence rate for solving very large complex computational problems [47].A characteristic explicit solver of this sub-class is the Explicit Preconditioned Generalized Conjugate Gradient (EPGCG) method [58].This basic EPCG method can be expressed in the following compact form: Algorithm EPCG-1 (A, n, s, u 0 , u, r) Purpose: This algorithm computes the solution vector of the linear system Au s = by using the explicit preconditioned generalized Conjugate Gradient method.
Input: A given matrix, n order of A, s known rhs vector, u 0 initial guess Output: solution vector u, residual r Computational Procedure: Step 1: let 0 u be an arbitrary initial approximation to the solution u  u and corresponding residual 0 r .Note that a good approximant M * leads obviously to an improved EPCG method.The effectiveness of the explicit preconditioned iterative methods for solving certain classes of elliptic boundary value problems on regular domains is related to the fact that the exact inverse of A (although is full) exhibits a similar fuzzy structure around the principal diagonal and m-diagonals [22].

The Symmetric Case
In the case of symmetric coefficient matrix by using the four-matrix decomposition [27] [41] the corresponding inverse subclasses can be enlarged as follows in Figure 10.where 1) the elements of exact inverse of subclass I are obtained after the exact decomposition ( ) 1, 1 r m r p = − = − of M + , with excessive memory and computational requirements, 2) the elements of the inverse 1, 1 r m r p = − = − , while only 1 l δ and 2 l δ diagonals have been retained, 3) the elements of inverse M S2 of subclass III have been computed from the approximate inverse, while the exact decomposition ( ) 1, 1 r m r p = − = − has been applied, 4) the elements of the inverse of subclass IV have been computed from the approximate factorization and the banded approximate inverse algorithm has been used for computing the elements of the inverse ( ) ) the elements of inverse of subclass V have been retained only on the diagonal elements of the inverse, i.e. δ is recommended to be multiples of values of m and p, leading to preconditioners with better performance [22].
An indication of the sparsity and memory requirements of optimized versions of approximate inverses is given in the following Table 1.
It should be noted that in the case that δl 2 = 0 the approximate inverse algorithm is reduced to an algorithm for solving FE systems in two space dimensions of semi-bandwidth m, while if 1   where ∂R is the exterior boundary of the domain R. Equation (6.1) arises in magnetohydrodynamics (diffusion-reaction, vortex problems, electric space charge considerations) with its existence and uniqueness assured by the classical theory [32] [33].The solution of Equation (6.1) can be obtained by the linearized Picard and quasi-linearized Newton iterative schemes as outer iterative schemes of the form: , , , where δ denotes here the usual central difference operator.The resulting large sparse nonlinear system is of the form where Ω is a block tridiagonal matrix [23].
Then, composite iterative schemes can be used, where Picard/Newton iterations are the outer iteration, while the inner iteration can be carried out either directly by an exact algorithm or by an approximate algorithm in conjunction with an explicit iterative method (6.3).The latter method can be written as where the superscript l denotes the outer iteration index, the subscript I denotes the inner iteration and - The outer iteration was terminated when the following criterion was satisfied while the termination criterion of the inner iteration was ( ) where ε 1 was taken initially as x y = = = and the initial guesses when u was on the boundary 0.0, 5.0, 10.0 were chosen as 0.0, 4.0, 6.0 respectively.The performance of the composite schemes Newton-Explicit Preconditioned Simultaneous Displacement (EPSD) and Picard/Newton-EPCG are given in the following Table 2 and Table 3.

Conclusions
A class of exact and approximate inverse adaptive algorithmic procedures has been presented for solving numerically initial/boundary value problems.Several subclasses of optimized variants of these algorithms have been also proposed for solving economically highly nonlinear systems of irregular structure.It should be stated that the proposed explicit preconditioned iterative methods and their related variants can be efficiently used for solving large sparse nonlinear systems of irregular structure of complex computational problems and for the numerical solution of highly nonlinear initial/ boundary value problems in two and three space dimensions.
Table 2.The performance of composite iterative schemes Picard/Newton for the nonlinear system (6.4) using the EPSD method (r = 4), with n = 361, m = 20, for several values of acceleration parameter α and retention parameters δl.Table 3.The performance of composite iterative schemes Picard/Newton for the nonlinear system (6.4) using the EPCG method, with n = 361, m = 20, for several values of retention parameters δl and fill-in parameters r.

Overall iterations
sufficiently smooth functions on D and ζ ∂ is the direction of the outward normal derivative.The discrete analogue of Equation (1.1) leads to the solution of the general linear system Au s = , (1.2)

Figure 1 .
Figure 1.The structure of coefficient matrix A.
D is diagonal and L, M are lower unit lower triangular.In this case the solution can be obtained in then L = M and the computational work for the factorization is half of that required by Gaussian elimination.The factorization takes the form T A LDL = and can be used for solving symmetric problems, as well as the four-matrix product, i.e.T A DTT D = , where T T is the transpose of T (Varga, 1962).In the case of symmetric positive definite systems the factorization T A LDL = exists and is computationally stable.The factorization T A LL = is known as Cholesky factorization and by solving the triangular system Ly s = and T the Gaxpy Cholesky version requires 3 3 n flops, where Gaxpy is a BLAS level-2 routine defined algebraically as z Ax y = + requiring ( ) O mn operations.

Figure 2 .
Figure 2. The structure of lower decomposition factor s L .

Figure 3 .
Figure 3.The structure of upper decomposition factor s U .The relationships of the elements of V matrix and the corresponding conventional (for 1 5 n m − + = and

Figure 4 .
Figure 4.The relationship of matrix V in its banded stored form and the corres-ponding one in Figure 1.

Figure 5 .
Figure 5.The relationship of matrix H in its stored form and the corresponding one in Figure 3. parabolic boundary values, can be obtained.Such algorithms are described in the next sections.A class of optimized approximate inverse variants can be obtained by considering a (near) optimized choice of the approximate inverse M depends on the selection of related parameters, i.e. fill-in parameters r 1 , r 2 , retention parameters δl 1 , δl 2 and entropy-adaptivity-uncertainty (EAU) parameters[36] [51].Note that the selection of retention parameter values as multiples of the corresponding semi-bandwidths of the original matrix leads to improved numerical results[22].Then, the following sub-classes of approximate inverses, depending on the accuracy, storage and computational work requirements, can be derived as indicated in the following Figure6.where sub-class I is a banded form of the exact inverse retaining 1 2 , l l δ δ elements along each row and column respectively, while its elements are equal to the corresponding elements of the exact inverse.sub-class II is a banded form of M, retaining only 1 2 , l l δ δ elements along each row Finally, in sub-class IV the elements of the approximate inverse can be computed.

M
, a non-singular ( ) n n × matrix, is an approximate inverse of A, i.e. non-zero elements have been retained in the corresponding decomposition factors, in the 2D symmetric case, i.e.
is a (8 × 8) banded approximate inverse matrix with retention parameters 1

Figure 8 .
Figure 8. Computing the elements of the approximate inverse M * of sub-class IV following the KS technique (K = 2) * .( * ) Note that this computational technique will be referred as double-sweep (DS) technique.
≈, where s L and s U are the lower and upper

Figure 9 .δ and 2 lδ
Figure 9. Transformed forms of optimized approximate inverse M * (in banded storage).sparse triangular decomposition factors [21] [23]-[25] [49].Algorithm OBAIM-1 (a, b, c, n, F, H, g, Γ, Ζ, ω, β, r 1 , r 2 , m, p, l 1 , l 2 , δl, M * ) Purpose: This algorithm computes the elements of the approximate inverse of a given real (n × n) matrix of irregular structure Input: diagonal elements a of matrix A; superdiagonal elements b, subdiagonal elements c, n order of A; submatrices F, H, of upper triadiagonal decomposition factor U, superdiagonal elements g of L; submatrices Γ, Z, of lower tridiagonal matrix L; diagonal elements ω of L; subdiagonal elements β of L; fill-in parameters r 1 , r 2 ; semi-bandwidths m, p; l 1 and l 2 numbers of diagonals retained in semi-bandwidths m and p respectively, 1 l δ and have been computed after the application of the exact inverse algorithm to a fast algorithm for computing of approximate inverse.Note that the largest elements of inverse matrix are mainly gathered around the main diagonal in distances * m p m and * p p p in a recurring wave like pattern(Lipitakis, 1984), where m and p are respectively the semi-bandwidths of the coefficient matrix A, and reduces to one for solving FD linear systems in three space dimensions of semi-bandwidths m and p[23].In the case of δl 1 = 1 and δl 2 = 0, then the approximate inverse reduces to one for solving linear FD systems in two space dimensions of semi-bandwidth m[25], while if the approximate inverse reduces to the one for solving tridiagonal linear systems (Thomas algorithm)[59].

Figure 10 .
Figure 10.Subclasses of inverses for the symmetric case.

=
and then was decreased at each iterative step by 1/10 to 10 −6 , where it remained constant during the next iterative steps.Numerical experiments were carried out for nonlinear problem (6.4) with max max 1 20 , 1 h diagonals in the lower and upper triangular parts of inverse respectively, the remaining elements being just not computed at all.Optimized forms of this algorithm are particularly effective for solving banded sparse FE systems of very large order, i.e. [ ] l δ ,

Table 1 .
Memory and sparsity requirements of the approximate inverse matrix (n = 8000, m = 21, p = 401), where δl denotes here the retention parameters 1