This paper describes a method of calculating the Schur complement of a sparse positive definite matrix A. The main idea of the approach is to represent the matrix A in the form of an elimination tree using a reordering algorithm such as METIS, placing the columns/rows for which the Schur complement is needed into the top node of the elimination tree. Any problem with a degenerate part of the initial matrix can be resolved with the help of iterative refinement. The proposed approach is close to the "multifrontal" one implemented by Iain Duff and others in the 1980s. The Schur complement computations described in this paper are available in Intel^{®} Math Kernel Library (Intel^{®} MKL). In this paper we present the algorithm for Schur complement computations, experiments that demonstrate a negligible increase in the number of elements in the factored matrix, and a comparison with existing alternatives.

According to F. Zhang [

Partial solving of systems of linear equations plays an important role in linear algebra for the implementation of efficient preconditioners based on domain decomposition algorithms. Partial solutions usually involve sparse matrices. For this reason, Schur complement computations and partial solving have been implemented in Intel^{®} Math Kernel Library (Intel^{®} MKL) [
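To illustrate why the Schur complement is useful for partial solves, the following is a minimal dense NumPy sketch (the paper's setting is sparse, and the block sizes and random test matrix here are purely illustrative): a blocked SPD system is solved by first eliminating the local block through the Schur complement, which is the core operation behind domain decomposition preconditioners.

```python
import numpy as np

# Illustrative dense sketch: solve the blocked SPD system
#   [[A_loc, B], [B^T, C]] [x; y] = [f; g]
# via the Schur complement S = C - B^T A_loc^{-1} B.
rng = np.random.default_rng(0)
n, m = 6, 2                                 # block sizes (illustrative)
M = rng.standard_normal((n + m, n + m))
A = M @ M.T + (n + m) * np.eye(n + m)       # SPD test matrix
A_loc, B, C = A[:n, :n], A[:n, n:], A[n:, n:]
f, g = rng.standard_normal(n), rng.standard_normal(m)

S = C - B.T @ np.linalg.solve(A_loc, B)     # Schur complement of A_loc
y = np.linalg.solve(S, g - B.T @ np.linalg.solve(A_loc, f))
x = np.linalg.solve(A_loc, f - B @ y)

# The block solution matches a direct solve of the full system.
xy = np.linalg.solve(A, np.concatenate([f, g]))
assert np.allclose(np.concatenate([x, y]), xy)
```

The point of the sketch is that once S is available, the coupled system reduces to two smaller solves with A_{loc} and one small solve with S.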

There are a number of papers focused on efficient implementation of the Schur complement. As an example, Aleksandrov and Samuel [

Intel^{®} MKL PARDISO [

The proposed implementation of the Schur complement continues the work of the authors in the area of multifrontal direct sparse solvers. In Kalinkin [

Let A be a symmetric positive definite sparse matrix (symmetry and positive definiteness are assumed in order to simplify the algorithm description by avoiding the case of degenerate matrix minors). Partition A as

A = ( A_{loc}  B
      B^{T}    C ),

where A_{loc} is the block corresponding to the rows/columns to be eliminated, and B, B^{T}, and C correspond to the rows/columns for which the Schur complement is needed. The Schur complement of the block A_{loc} in A is

S = C - B^{T} A_{loc}^{-1} B.

The matrix S can be computed by the following algorithm (Algorithm 1):

1) Calculate the decomposition A_{loc} = LL^{T}.

2) Calculate W = L^{-1}B, treating B as a dense matrix.

3) Calculate W^{T}W = B^{T}A_{loc}^{-1}B.

4) Calculate S = C - W^{T}W.
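The four steps above can be sketched in dense linear algebra as follows (a NumPy/SciPy stand-in with an illustrative random SPD test matrix, not the paper's sparse implementation; treating B as dense is exactly the drawback discussed next):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Dense sketch of Algorithm 1.
rng = np.random.default_rng(1)
n, m = 8, 3                                 # block sizes (illustrative)
M = rng.standard_normal((n + m, n + m))
A = M @ M.T + (n + m) * np.eye(n + m)       # SPD test matrix
A_loc, B, C = A[:n, :n], A[:n, n:], A[n:, n:]

L = cholesky(A_loc, lower=True)             # 1) A_loc = L L^T
W = solve_triangular(L, B, lower=True)      # 2) W = L^{-1} B (B treated as dense)
G = W.T @ W                                 # 3) W^T W = B^T A_loc^{-1} B
S = C - G                                   # 4) Schur complement

assert np.allclose(S, C - B.T @ np.linalg.solve(A_loc, B))
```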

This algorithm has several significant disadvantages that create barriers to its use for large sparse systems. The main disadvantage is in step 2 of Algorithm 1, which involves the conversion of the sparse matrix B^{T} into a dense matrix and therefore requires allocating a large amount of memory for temporary data. Also, if B^{T} is treated as a dense matrix, a large number of zero elements are processed in the multiplication involving L^{-1}B, which makes this step one of the most computationally intensive parts of the algorithm and significantly increases the overall computational time. To avoid this, we propose the following algorithm, based on the multifrontal approach, which first calculates the Schur complement matrix and then the factorization of the matrix A, without significant memory requirements for the computations to proceed.

As in the papers [ ], we consider the factorization of the matrix A_{loc}, with the nonzero pattern of its factor shown in the left part of the figure.

Let us pad the original matrix A_{loc}, stored in the sparse format, with zeros so that its nonzero pattern completely matches that of the matrix L. The elements of L in row 3 can be computed only after the elements in rows 1 and 2 are computed; similarly, the element in row 6 can be computed only after the elements in rows 4 and 5 are computed. The elements in the 7th row can be computed last. This allows us to construct the dependency tree [
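The dependency tree can be read off the nonzero pattern of L: the parent of column j is the row index of its first subdiagonal nonzero (the standard elimination-tree rule, assuming the pattern already includes fill-in). The sketch below encodes the 7-node example from the text (columns 1 and 2 feed node 3, columns 4 and 5 feed node 6, node 7 is the root), written with 0-based indices:

```python
import numpy as np

# Nonzero pattern of L for the 7-node example (0-based):
# children 0,1 -> 2; children 3,4 -> 5; children 2,5 -> 6 (root).
Lpat = np.eye(7, dtype=bool)
for i, j in [(2, 0), (2, 1), (5, 3), (5, 4), (6, 2), (6, 5)]:
    Lpat[i, j] = True

def elimination_tree(pattern):
    """parent[j] = row of first subdiagonal nonzero in column j; -1 = root."""
    n = pattern.shape[0]
    parent = [-1] * n
    for j in range(n):
        below = np.flatnonzero(pattern[j + 1:, j])
        if below.size:
            parent[j] = j + 1 + int(below[0])
    return parent

assert elimination_tree(Lpat) == [2, 2, 6, 5, 5, 6, -1]
```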

Such a representation allows us to modify Algorithm 1 using the following notation: node Z_{j} is a child of node Z_{i} if Z_{j} resides lower than Z_{i} in the dependency tree and there is an edge from Z_{j} to Z_{i}.

where by mask_{i}Z_{j} we denote a submatrix built as the intersection of the columns corresponding to node Z_{i} with the rows corresponding to node Z_{j}, in terms of the representation in the right part of the figure.
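A dense sketch of the node-by-node factorization with masked updates is shown below. Nodes are index sets processed children-first, and each already-eliminated node Z_{j} contributes an update to node Z_{i} through the factor block with rows in Z_{i} and columns in Z_{j}; the node partition and random test matrix are illustrative, and in the sparse case updates flow only along tree edges:

```python
import numpy as np

# Blocked (left-looking) Cholesky over a node partition; the masked
# update A[Z_i,Z_i] -= L[Z_i,Z_j] @ L[Z_i,Z_j]^T mirrors mask_{i}Z_{j}.
rng = np.random.default_rng(2)
M = rng.standard_normal((7, 7))
A = M @ M.T + 7 * np.eye(7)                 # SPD test matrix

nodes = [[0, 1], [2, 3], [4, 5, 6]]         # Z_1, Z_2, Z_3 (root), illustrative
L = np.zeros_like(A)
for k, Zi in enumerate(nodes):
    Aii = A[np.ix_(Zi, Zi)].copy()
    for Zj in nodes[:k]:                    # updates from eliminated nodes
        Lij = L[np.ix_(Zi, Zj)]
        Aii -= Lij @ Lij.T
    L[np.ix_(Zi, Zi)] = np.linalg.cholesky(Aii)
    for Zr in nodes[k + 1:]:                # factor blocks below node Z_i
        Ari = A[np.ix_(Zr, Zi)].copy()
        for Zj in nodes[:k]:
            Ari -= L[np.ix_(Zr, Zj)] @ L[np.ix_(Zi, Zj)].T
        L[np.ix_(Zr, Zi)] = Ari @ np.linalg.inv(L[np.ix_(Zi, Zi)]).T

assert np.allclose(L @ L.T, A)              # blocked factorization is exact
```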

To calculate the Schur complement, let us extend this representation with the columns and rows of the matrices B, B^{T}, and C to obtain a full representation of the matrix A, as in the left part of the figure, with the corresponding dependency tree on the right. Note that the blocks corresponding to the columns and rows of the matrices B^{T}, B, and C are sparse. After factorization of the full matrix A the number of nonzero elements increases significantly, but our experiments show that these blocks remain sparse and do not become dense.

Let us introduce the following notation: Z_{i} denotes a node of the tree expanded by the corresponding rows of the matrix B^{T}, and Z_{C} denotes the node of the tree corresponding to the matrix C. Then we can modify Algorithm 2 to take into account the elements of the matrices B, B^{T}, and C.
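The key property exploited here can be checked in a few lines (a dense sketch with an illustrative random SPD matrix, not the paper's sparse code): if the rows/columns of C are ordered last and elimination is stopped before the node Z_{C} is factored, the not-yet-factored trailing block is exactly the Schur complement.

```python
import numpy as np

# Partial symmetric elimination of the bordered matrix
#   A = [[A_loc, B], [B^T, C]]:
# after eliminating the n pivots of A_loc, the trailing block equals
# S = C - B^T A_loc^{-1} B.
rng = np.random.default_rng(3)
n, m = 6, 2                                 # block sizes (illustrative)
M = rng.standard_normal((n + m, n + m))
A = M @ M.T + (n + m) * np.eye(n + m)       # SPD test matrix
A_loc, B, C = A[:n, :n], A[:n, n:], A[n:, n:]

F = A.copy()
for k in range(n):                          # eliminate rows/cols of A_loc only
    F[k + 1:, k + 1:] -= np.outer(F[k + 1:, k], F[k + 1:, k]) / F[k, k]
S = F[n:, n:]                               # trailing block = Schur complement

assert np.allclose(S, C - B.T @ np.linalg.solve(A_loc, B))
```

Stopping the factorization at Z_{C} is what lets the method produce S and the factors of A_{loc} in a single sweep over the tree.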

The factorization introduces fill-in in the matrix A_{loc}, though it stays sparse and overall the number of nonzero elements increases only slightly. For this test, the number of nonzero elements is only five percent higher when we calculate the Schur complement (Algorithm 3) than in the case without Schur complement calculations (straight factorization).

For these matrices the sparsity of Flan_1565 is about 70 nonzero elements per row on average, while that of Geo_1438 is more than 400 nonzero elements per row. In both cases the time for Schur complement computations is almost the same for Intel MKL and MUMPS when the number of threads is small, but the time needed by the Intel MKL PARDISO solver decreases significantly as the number of threads increases. Moreover, a comparison of

We have demonstrated an approach to calculating the Schur complement of a sparse matrix, implemented in Intel Math Kernel Library through the Intel MKL PARDISO interface. This implementation makes the Schur complement available for sparse matrices arising in various mathematical applications, from statistical analysis to algebraic solvers. The proposed approach shows good scalability of computational time and better performance than similar approaches proposed elsewhere.