In this paper we present a new subspace iteration for calculating eigenvalues of symmetric matrices. The method is designed to compute a cluster of k exterior eigenvalues: for example, the k eigenvalues with the largest absolute values, the k algebraically largest eigenvalues, or the k algebraically smallest eigenvalues. The new iteration applies a Restarted Krylov method to collect information on the desired cluster. It is shown that the estimated eigenvalues proceed monotonically toward their limits. Another innovation regards the choice of starting points for the Krylov subspaces, which leads to a fast rate of convergence. Numerical experiments illustrate the viability of the proposed ideas.

In this paper we present a new subspace iteration for calculating a cluster of k exterior eigenvalues of a given symmetric matrix, G = G^{T} ∈ R^{n×n}.

The new iteration applies a Krylov subspace method to collect information on the desired cluster. Yet it has an additional flavor: it uses an interlacing theorem to improve the current estimates of the eigenvalues. This enables the method to gain speed and accuracy.

If G happens to be a singular matrix, then it has zero eigenvalues, and any orthonormal basis of Null(G) gives the corresponding eigenvectors. However, in many practical problems we are interested only in non-zero eigenvalues. For this reason the definitions below of the term “a cluster of k exterior eigenvalues” do not include zero eigenvalues. Let r denote the rank of G and assume that k < r. Then G has r non-zero eigenvalues that can be ordered to satisfy

or

The new algorithm is built to compute one of the following four types of target cluster, each containing k extreme eigenvalues.

A dominant cluster

A right-side cluster

A left-side cluster

A two-side cluster is a union of a right-side cluster and a left-side cluster. For example,

Note that although the above definitions refer to clusters of eigenvalues, the algorithm is carried out by computing the corresponding k eigenvectors of G. The subspace that is spanned by these eigenvectors is called the target space. The restriction of the target cluster to include only non-zero eigenvalues means that the target space is contained in Range(G). For this reason the search for the target space is restricted to Range(G).
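The four cluster types above can be expressed as index selections over a list of (non-zero) eigenvalues. The following numpy sketch is illustrative only; in particular, the even split used for the two-side cluster is an assumption, since the text allows any split between the two sides.

```python
import numpy as np

def select_cluster(eigvals, k, kind):
    """Indices of a k-eigenvalue target cluster of the four kinds defined
    above.  Zero eigenvalues are assumed to have been removed already."""
    order = np.argsort(eigvals)                    # ascending order
    if kind == "right":
        return order[-k:]                          # k algebraically largest
    if kind == "left":
        return order[:k]                           # k algebraically smallest
    if kind == "dominant":
        return np.argsort(np.abs(eigvals))[-k:]    # k largest in modulus
    if kind == "two-side":
        kr = k // 2                                # even split (an assumption)
        return np.concatenate([order[: k - kr], order[-kr:]])
    raise ValueError(kind)
```

For example, with eigenvalues (−5, −1, 2, 7, 3), a dominant cluster of size two picks −5 and 7, while a right-side cluster picks 7 and 3.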

Let us turn now to describe the basic iteration of the new method. The qth iteration, q = 1, 2, 3, …, starts with a matrix X_{q} whose columns form an orthonormal basis of the current search subspace, and is composed of the following five steps. Below, ℓ denotes the number of new directions that are collected per iteration. (Typical values of ℓ are reported in the experiments of Section 8.)

Step 1: Eigenvalues extraction. First compute the Rayleigh quotient matrix S_{q} = X_{q}^{T} G X_{q}. Then compute k eigenpairs of S_{q} which correspond to the target cluster. (For example, if it is desired to compute a right-side cluster of G, then compute a right-side cluster of S_{q}.) The corresponding k eigenvectors of S_{q} are assembled into a matrix Y_{q}, which is used to compute the related matrix of Ritz vectors, V_{q} = X_{q} Y_{q}.

Step 2: Collecting new information. Compute an information matrix B_{q} whose columns are forced to stay in Range(G).

Step 3: Discard redundant information. Orthogonalize the columns of B_{q} against the columns of V_{q}. The resulting matrix, B̂_{q}, satisfies the Gram-Schmidt formula B̂_{q} = (I − V_{q} V_{q}^{T}) B_{q}.

Step 4: Build an orthonormal basis. Compute a matrix, Z_{q}, whose columns form an orthonormal basis of Range(B̂_{q}), the range of the orthogonalized information matrix.

Step 5: Define X_{q+1} = [V_{q}, Z_{q}], which ensures that the columns of X_{q+1} are orthonormal and that the new basis retains the current Ritz vectors.

The above description is aimed at clarifying the purpose of each step, yet there might be better ways to carry out the basic iteration. The restriction of the search to Range(G) is important when handling low-rank matrices. However, if G is known to be a non-singular matrix, then there is no need to impose this restriction.
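The five steps above can be sketched in a few lines of numpy. This is a toy rendition under explicit assumptions, not the paper's implementation: a right-side target cluster is assumed, the information matrix is built by plain matrix powers, and the Krylov starting vector (G times the sum of the Ritz vectors) is an illustrative choice rather than the paper's actual rule.

```python
import numpy as np

def basic_iteration(G, X, k, ell):
    """One basic iteration, following Steps 1-5 above (toy sketch)."""
    n = X.shape[0]
    # Step 1: eigenvalue extraction via the Rayleigh quotient matrix.
    S = X.T @ G @ X
    vals, Y = np.linalg.eigh(S)
    idx = np.argsort(vals)[::-1][:k]      # k algebraically largest Ritz pairs
    V = X @ Y[:, idx]                     # matrix of Ritz vectors
    # Step 2: collect ell new Krylov directions inside Range(G).
    B = np.empty((n, ell))
    b = G @ V.sum(axis=1)                 # assumed starting vector
    for j in range(ell):
        B[:, j] = b
        b = G @ b
    # Step 3: discard redundant information (Gram-Schmidt against V).
    B = B - V @ (V.T @ B)
    # Steps 4-5: an orthonormal basis of [V, B] becomes the next X.
    X_next, _ = np.linalg.qr(np.hstack([V, B]))
    return X_next, vals[idx]
```

Repeating the iteration drives the returned Ritz values monotonically toward the k largest eigenvalues.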

The plan of the paper is as follows. The interlacing theorems that support the new method are given in the next section. The rate of convergence depends on the quality of the information collected in B_{q}. Roughly speaking, the better information we get, the faster the convergence is. Indeed, the heart of the algorithm is the computation of B_{q}. It is well known that a Krylov subspace which is generated by G gives valuable information on the peripheral eigenvalues of G; for the resulting choice of B_{q}, see Section 3. Difficulties that arise in the computation of non-peripheral clusters are discussed in Section 4. The fifth section considers the use of acceleration techniques; most of them are borrowed from orthogonal iterations. Another related iteration is the Restarted Lanczos method. The links with these methods are discussed in Sections 6 and 7. The paper ends with numerical experiments that illustrate the behavior of the proposed method.

In this section we establish a useful property of the proposed method. We start with two well-known interlacing theorems, e.g., [

Theorem 1 (Cauchy interlace theorem) Let

Let the symmetric matrix

denote the eigenvalues of H. Then

and

In particular, for

Corollary 2 (Poincaré separation theorem) Let the matrix

The next theorem seems to be new. It sharpens the above results by removing zero eigenvalues.

Theorem 3 Assume that the non-zero eigenvalues of G satisfy (1.2) where

Let the matrix

and

Proof. Let the matrix

Let us return now to consider the qth iteration of the new method,

and let the eigenvalues of the matrix

be denoted as

Then the Ritz values which are computed at Step 1 are

and these values are the eigenvalues of the matrix

Similarly,

are the eigenvalues of the matrix

Therefore, since the columns of

On the other hand from Theorem 3 we obtain that

Hence by combining these relations we see that

for

Assume now that the algorithm is aimed at computing a cluster of k left-side eigenvalues of G,

Then similar arguments show that

for

Recall that a two-side cluster is the union of a right-side cluster and a left-side one. In this case the eigenvalues of S_{q} that correspond to the right side satisfy (2.9), while the eigenvalues of S_{q} that correspond to the left side satisfy (2.10). A similar situation occurs in the computation of a dominant cluster, since a dominant cluster is either a right-side cluster, a left-side cluster, or a two-side cluster.

It is left to explain how the information matrices are computed. The first question to answer is how to define the starting matrix X_{1}. For this purpose we consider a Krylov subspace whose generating vectors are obtained by repeatedly applying G to a random starting vector, and X_{1} is defined to be a matrix whose columns provide an orthonormal basis for that subspace. This definition ensures that the search starts inside Range(G). The actual computation of X_{1} can be done in a number of ways.
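A Krylov-based starting basis of this kind can be sketched in numpy as follows. This is a plain QR-based version; the per-column normalization is only for numerical scaling, and the paper's actual construction may differ.

```python
import numpy as np

def krylov_basis(G, v, m):
    """Orthonormal basis of the Krylov subspace built from v, Gv, ..., G^{m-1}v,
    computed by a QR factorization of the (column-normalized) Krylov matrix."""
    n = v.shape[0]
    K = np.empty((n, m))
    K[:, 0] = v / np.linalg.norm(v)
    for j in range(1, m):
        w = G @ K[:, j - 1]
        K[:, j] = w / np.linalg.norm(w)   # normalize to keep K well scaled
    Q, _ = np.linalg.qr(K)
    return Q
```

A starting matrix X_{1} is then obtained by applying this routine to a random vector v.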

The main question is how to define the information matrix

which is needed in Step 2. Following the Krylov subspace approach, the columns of

where

The ability of a Krylov subspace to approximate a dominant subspace is characterized by the Kaniel-Paige-Saad (K-P-S) bounds. See, for example, [

where vector of ones.

Another consequence of the K-P-S bounds is that a larger Krylov subspace gives better approximations. This suggests that using (3.2)-(3.3) with a larger subspace dimension should accelerate the convergence.

A different argument that supports the last observation comes from the interlacing theorems: Consider the use of (3.2)-(3.3) with two values of

A cluster of eigenvalues is called peripheral if there exists a real number, σ say, such that the eigenvalues of the cluster become the dominant eigenvalues of the shifted matrix G − σI.

A second difficulty comes from the following phenomenon. To simplify the discussion we concentrate on a left-side cluster of a positive semi-definite matrix that has several zero eigenvalues. In this case the target cluster is composed of the k smallest non-zero eigenvalues of G. Then, once the columns of V_{q} start to converge, the matrices B_{q} and Z_{q} turn out to be ill-conditioned. This makes the iteration vulnerable to rounding errors.

One way to overcome the last difficulty is to force

Step 3^{*}: As before, the step starts by orthogonalizing the columns of

A second possible remedy is to force

Step 5^{*}: Compute the matrices

whose columns form an orthonormal basis of

In the experiments of Section 8, we have used Step 3^{*} to compute a left-side cluster of Problem B, and Step 5^{*} was used to compute a left-side cluster of Problem C. However, the above modifications are not always helpful, and there might be better ways to correct the algorithm.

In this section we outline some possible ways to accelerate the rate of convergence. The acceleration is carried out in Step 2 of the basic iteration, by providing a “better” information matrix,

In this approach the columns of

where

The shift operation is carried out by replacing (5.1) with

where

Assume first that G is a positive definite matrix and that we want to compute a left-side cluster (a cluster of the smallest eigenvalues). Then (5.1) is replaced by (5.2), where

A similar tactic is used for calculating a two-side cluster. In this case the shift is computed by the rule

In other words, the shift estimates the average value of the largest and the smallest (algebraically) eigenvalues of G. A more sophisticated way to implement the above ideas is outlined below.
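The shift choices described above can be sketched as a small selection rule. The concrete values below are assumptions for illustration: the estimates of the extreme eigenvalues are taken from the current Ritz values, and for a left-side cluster of a positive definite matrix the shift is taken to be the largest-eigenvalue estimate, which pushes the small eigenvalues to dominance in G − σI.

```python
import numpy as np

def shift_for_cluster(eigval_estimates, kind):
    """Pick a shift sigma for the shifted iteration with G - sigma*I (sketch)."""
    lo, hi = float(min(eigval_estimates)), float(max(eigval_estimates))
    if kind == "left":
        return hi                 # small eigenvalues become largest in modulus
    if kind == "two-side":
        return (lo + hi) / 2.0    # average of the extreme estimates, as above
    return 0.0                    # right-side / dominant: no shift in this sketch
```

For eigenvalue estimates 1, 2, …, 10, the left-side shift is 10, and the two-side shift is 5.5.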

Let the eigenvalues of G satisfy (2.1) and let the real numbers

define the monic polynomial

Then the eigenvalues of the matrix polynomial

are determined by the relations

Moreover, the matrices G and

The idea here is to choose the points

As with orthogonal iterations, the use of Chebyshev polynomials enables an effective implementation of this idea. In this method there is no need for the numbers
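The Chebyshev idea can be sketched as a polynomial filter applied by a three-term recurrence: eigenvalue components inside an interval [a, b] are damped (the polynomial stays bounded by one there), while components outside the interval are strongly amplified. The interval endpoints are assumed estimates of the unwanted part of the spectrum.

```python
import numpy as np

def chebyshev_filter(G, v, deg, a, b):
    """Apply the degree-`deg` (deg >= 1) Chebyshev polynomial, mapped from
    [a, b] onto [-1, 1], to the vector v.  Components with eigenvalues in
    [a, b] stay bounded; components outside grow rapidly."""
    e = (b - a) / 2.0            # half-width of the damped interval
    c = (b + a) / 2.0            # its center
    y_prev = v
    y = (G @ v - c * v) / e      # degree-1 term of the recurrence
    for _ in range(2, deg + 1):
        y_next = 2.0 * (G @ y - c * y) / e - y_prev
        y_prev, y = y, y_next
    return y
```

For a diagonal matrix with twenty eigenvalues in [0, 1] and one eigenvalue at 2, a degree-10 filter on [0, 1] amplifies the outlying component by several orders of magnitude while the others remain bounded by one.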

This approach is possible only in certain cases, when G is invertible and matrix-vector products with G^{−1} can be computed efficiently.

In practice

The use of (5.4) is helpful for calculating small eigenvalues of a positive definite matrix. If other clusters are needed then

In this section we briefly examine the similarity and the difference between the new method and the Orthogonal Iterations method. The latter method is also called Subspace Iterations and Simultaneous Iterations, e.g., [

where

Step 1: Compute the product matrix

Step 2: Compute the Rayleigh quotient matrix

and its k dominant eigenvalues

Step 3: Compute a matrix

Let the eigenvalues of G satisfy (1.1). Then for

where

If at the end of Step 2 the matrix

A comparison of the above orthogonal iteration with the new iteration shows that both methods need about the same amount of computer storage, but the new method doubles the computational effort per iteration. The adaptation of orthogonal iterations to handle other peripheral clusters requires the shift operation. Another difference regards the rate of convergence. In orthogonal iterations the rate is determined by the ratio |λ_{k+1}| / |λ_{k}|.
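For reference, a standard textbook version of orthogonal iteration can be sketched as follows. It differs slightly from the three-step variant above in that the Rayleigh quotient is formed once at the end rather than in every iteration; the starting matrix is a random orthonormal matrix, as is customary.

```python
import numpy as np

def orthogonal_iteration(G, k, iters, rng=None):
    """Classical orthogonal (subspace / simultaneous) iteration for the
    k dominant eigenvalues of a symmetric matrix G (comparison sketch)."""
    if rng is None:
        rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((G.shape[0], k)))
    for _ in range(iters):
        Z = G @ Q                      # power step
        Q, _ = np.linalg.qr(Z)         # re-orthonormalize the block
    S = Q.T @ G @ Q                    # Rayleigh quotient matrix
    return np.sort(np.linalg.eigvalsh(S))[::-1]
```

The convergence rate of the subspace is governed by |λ_{k+1}| / |λ_{k}|, so well-separated clusters converge quickly while tight clusters are slow.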

The current presentation of the new method is carried out by applying the Krylov information matrix (3.2)-(3.4). This version can be viewed as a “Restarted Krylov method”. The Restarted Lanczos method is a sophisticated implementation of this approach that harnesses the Lanczos algorithm to reduce the computational effort per iteration. As before, the method is aimed at computing a cluster of

The qth iteration,

which have been obtained by applying

Step 1: Compute the eigenvalues of

denote the computed eigenvalues, where the first

Step 2: Compute a

such that

Step 3: The above QR factorization is used to build a new

This pair of matrices has the property that it can be obtained by applying

Step 4: Continue

The IRLM iterations are due to Sorensen [

A different implementation of the Restarted Lanczos idea, the Thick-Restarted Lanczos (TRLan) method, was proposed by Wu and Simon [

Step 1: Compute

which is used to compute the related matrix of Ritz vectors,

Step 2: Let

and

Step 3: The vector

Step 4: Continue

that has mutually orthonormal columns and satisfies

where

with

Step 5: Use a sequence of Givens rotations to complete the reduction of

For a detailed description of the above iteration see [

One difference between the new method and the Restarted Lanczos approach lies in the computation of the Rayleigh quotient matrix. In our method this computation requires additional

A second difference lies in the starting vector of the

A third difference arises when using acceleration techniques. Let us consider for example the use of power acceleration with

The new method can be viewed as a generalization of the Restarted Lanczos approach. The generalization is carried out by replacing the Lanczos process with standard orthogonalization. This simplifies the algorithm and clarifies the main reasons that lead to a fast rate of convergence. One reason is that each iteration builds a new Krylov subspace, using an improved starting vector. A second reason comes from the orthogonality requirement: the new Krylov subspace is orthogonalized against the current Ritz vectors. It is this orthogonalization that ensures successive improvement. (The Restarted Lanczos algorithms achieve these tasks in implicit ways.)
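For reference, the Lanczos process that the restarted methods above build on can be sketched as follows. This is the plain version with full reorthogonalization; the restarted variants (IRLM, TRLan) add restart logic on top of this kernel.

```python
import numpy as np

def lanczos(G, v, m):
    """m-step Lanczos process with full reorthogonalization.
    Returns Q (n x m, orthonormal) and tridiagonal T with Q^T G Q = T."""
    n = v.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = G @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # Full reorthogonalization against all previous Lanczos vectors.
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T
```

The eigenvalues of the small tridiagonal matrix T are the Ritz values; the restarted methods keep a few of them and restart the process with a compressed factorization.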

In this section we describe some experiments that illustrate the behavior of the proposed method. The test matrices have the form

where

The term “random orthonormal” means that

Type A matrices, where

Type B matrices, where

Type C matrices, where

Type D matrices, where

and

The difference between the computed Ritz values and the desired eigenvalues of

The figures in Tables 1-4 provide the values of

The new method is implemented as described in Section 3. It starts by orthonormalizing a random Krylov matrix of the form (3.1). The information matrix,

Table 1:

Iter. No. | Type A, ℓ = 12 | Type A, ℓ = 18 | Type B, ℓ = 12 | Type B, ℓ = 18 | Type C, ℓ = 12 | Type C, ℓ = 18 | Type D, ℓ = 12 | Type D, ℓ = 18
---|---|---|---|---|---|---|---|---
0 | 1.39E1 | 9.91E0 | 6.18E0 | 3.91E0 | 4.58E0 | 3.99E0 | 1.53E0 | 2.41E−1
1 | 4.28E0 | 1.96E0 | 1.12E0 | 4.35E−1 | 9.85E−1 | 6.11E−1 | 4.09E−2 | 3.02E−4
2 | 1.15E0 | 2.93E−1 | 6.66E−2 | 6.95E−4 | 1.14E−1 | 1.22E−2 | 6.82E−4 | 3.45E−7
3 | 4.10E−1 | 1.18E−1 | 8.36E−4 | 2.10E−6 | 3.28E−3 | 8.04E−6 | 1.78E−5 | 1.20E−9
4 | 1.69E−1 | 4.72E−3 | 1.76E−5 | 3.23E−8 | 1.36E−5 | 1.46E−7 | 4.71E−7 | 4.29E−12
5 | 5.07E−2 | 1.17E−4 | 1.47E−6 | 3.86E−10 | 2.74E−7 | 5.19E−9 | 2.19E−8 | 1.52E−13
6 | 1.61E−3 | 4.93E−6 | 3.50E−8 | 4.52E−11 | 2.60E−9 | 7.70E−11 | 3.92E−10 |
7 | 3.42E−4 | 5.13E−7 | 3.68E−9 | 7.53E−12 | 5.75E−11 | 2.12E−12 | 3.94E−11 |
8 | 4.82E−5 | 1.10E−8 | 9.13E−11 | 1.25E−12 | 1.67E−11 | 2.82E−13 | 4.72E−12 |
10 | 2.25E−6 | 3.11E−10 | 6.21E−13 | 3.98E−13 | 9.00E−14 | | |
12 | 1.49E−7 | 1.31E−11 | | | | | |
14 | 8.21E−9 | 1.13E−12 | | | | | |

Table 2:

Iter. No. | Type A, ℓ = 18 | Type A, ℓ = 30 | Type B, ℓ = 18 | Type B, ℓ = 30 | Type C, ℓ = 18 | Type C, ℓ = 30 | Type D, ℓ = 18 | Type D, ℓ = 30
---|---|---|---|---|---|---|---|---
0 | 7.93E1 | 6.52E1 | 5.79E1 | 5.13E1 | 6.63E1 | 5.80E1 | 3.99E1 | 3.52E1
1 | 3.55E1 | 2.95E1 | 1.43E1 | 9.30E0 | 8.20E0 | 3.88E0 | 2.84E1 | 1.16E1
2 | 2.18E1 | 1.74E1 | 8.01E0 | 4.18E0 | 5.96E0 | 2.66E0 | 1.39E1 | 1.96E1
4 | 1.16E1 | 8.07E0 | 3.26E0 | 1.00E0 | 3.23E0 | 1.07E0 | 6.49E0 | 1.77E0
8 | 4.83E0 | 2.41E0 | 7.44E−1 | 8.19E−2 | 8.84E−1 | 1.39E−1 | 1.59E0 | 2.16E−1
12 | 2.54E0 | 8.20E−1 | 1.69E−1 | 1.05E−2 | 3.23E−1 | 1.44E−2 | 4.80E−1 | 1.91E−2
16 | 1.55E0 | 2.56E−1 | 3.57E−2 | 1.38E−3 | 9.38E−2 | 1.34E−3 | 1.30E−1 | 1.48E−3
20 | 9.95E−1 | 7.75E−2 | 7.61E−3 | 1.50E−4 | 2.70E−2 | 1.28E−4 | 3.73E−2 | 1.12E−4
24 | 6.43E−1 | 2.33E−2 | 1.69E−3 | 1.41E−5 | 7.92E−3 | 1.29E−5 | 1.11E−2 | 8.75E−6

Table 3:

Iter. No. | Type A, ℓ = 12 | Type A, ℓ = 18 | Type B, ℓ = 12 | Type B, ℓ = 18 | Type C, ℓ = 12 | Type C, ℓ = 18 | Type D, ℓ = 12 | Type D, ℓ = 18
---|---|---|---|---|---|---|---|---
0 | 3.19E1 | 2.94E1 | 1.11E1 | 9.97E0 | 4.99E0 | 4.70E0 | 6.29E0 | 2.55E0
1 | 1.30E1 | 7.68E0 | 4.93E0 | 2.83E0 | 7.45E−1 | 3.61E−1 | 1.17E0 | 5.72E−2
2 | 7.17E0 | 3.00E0 | 2.64E0 | 6.89E−1 | 4.49E−3 | 4.15E−5 | 7.38E−2 | 2.81E−5
3 | 3.13E0 | 8.79E−1 | 1.45E0 | 1.80E−1 | 4.52E−5 | 2.31E−7 | 9.17E−4 | 2.32E−8
4 | 1.37E0 | 1.28E−1 | 1.14E0 | 5.63E−2 | 4.03E−6 | 8.06E−9 | 1.69E−5 | 2.13E−11
5 | 6.58E−1 | 1.67E−2 | 4.80E−1 | 1.87E−3 | 1.80E−8 | 3.60E−10 | 3.68E−7 | 1.67E−13
6 | 1.05E−1 | 2.36E−4 | 6.56E−2 | 2.87E−4 | 9.35E−10 | 5.66E−11 | 8.31E−9 | 5.33E−14
7 | 1.30E−2 | 5.73E−6 | 2.69E−3 | 9.09E−5 | 3.65E−11 | 2.74E−12 | 1.91E−10 |
8 | 7.44E−4 | 5.95E−7 | 9.73E−4 | 2.06E−5 | 3.14E−12 | 7.61E−13 | 4.54E−12 |
10 | 3.67E−5 | 4.74E−10 | 2.14E−4 | 2.98E−6 | 2.75E−13 | 8.41E−14 | |
12 | 6.94E−7 | 9.70E−12 | 5.66E−5 | 5.84E−7 | | | |
14 | 2.09E−8 | 2.47E−13 | 1.82E−5 | 2.72E−7 | | | |

Table 4:

Iter. No. | ℓ = 6, m = 2 | ℓ = 6, m = 4 | ℓ = 12, m = 2 | ℓ = 12, m = 3 | ℓ = 12, m = 4 | ℓ = 18, m = 2 | ℓ = 18, m = 3
---|---|---|---|---|---|---|---
0 | 1.68E1 | 7.82E0 | 6.41E0 | 3.80E0 | 2.43E0 | 4.34E0 | 2.39E0
1 | 8.52E0 | 2.46E0 | 1.09E0 | 5.05E−1 | 1.84E−1 | 3.37E−1 | 1.62E−1
2 | 3.84E0 | 8.78E−1 | 1.94E−1 | 1.16E−1 | 1.38E−3 | 1.64E−2 | 1.30E−4
3 | 1.99E0 | 2.21E−1 | 3.14E−2 | 3.48E−3 | 6.72E−7 | 7.73E−5 | 2.11E−8
4 | 9.03E−1 | 1.55E−1 | 2.78E−4 | 5.04E−5 | 1.86E−8 | 7.46E−7 | 2.46E−10
5 | 3.51E−1 | 3.48E−2 | 8.74E−6 | 4.82E−7 | 1.49E−10 | 1.66E−8 | 2.96E−11
6 | 1.79E−1 | 1.83E−3 | 1.24E−6 | 3.41E−8 | 2.66E−11 | 1.71E−9 | 3.36E−13
7 | 1.59E−1 | 1.46E−4 | 2.33E−8 | 1.78E−10 | 3.21E−12 | 5.29E−11 |
8 | 1.09E−1 | 1.34E−5 | 3.25E−9 | 3.24E−11 | 3.13E−13 | 1.61E−11 |
10 | 1.33E−2 | 8.13E−7 | 9.34E−11 | 4.64E−13 | 3.93E−13 | |
12 | 2.28E−3 | 7.93E−8 | 2.42E−12 | | | |
14 | 2.76E−4 | 1.47E−9 | 2.61E−13 | | | |
18 | 8.14E−6 | 2.32E−11 | | | | |

Another observation stems from the first rows of these tables: We see that a random Krylov matrix gives a better start than a random starting matrix.

The ability of the new method to compute a left-side cluster is illustrated in

The merits of Power acceleration are demonstrated in

The new method is based on a modified interlacing theorem which forces the Rayleigh-Ritz approximations to move monotonically toward their limits. The current presentation concentrates on the Krylov information matrix (3.2)-(3.4), but the method can use other information matrices. The experiments that we have done are quite encouraging, especially when calculating peripheral clusters. The theory suggests that the method can be extended to calculate certain non-peripheral clusters, but in this case we face some difficulties due to rounding errors. Further modifications of the new method are considered in [

Achiya Dax (2015). A Subspace Iteration for Calculating a Cluster of Exterior Eigenvalues. Advances in Linear Algebra & Matrix Theory, 5, 76-89. doi: 10.4236/alamt.2015.53008