
In this paper, we propose an improved preconditioned algorithm for the conjugate gradient squared method (improved PCGS) for the solution of systems of linear equations. Furthermore, the logical structures underlying the formation of this preconditioned algorithm are demonstrated via a number of theorems. The improved PCGS algorithm retains mathematical properties that are associated with the derivation of CGS from the bi-conjugate gradient method under a non-preconditioned system. A series of numerical comparisons with the conventional PCGS illustrates the enhanced effectiveness of the improved scheme with a variety of preconditioners. The logical structure underlying the formation of the improved PCGS brings a spillover effect to various bi-Lanczos-type algorithms with minimal residual operations, because these algorithms were constructed by adopting the idea behind the derivation of CGS. Such bi-Lanczos-type algorithms are very important, because they are often adopted to solve the systems of linear equations that arise in large-scale numerical simulations.

In scientific and technical computation, natural phenomena or engineering problems are described through numerical models. These models are often reduced to a system of linear equations:

where A is a large, sparse coefficient matrix of size

The conjugate gradient squared (CGS) method is a way to solve (1) [

Bi-Lanczos-type algorithms are derived from the bi-conjugate gradient (BiCG) method [

Characteristically, the coefficient matrix of (2) is the transpose of A. In this paper, we term (2) a “shadow system”.

Bi-Lanczos-type algorithms have the advantage of requiring less memory than Arnoldi-type algorithms, another class of Krylov subspace methods.

The CGS method is derived from BiCG. Furthermore, various bi-Lanczos-type algorithms, such as BiCGStab [

Many iterative methods, including bi-Lanczos algorithms, are often applied together with some preconditioning operation. Such algorithms are called preconditioned algorithms; for example, preconditioned CGS (PCGS). The application of preconditioning operations to iterative methods effectively enhances their performance. Indeed, the effects attributable to different preconditioning operations are greater than those produced by different iterative methods [

Consequently, PCGS holds an important position within the Krylov subspace methods. In this paper, we identify a mathematical issue with the conventional PCGS algorithm and propose an improved PCGS. This improved PCGS algorithm is derived rationally, in accordance with its logical structure.

In this paper, “preconditioned algorithm” and “preconditioned system” refer, respectively, to a solving algorithm described with some preconditioning operator M (also called a preconditioner or preconditioning matrix) and to the system converted by the operator based on M. These terms never denote the algorithm implementing the preconditioning operation itself, such as incomplete LU decomposition, approximate inverse, and so on. For example, under a preconditioned system, the original linear system (1) becomes

under the preconditioner
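As a concrete illustration, the converted system can be formed explicitly for a small matrix. The sketch below uses a simple Jacobi (diagonal) preconditioner M = diag(A); the matrix and right-hand side are hypothetical values chosen only to show that the converted system has the same solution as the original system (1).

```python
import numpy as np

# Hypothetical 3x3 unsymmetric system A x = b (illustrative values only).
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])

# Jacobi preconditioner: M = diag(A), so M^{-1} is cheap to apply.
M_inv = np.diag(1.0 / np.diag(A))

# Left-preconditioned system: (M^{-1} A) x = M^{-1} b.
A_tilde = M_inv @ A
b_tilde = M_inv @ b

# Both systems share the same solution vector x.
x = np.linalg.solve(A, b)
x_tilde = np.linalg.solve(A_tilde, b_tilde)
assert np.allclose(x, x_tilde)
```

In practice M^{-1} is never formed as a dense matrix; the iterative method only requires the action of M^{-1} on a vector.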

This paper is organized as follows. Section 2 provides an overview of the derivation of the CGS method and the properties of two scalar coefficients (

In this section, we derive the CGS method from the BiCG method, and introduce the preconditioned BiCG algorithm.

BiCG [

Algorithm 1. BiCG method:

For

End Do
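The loop body of Algorithm 1 is not reproduced above. For reference, the following is a minimal sketch of the standard non-preconditioned BiCG recurrences; the function name, stopping rule, test matrix, and the choice of shadow residual r̃0 = r0 are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bicg(A, b, x0=None, tol=1e-10, maxiter=1000):
    """Minimal non-preconditioned BiCG sketch (shadow residual r~0 = r0 assumed)."""
    x = np.zeros_like(b) if x0 is None else x0.astype(float).copy()
    r = b - A @ x            # residual of the linear system (1)
    rt = r.copy()            # shadow residual of the shadow system (2), driven by A^T
    p, pt = r.copy(), rt.copy()
    rho = rt @ r
    for _ in range(maxiter):
        Ap, Atpt = A @ p, A.T @ pt
        alpha = rho / (pt @ Ap)      # BiCG breaks down if <p~, Ap> vanishes
        x = x + alpha * p
        r = r - alpha * Ap
        rt = rt - alpha * Atpt
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        rho_new = rt @ r             # BiCG breaks down if <r~, r> vanishes
        beta = rho_new / rho
        p = r + beta * p
        pt = rt + beta * pt
        rho = rho_new
    return x

# Hypothetical small unsymmetric test system (illustrative values only).
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
x = bicg(A, b)
```

Note that the shadow recurrences (the `rt`, `pt` updates) require products with the transpose of A, which is exactly what CGS later eliminates.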

The BiCG method satisfies the following theorems.

Theorem 1 (Hestenes et al. [

where

Using the polynomials of Theorem 1, the residual vector for the linear system (1) and the shadow residual vector for the shadow system (2) can be written as

The probing direction vectors are represented by

where

Theorem 2 (Fletcher [

The CGS method is derived by transforming the scalar coefficients in the BiCG method to avoid the

and the following theorem can be applied.

Theorem 3 (Sonneveld [

Proof. We apply

to (16) and (17). Then,

□

The CGS method is derived from BiCG by Theorem 4.

^{3}In this paper, if we specifically distinguish this algorithm, we write

^{4}In this paper, we use the superscript “CGS” alongside

Theorem 4 (Sonneveld [

the solution vector

Proof. The coefficients

Further, we can apply

Thus, we have derived the CGS method.

Algorithm 2. CGS method:

For

End Do
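The loop body of Algorithm 2 is likewise not reproduced above. The following is a minimal sketch of Sonneveld's CGS recurrences, in which the BiCG scalar coefficients are computed without any product involving the transpose of A; the function name, stopping rule, test matrix, and the choice r̃0 = r0 are illustrative assumptions.

```python
import numpy as np

def cgs(A, b, x0=None, tol=1e-10, maxiter=1000):
    """Minimal CGS sketch: transpose-free recurrences with BiCG coefficients."""
    x = np.zeros_like(b) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    rt = r.copy()            # fixed initial shadow residual; A^T never appears below
    u, p = r.copy(), r.copy()
    rho = rt @ r
    for _ in range(maxiter):
        v = A @ p
        alpha = rho / (rt @ v)       # the same alpha_k as in BiCG
        q = u - alpha * v
        x = x + alpha * (u + q)
        r = r - alpha * (A @ (u + q))
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        rho_new = rt @ r
        beta = rho_new / rho         # the same beta_k as in BiCG
        u = r + beta * q
        p = u + beta * (q + beta * p)
        rho = rho_new
    return x

# Hypothetical small unsymmetric test system (illustrative values only).
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
x = cgs(A, b)
```

Unlike the BiCG sketch, only products with A itself appear; the shadow system enters solely through the fixed vector `rt` in the two inner products.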

The following Proposition 5 and Corollary 1 are given as supplementary explanations for Algorithm 2. They are almost trivial, but are important for the discussion in the next section.

Proposition 5. There exist the following relations:

where

Proof. Equation (23) follows because (18) for

Equation (24) is derived as follows. Applying (18) to (16), we obtain

This equation shows that the inner product of the CGS on the right is obtained from the inner product of the BiCG on the left. Therefore,

Hereafter,

Corollary 1. There exists the following relation:

where

In this subsection, the preconditioned BiCG algorithm is derived from the non-preconditioned BiCG method (Algorithm 1). First, some basic aspects of the BiCG method under a preconditioned system are expressed, and a standard preconditioned BiCG algorithm is given.

When the BiCG method (Algorithm 1) is applied to linear equations under a preconditioned system:

we obtain a “BiCG method under a preconditioned system” (Algorithm 3). We denote this as “PBiCG”.

In this paper, matrices and vectors under the preconditioned system are denoted with “

Algorithm 3. BiCG method under the preconditioned system:

For

End Do

^{5}If we wish to emphasize different methods, a superscript is applied to the relevant vectors to denote the method, such as

^{6}We represent the polynomials R and P in italic font to denote a preconditioned system.

We now state Theorem 6 and Theorem 7, which are clearly derived from Theorem 1 and Theorem 2, respectively.

Theorem 6. Under the preconditioned system, there are recurrence relations that define the degree k of the residual polynomial

where

Using the polynomials of Theorem 6, the residual vectors of the preconditioned linear system (25) and the shadow residual vectors of the following preconditioned shadow system:

can be represented as

respectively. The probing direction vectors are given by

respectively. Under the preconditioned system,

Remark 1. The shadow systems given by (31) do not exist, but it is very important to construct systems in which the transpose of the matrix

Theorem 7. The BiCG method under the preconditioned system satisfies the following conditions:

Next, we derive the standard PBiCG algorithm. Here, the preconditioned linear system (25) and its shadow system (31) are formed as follows:

Definition 1. For the PBiCG algorithm, the solution vector is denoted as

Using this notation, each vector of the BiCG under the preconditioned system given by Algorithm 3 is converted as below:

Substituting the elements of (40) into (36) and (37), we have

Consequently, (26) and (27) become

Before the iterative step, we give the following Definition 2.

Definition 2. For some preconditioned algorithms, the initial residual vector of the linear system is written as

We adopt the following preconditioning conversion after (40).

Consequently, we can derive the following standard PBiCG algorithm [

Algorithm 4. Standard preconditioned BiCG algorithm:

For

End Do

Algorithm 4 satisfies

Remark 2. Because we apply a preconditioning conversion such as

In this section, we have shown that

In this section, we first explain the derivation of PCGS, and present the conventional PCGS algorithm. We identify an issue with this conventional PCGS algorithm, and propose an improved PCGS that overcomes this issue.

Typically, PCGS algorithms are derived via a “CGS method under a preconditioned system” (Algorithm 5).

Algorithm 5 is derived by applying the CGS method (Algorithm 2) to the preconditioned linear system (25). In this section, the vectors and

Algorithm 5. CGS method under the preconditioned system:

For

End Do

The conventional PCGS algorithm (Algorithm 6) is derived via the CGS method, as shown in

The conventional PCGS algorithm is adopted in many documents and numerical libraries [

This gives the following Algorithm 6 (“Conventional PCGS” in

Algorithm 6. Conventional PCGS algorithm:

For

End Do
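The loop body of Algorithm 6 is not reproduced above. For concreteness, the conventional PCGS recurrence, in the form popularized by the Templates collection of Barrett et al., can be sketched as follows; the Jacobi preconditioner, the test matrix, and the choice r̃0 = r0 are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def pcgs_conventional(A, b, M_inv, tol=1e-10, maxiter=1000):
    """Conventional PCGS loop in the Templates style (r~0 = r0 assumed)."""
    x = np.zeros_like(b)
    r = b - A @ x
    rt = r.copy()
    p = q = np.zeros_like(b)
    rho = 1.0
    for i in range(maxiter):
        rho_new = rt @ r
        beta = 0.0 if i == 0 else rho_new / rho
        u = r + beta * q
        p = u + beta * (q + beta * p)
        phat = M_inv @ p              # preconditioning operation: M^{-1} p
        vhat = A @ phat
        alpha = rho_new / (rt @ vhat)
        q = u - alpha * vhat
        uhat = M_inv @ (u + q)        # preconditioning operation: M^{-1}(u + q)
        x = x + alpha * uhat
        r = r - alpha * (A @ uhat)
        rho = rho_new
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x

# Hypothetical small unsymmetric test system (illustrative values only).
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
M_inv = np.diag(1.0 / np.diag(A))   # Jacobi preconditioner, for illustration only
x = pcgs_conventional(A, b, M_inv)
```

Two applications of M^{-1} occur per iteration; note that the inner products here use the unpreconditioned shadow residual `rt`, which is precisely the point at issue in the discussion that follows.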

This PCGS algorithm was described in [

On the surface, this version of PCGS appears to be a valid algorithm, because the operation

This differs from the conversion given by (40), and we cannot obtain coefficients equivalent to

In this subsection, we present an improved PCGS algorithm (“Improved PCGS” in

The polynomials (32)-(35) of the residual vectors and the probing direction vectors in PBiCG are substituted for the numerators and denominators of

and apply the following Theorem 8.

Theorem 8. The PCGS coefficients

Proof. We apply

to (54) and (55). Then,

□

The PCGS method is derived from PBiCG using Theorem 9.

Theorem 9. The PCGS method is derived from the linear system’s recurrence relations in the PBiCG method under the property of equivalence between the coefficients

Proof. The coefficients

Further, we can apply

The following Proposition 10 and Corollary 2 are given as a supplementary explanation under the preconditioned system.

Proposition 10. There exist the following relations:

where

Proof. Equation (61) follows because (56) for

Equation (62) is derived as follows. Applying (56) to (54), we obtain

This equation shows that the inner product of the PCGS on the right is obtained from the inner product of the PBiCG on the left. Therefore,

Hereafter,

Corollary 2. There exists the following relation:

where

The CGS preconditioning conversion given by

^{7}We apply the superscript “PCGS” to

As a consequence, the following improved PCGS algorithm is derived.

Algorithm 7. Improved preconditioned CGS algorithm:

For

End Do

Algorithm 7 can also be derived by applying the following preconditioning conversion to Algorithm 5. Here, we treat the preconditioning conversions of

The number of preconditioning operations in the iterative part of Algorithm 7 is the same as that in Algorithm 6.

In this section, we compare the conventional and improved PCGS algorithms numerically.

The test problems were generated by building linear systems from real unsymmetric matrices taken from the Tim Davis collection [

The numerical experiments were executed on a DELL Precision T7400 (Intel Xeon E5420, 2.5 GHz CPU, 16 GB RAM) running CentOS (kernel 2.6.18), using the Intel icc 10.1 compiler.

The results using the non-preconditioned CGS are listed in

The results given by the conventional PCGS and the improved PCGS are listed in Tables 2-5. Each table adopts a different preconditioner implemented in Lis [. These include “SAINV” and “Crout ILU”. In these tables, significant advantages of one algorithm over the other are emphasized in bold font. Additionally, matrix names given in italic font in

In many cases, the results given by the improved PCGS are better than those from the conventional algorithm. We should pay particular attention to the results from matrices “mcca”, “mcfe” and “watt_1”. In these cases, it appears that the conventional PCGS converges faster with any preconditioner, but the TRE values are worse than those from the improved algorithm. The iteration number for the conventional PCGS is not emphasized by bold font in these instances. The consequences of this anomaly are worth investigating further, possibly by analyzing them under PBiCG. This will be the subject of future work.

CGS (Algorithm 2):

| Matrix | N | NNZ | Iter. | TRR | TRE | Time |
|---|---|---|---|---|---|---|
| arc130 | 130 | 1037 | 11 | −12.20 | −7.05 | 1.18e−4 |
| bfwa782 | 782 | 7514 | 320 | −11.29 | −11.94 | 1.71e−2 |
| cryg2500 | 2500 | 12349 | No convergence | | | |
| epb1 | 14734 | 95053 | 770 | −7.45 | −6.50 | 5.93e−1 |
| jpwh_991 | 991 | 6027 | Breakdown | | | |
| mcca | 180 | 2659 | No convergence | | | |
| mcfe | 765 | 24382 | No convergence | | | |
| memplus | 17758 | 99147 | 1334 | −9.16 | −6.76 | 1.33e+0 |
| olm1000 | 1000 | 3996 | No convergence | | | |
| olm5000 | 5000 | 19996 | No convergence | | | |
| pde900 | 900 | 4380 | 113 | −9.87 | −10.49 | 4.13e−3 |
| pde2961 | 2961 | 14585 | 256 | −9.49 | −10.19 | 3.06e−2 |
| sherman2 | 1080 | 23094 | No convergence | | | |
| sherman3 | 5005 | 20033 | No convergence | | | |
| sherman5 | 3312 | 20793 | 1927 | −10.36 | −9.69 | 3.34e−1 |
| viscoplastic2 | 32769 | 381326 | 801 | −10.34 | −8.18 | 2.20e+0 |
| watt_1 | 1856 | 11360 | 306 | −12.10 | −6.07 | 2.72e−2 |

In this table, “N” is the problem size and “NNZ” is the number of nonzero elements. The items in each column are, from left to right: the number of iterations required to converge (denoted “Iter.”), the log_{10} of the true relative residual 2-norm (denoted “TRR”, calculated from the final numerical solution), the log_{10} of the true relative error 2-norm (denoted “TRE”, calculated from the numerical solution and the exact solution), and the computation time (denoted “Time”).
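The TRR and TRE measures can be computed as in the sketch below. The exact normalizations (initial residual for TRR, exact-solution norm for TRE) are our assumptions about the paper's definitions, stated here only to make the columns reproducible in spirit.

```python
import numpy as np

def trr(A, b, x_hat, x0=None):
    """log10 true relative residual 2-norm (normalization by the initial residual assumed)."""
    r0 = b if x0 is None else b - A @ x0
    return np.log10(np.linalg.norm(b - A @ x_hat) / np.linalg.norm(r0))

def tre(x_hat, x_exact):
    """log10 true relative error 2-norm against the exact solution."""
    return np.log10(np.linalg.norm(x_hat - x_exact) / np.linalg.norm(x_exact))

# Tiny demonstration with a known exact solution (illustrative values only).
x_exact = np.array([3.0, 4.0])
x_hat = x_exact * (1 + 1e-6)       # solution perturbed by a relative error of 1e-6
A, b = np.eye(2), x_exact
print(trr(A, b, x_hat), tre(x_hat, x_exact))   # both ≈ −6.0
```

A large gap between TRR and TRE (as for “watt_1” in the tables) indicates that a small recursive residual does not guarantee a correspondingly accurate solution.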

In this paper, we have developed an improved PCGS algorithm by applying the procedure for deriving CGS to the BiCG method under a preconditioned system, and we have also presented some mathematical theorems underlying the logic of this derivation process. The improved PCGS does not increase the number of preconditioning operations in the iterative part of the algorithm. Our numerical results established that solutions obtained with the proposed algorithm are superior to those from the conventional algorithm for a variety of preconditioners.

However, the improved algorithm may still break down during the iterative procedure. This is an artefact of certain characteristics of the non-preconditioned BiCG and CGS methods, mainly the operations based on the bi-orthogonality and bi-conjugacy conditions. Nevertheless, this improved logic can be applied to other bi-Lanczos-based algorithms with minimal residual operations.

In future work, we will analyze the mechanism of the conventional and improved PCGS algorithms, and consider other variations of this algorithm. Furthermore, we will consider other settings of the initial shadow residual vector

Conventional PCGS (Algorithm 6) vs. Improved PCGS (Algorithm 7):

| Matrix | Iter. (Conv) | TRR (Conv) | TRE (Conv) | Time (Conv) | Iter. (Impr) | TRR (Impr) | TRE (Impr) | Time (Impr) |
|---|---|---|---|---|---|---|---|---|
| arc130 | 5 | −15.91 | −10.61 | 6.88e−5 | 5 | −15.87 | −10.66 | 7.28e−5 |
| bfwa782 | 227 | −11.50 | −12.30 | 1.25e−2 | 260 | −11.98 | −12.02 | 1.41e−2 |
| cryg2500 | No convergence | | | | No convergence | | | |
| epb1 | 578 | −7.66 | −6.44 | 4.56e−1 | 591 | −7.59 | −6.86 | 4.68e−1 |
| jpwh_991 | Breakdown | | | | Breakdown | | | |
| mcca | 84 | −6.53 | −0.93 | 1.44e−3 | 120 | −8.93 | −12.61 | 2.06e−3 |
| mcfe | 764 | −3.56 | 2.97 | 7.95e−2 | 908 | −6.26 | −8.79 | 9.54e−2 |
| memplus | 213 | −12.10 | −9.06 | 2.19e−1 | 230 | −12.41 | −9.38 | 2.37e−1 |
| olm1000 | No convergence | | | | No convergence | | | |
| olm5000 | No convergence | | | | No convergence | | | |
| pde900 | 113 | −7.44 | −7.63 | 4.30e−3 | 100 | −11.22 | −11.75 | 3.81e−3 |
| pde2961 | 206 | −8.27 | −8.68 | 2.55e−2 | 237 | −6.44 | −6.61 | 2.92e−2 |
| sherman2 | No convergence | | | | No convergence | | | |
| sherman3 | 1012 | −7.30 | −8.40 | 2.15e−1 | 827 | −6.74 | −7.59 | 1.75e−1 |
| sherman5 | 131 | −12.29 | −12.63 | 2.35e−2 | 128 | −12.40 | −12.03 | 2.28e−2 |
| viscoplastic2 | 660 | −9.99 | −8.00 | 1.97e+0 | 645 | −12.45 | −10.14 | 1.88e+0 |
| watt_1 | 80 | −12.51 | −5.75 | 7.53e−3 | 79 | −12.61 | −5.44 | 7.53e−3 |

Conventional PCGS (Algorithm 6) vs. Improved PCGS (Algorithm 7):

| Matrix | Iter. (Conv) | TRR (Conv) | TRE (Conv) | Time (Conv) | Iter. (Impr) | TRR (Impr) | TRE (Impr) | Time (Impr) |
|---|---|---|---|---|---|---|---|---|
| arc130 | 2 | −15.90 | −6.35 | 2.94e−4 | 3 | −16.26 | −11.07 | 2.98e−4 |
| bfwa782 | 93 | −9.36 | −10.29 | 1.18e−2 | 78 | −12.82 | −12.48 | 1.01e−2 |
| cryg2500 | No convergence | | | | 385 | −8.47 | −4.22 | 8.49e−2 |
| epb1 | 124 | −11.38 | −10.34 | 2.14e−1 | 129 | −9.23 | −8.54 | 2.29e−1 |
| jpwh_991 | Breakdown | | | | 16 | −12.44 | −12.53 | 2.72e−3 |
| mcca | 7 | −10.53 | −11.10 | 5.99e−4 | 7 | −9.98 | −11.70 | 6.18e−4 |
| mcfe | 10 | −12.83 | −11.58 | 5.70e−3 | 9 | −12.15 | −10.61 | 5.52e−3 |
| memplus | 303 | −12.13 | −10.36 | 7.22e−1 | 305 | −12.12 | −10.61 | 7.18e−1 |
| olm1000 | No convergence | | | | 34 | −12.49 | −9.19 | 2.85e−3 |
| olm5000 | No convergence | | | | 34 | −12.20 | −8.05 | 1.41e−2 |
| pde900 | 27 | −13.61 | −14.27 | 2.57e−3 | 27 | −13.19 | −13.92 | 2.59e−3 |
| pde2961 | 53 | −10.57 | −11.34 | 1.49e−2 | 58 | −11.78 | −12.65 | 1.62e−2 |
| sherman2 | 12 | −13.60 | −11.46 | 6.08e−3 | 11 | −14.20 | −11.55 | 5.91e−3 |
| sherman3 | 103 | −9.82 | −11.57 | 4.39e−2 | 96 | −10.82 | −13.34 | 4.10e−2 |
| sherman5 | 31 | −13.68 | −12.89 | 1.36e−2 | 30 | −12.54 | −12.42 | 1.31e−2 |
| viscoplastic2 | 812 | −7.55 | −4.68 | 7.01e+0 | 844 | −11.80 | −8.69 | 7.18e+0 |
| watt_1 | 27 | −13.01 | −5.96 | 6.17e−3 | 35 | −12.11 | −9.77 | 7.75e−3 |

Conventional PCGS (Algorithm 6) vs. Improved PCGS (Algorithm 7):

| Matrix | Iter. (Conv) | TRR (Conv) | TRE (Conv) | Time (Conv) | Iter. (Impr) | TRR (Impr) | TRE (Impr) | Time (Impr) |
|---|---|---|---|---|---|---|---|---|
| arc130 | 4 | −17.54 | −8.75 | 3.22e−4 | 5 | −19.27 | −11.20 | 3.24e−4 |
| bfwa782 | 109 | −9.49 | −9.75 | 1.72e−2 | 106 | −12.34 | −12.07 | 1.73e−2 |
| cryg2500 | No convergence | | | | No convergence | | | |
| epb1 | 69 | −12.40 | −11.91 | 2.90e+0 | 69 | −12.31 | −12.41 | 2.89e+0 |
| jpwh_991 | Breakdown | | | | 41 | −12.20 | −13.06 | 7.37e−3 |
| mcca | 81 | −5.32 | 0.00 | 2.26e−3 | 111 | −8.60 | −14.26 | 3.04e−3 |
| mcfe | 764 | −3.56 | 2.97 | 1.10e−1 | 908 | −6.26 | −8.79 | 1.29e−1 |
| memplus | 34 | −12.30 | −10.20 | 1.18e+0 | 34 | −12.32 | −10.24 | 1.20e+0 |
| olm1000 | No convergence | | | | No convergence | | | |
| olm5000 | No convergence | | | | No convergence | | | |
| pde900 | 53 | −10.37 | −11.16 | 7.25e−3 | 51 | −12.43 | −13.03 | 7.28e−3 |
| pde2961 | 110 | −11.62 | −12.02 | 6.91e−2 | 96 | −12.08 | −12.55 | 6.66e−2 |
| sherman2 | No convergence | | | | No convergence | | | |
| sherman3 | 874 | −7.49 | −8.34 | 5.53e−1 | 629 | −9.49 | −10.75 | 4.24e−1 |
| sherman5 | 157 | −12.02 | −11.19 | 9.20e−2 | 127 | −12.57 | −12.30 | 8.09e−2 |
| viscoplastic2 | No convergence | | | | No convergence | | | |
| watt_1 | 1 | −12.37 | −4.05 | 1.49e+0 | 2 | −14.81 | −11.31 | 1.53e+0 |


This work is partially supported by a Grant-in-Aid for Scientific Research (C) No. 25390145 from MEXT, Japan.

Shoji Itoh, Masaaki Sugihara (2015) Formulation of a Preconditioned Algorithm for the Conjugate Gradient Squared Method in Accordance with Its Logical Structure. Applied Mathematics, 06, 1389-1406. doi: 10.4236/am.2015.68131