A Count Sketch Maximal Weighted Residual Kaczmarz Method with Oblique Projection for Highly Overdetermined Linear Systems
1. Introduction
We consider the following consistent linear system:

(1.1) $Ax = b$,

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $x$ is the $n$-dimensional unknown vector. One of the most popular solvers for the consistent linear system (1.1) is the Kaczmarz method, which was first discovered by Stefan Kaczmarz. In 2009, Strohmer and Vershynin [1] proposed the randomized Kaczmarz method with an expected exponential rate of convergence, which has prompted many scholars to study the Kaczmarz algorithm; see [2] [3] [4]. Due to its simplicity and performance, the Kaczmarz method has many applications, ranging from image reconstruction [5] and distributed computing [6] to signal processing [7].
Since the classical Kaczmarz method cycles through all rows of the coefficient matrix A, the convergence rate depends strongly on the row index selection strategy. McCormick [8] proposed a Maximal Weighted Residual Kaczmarz (MWRK) method, which at each iteration selects the component of the residual with the largest modulus. Inspired by the proof of the Greedy Randomized Kaczmarz (GRK) method [9] with its remarkable convergence, Du and Gao [10] gave a new theoretical estimate for the convergence rate of the MWRK method, dependent only on quantities of the coefficient matrix. Another interesting direction in the study of Kaczmarz methods is to combine them with random sketching matrices. In the past decades, many random sketching matrices have been found, such as the Gaussian random projection [11], the Subsampled Randomized Hadamard Transform [12] and the count sketch [13] [14]. Zhang and Li [15] proposed a Count Sketch Maximal Weighted Residual Kaczmarz (CS-MWRK) method to solve highly overdetermined linear systems. Its core idea is that the count sketch matrix can reduce the computational cost while keeping most of the information of the original problem [12] [16]. Experiments in [15] show that it can reduce the CPU time for solving highly overdetermined linear systems. For more sketching-based Kaczmarz-type methods, we refer the reader to [17] [18] [19] and the references therein.
Recently, Li, Wang, Bao and Liu [20] proposed a new Kaczmarz method, KO for short, with a new descent direction based on the oblique projection introduced by Constantin Popa in [21] [22]. Using the row index selection rules of the MWRK and GRK methods, Wang, Li, Bao and Liu [23] gave two accelerated variants of the KO method: the Maximal Weighted Residual Kaczmarz method with Oblique projection (MWRKO) and the Greedy Randomized Kaczmarz method with Oblique projection (GRKO). Inspired by the work of Zhang and Li [15], we combine the count sketch technique with the MWRKO method to develop a Count Sketch Maximal Weighted Residual Kaczmarz method with Oblique projection (CS-MWRKO) and establish its convergence rate. Numerical experiments demonstrate that the CS-MWRKO method requires less computing time than the CS-MWRK and MWRKO methods for highly overdetermined linear systems, especially for systems with nearly linearly correlated rows.
The organization of the paper is as follows. In Section 2, we propose the CS-MWRKO method and analyze its convergence. Section 3 contains experimental results demonstrating the efficiency of the presented method. We end this paper with some conclusions in Section 4.
We end this section with some notation. In this paper, $\langle \cdot, \cdot \rangle$ stands for the scalar product, and $\|x\|_2$ is the Euclidean norm of $x$. For a given matrix $A$, $A^{(i)}$, $A^T$, $A^\dagger$, $R(A)$, $N(A)$, $\|A\|_F$, $\sigma_i(A)$ and $\sigma_{\min}(A)$ are used to denote the $i$th row, the transpose, the Moore–Penrose pseudoinverse, the range space, the null space, the Frobenius norm, the $i$th singular value and the smallest nonzero singular value, respectively. We let $r_k = b - Ax_k$ denote the $k$th residual vector, and $r_k^{(i_k)}$ represents the $i_k$th entry of $r_k$. $x^\star$ is any solution of the system (1.1).
2. The Count Sketch Maximal Weighted Residual Kaczmarz Method with Oblique Projection
In this section, we combine the count sketch technique of the CS-MWRK method with the MWRKO method to construct a new method for (1.1), CS-MWRKO for short, listed in Algorithm 1.
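Algorithm 1 gives the formal statement. As an illustration only, the following Python sketch shows one plausible reading of the method under stated assumptions: the count sketch is applied once to form $(SA)x = Sb$, the row with the maximal weighted residual is selected, and an oblique projection step in the spirit of [20] [23] makes the two most recently used equations hold simultaneously. The function names, the fallback for nearly parallel rows, and the stopping test are our own choices, not the authors' pseudocode.

```python
import numpy as np

def count_sketch_system(A, b, d, rng):
    """Form the sketched system (SA)x = Sb in O(nnz(A)) time.
    Each column of the count sketch S has one +/-1 entry in a random row."""
    m, n = A.shape
    rows = rng.integers(0, d, size=m)          # hash each row of A to a bucket
    signs = rng.choice([-1.0, 1.0], size=m)
    SA = np.zeros((d, n))
    Sb = np.zeros(d)
    np.add.at(SA, rows, signs[:, None] * A)    # unbuffered scatter-add
    np.add.at(Sb, rows, signs * b)
    keep = np.any(SA != 0.0, axis=1)           # drop empty buckets (0 = 0 rows)
    return SA[keep], Sb[keep]

def cs_mwrko(A, b, d, max_iter=100_000, tol=1e-12, seed=0):
    """Hedged sketch of CS-MWRKO: maximal weighted residual row selection
    plus an oblique projection step, run on the sketched system."""
    rng = np.random.default_rng(seed)
    SA, Sb = count_sketch_system(A, b, d, rng)
    norms2 = np.einsum('ij,ij->i', SA, SA)     # squared row norms of SA
    x = np.zeros(A.shape[1])
    prev = -1                                  # index of previously used row
    for _ in range(max_iter):
        r = Sb - SA @ x
        if np.linalg.norm(r) <= tol:
            break
        i = int(np.argmax(r * r / norms2))     # maximal weighted residual rule
        if prev < 0:
            x = x + (r[i] / norms2[i]) * SA[i] # ordinary Kaczmarz step
        else:
            # oblique direction: strip the component along the previous row so
            # that equations i and prev are both satisfied after the update
            w = SA[i] - (SA[prev] @ SA[i] / norms2[prev]) * SA[prev]
            w2 = w @ w
            if w2 > 1e-14 * norms2[i]:
                x = x + (r[i] / w2) * w
            else:                              # near-parallel rows: fall back
                x = x + (r[i] / norms2[i]) * SA[i]
        prev = i
    return x
```

Since each oblique step removes the selected residual component while keeping the previously satisfied equation, the error decreases monotonically for a consistent sketched system; this matches the behavior the convergence analysis below quantifies.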
Next, we introduce some lemmas used to analyze the convergence of our method.
Lemma 2.1. ( [16], Theorem 1) If $S \in \mathbb{R}^{d \times m}$ is a count sketch transform with $d = \Omega\left(n^2/(\varepsilon^2 \delta)\right)$, where $0 < \varepsilon, \delta < 1$, then we have that:

$(1 - \varepsilon)\,\|Ax\|_2^2 \le \|SAx\|_2^2 \le (1 + \varepsilon)\,\|Ax\|_2^2$

for all $x \in \mathbb{R}^n$, and:

$(1 - \varepsilon)\,\sigma_i^2(A) \le \sigma_i^2(SA) \le (1 + \varepsilon)\,\sigma_i^2(A)$

for all $i = 1, \dots, n$, hold with probability $1 - \delta$.
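Lemma 2.1 expresses the subspace embedding property of the count sketch. As a hedged numerical illustration (the dimensions and seed are arbitrary choices of ours, not from the paper), the sketch $SA$ can be formed in $O(\mathrm{nnz}(A))$ time and $\|SAx\|_2$ stays close to $\|Ax\|_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 50_000, 10, 2_000                # d << m rows in the sketch

A = rng.random((m, n))
rows = rng.integers(0, d, size=m)          # count sketch in implicit form:
signs = rng.choice([-1.0, 1.0], size=m)    # one +/-1 per column of S
SA = np.zeros((d, n))
np.add.at(SA, rows, signs[:, None] * A)    # SA = S @ A, O(nnz(A)) work

# ||SAx|| should be close to ||Ax|| for any x (subspace embedding property)
for _ in range(3):
    x = rng.standard_normal(n)
    ratio = np.linalg.norm(SA @ x) / np.linalg.norm(A @ x)
    print(f"||SAx||/||Ax|| = {ratio:.3f}")
```

Note that $S$ is never formed explicitly; storing the hashed row indices and signs suffices, which is what makes the sketch attractive for highly overdetermined systems.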
Lemma 2.2. ( [24], Lemma 1) For any vector $u \in R(A^T)$, it holds that:

$\|Au\|_2^2 \ge \sigma_{\min}^2(A)\,\|u\|_2^2.$
Lemma 2.3. Let S be given as in Lemma 2.1. Then the rank of $SA$ is equal to the rank of $A$ with probability $1 - \delta$.
Proof. It can be found in the proof of ( [15], Theorem 3); we omit it here.
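Lemma 2.3 transfers a property of $A$ to $SA$ with high probability. A closely related fact, rank preservation under the count sketch, is easy to check numerically; the sizes below are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(42)
m, n, d = 5_000, 8, 400

A = rng.random((m, n))                     # full column rank almost surely
rows = rng.integers(0, d, size=m)          # count sketch: bucket + sign per row
signs = rng.choice([-1.0, 1.0], size=m)
SA = np.zeros((d, n))
np.add.at(SA, rows, signs[:, None] * A)

# with high probability the sketch does not lose rank
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(SA))
```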
Lemma 2.4. The iteration sequence $\{x_k\}_{k=0}^{\infty}$
generated by the CS-MWRKO method satisfies the following equation:
(2.1)
and the residual satisfies:
(2.2)
(2.3)
where $x^\star$ is an arbitrary solution of the system (1.1). In particular, if
then
.
Proof. Since the CS-MWRKO method is equivalent to the MWRKO method applied to the sketched system $SAx = Sb$, Equation (2.1) and Equations (2.2) and (2.3) follow directly from ( [23], Lemma 2) and ( [23], Lemma 1), respectively.
For the convergence property of the CS-MWRKO method, we establish the following theorem.
Theorem 2.5. Let $x_0$ be an arbitrary initial approximation and let $x^\star$ be a solution of (1.1) such that
. Let S be given as in Lemma 2.1. Then the sequence $\{x_k\}_{k=0}^{\infty}$, generated by the CS-MWRKO method, with probability $1 - \delta$, obeys:
and for $k \ge 1$:
where
and
,
, with
,
and
.
Proof. Based on Lemma 2.3, we can derive the convergence rate of the CS-MWRKO method following ( [23], Theorem 2) and ( [15], Theorem 3). For $k = 0$, by Equation (2.1) in Lemma 2.4, we have:
with probability $1 - \delta$. The second inequality comes from Lemma 2.2, and the last inequality with probability $1 - \delta$
follows from Lemma 2.1. For $k \ge 1$, it holds that:
(2.4)
with probability $1 - \delta$. Here, the first inequality uses Equation (2.2) in Lemma 2.4, the third inequality follows from Lemma 2.2, and the last inequality holds with probability $1 - \delta$ by Lemma 2.1. Along similar lines as in (2.4), we obtain:
with probability $1 - \delta$. Thus, we complete the proof.
Remark 2.6. Set
,
and the convergence of the CS-MWRK method in [15] is:
Since
,
and
, the CS-MWRKO method is faster than the CS-MWRK method. Based on ( [15], Remark 4), the convergence bound of the CS-MWRKO method is indeed larger than that of the MWRKO method; this is why the iteration numbers of the former are worse than those of the latter in the numerical examples.
3. Numerical Examples and Results
Since the MWRKO [23] method is more effective than the GRK [9], GRKO [23] and MWRK [10] methods, in this section we give some examples to illustrate the effectiveness of the CS-MWRKO method compared with the MWRKO and CS-MWRK [15] methods, in terms of the iteration numbers (denoted as “IT”) and the computing time in seconds (denoted as “CPU time”) for (1.1). We also report the iteration speedup of the CS-MWRKO method against the MWRKO and CS-MWRK methods, defined by:

$\mathrm{IT\ speedup1} = \dfrac{\text{IT of MWRKO}}{\text{IT of CS-MWRKO}}, \qquad \mathrm{IT\ speedup2} = \dfrac{\text{IT of CS-MWRK}}{\text{IT of CS-MWRKO}},$

and the CPU time speedup of the CS-MWRKO method against the MWRKO and CS-MWRK methods, defined by:

$\mathrm{CPU\ speedup1} = \dfrac{\text{CPU of MWRKO}}{\text{CPU of CS-MWRKO}}, \qquad \mathrm{CPU\ speedup2} = \dfrac{\text{CPU of CS-MWRK}}{\text{CPU of CS-MWRKO}}.$
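The speedup ratios are simple quotients. In the small computation below, the definition assumed is competitor time divided by CS-MWRKO time (so values above one favor CS-MWRKO), and the CPU times are hypothetical, not taken from the paper's tables:

```python
# Hypothetical CPU times in seconds (illustrative, not from the tables)
cpu_mwrko, cpu_cs_mwrk, cpu_cs_mwrko = 8.40, 1.05, 0.90

speedup1 = cpu_mwrko / cpu_cs_mwrko     # CS-MWRKO against MWRKO
speedup2 = cpu_cs_mwrk / cpu_cs_mwrko   # CS-MWRKO against CS-MWRK
print(f"speedup1 = {speedup1:.2f}, speedup2 = {speedup2:.2f}")
```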
For the coefficient matrix A, we use the following two choices: random matrices generated by the MATLAB function rand, and matrices selected from the University of Florida sparse matrix collection [25]. In the following experiments, the right-hand side vector is taken as $b = Ax^\star$, where the exact solution $x^\star$ is a vector generated by the MATLAB function rand. We repeat each experiment 50 times; all experiments start from the same initial vector $x_0$
, and terminate once the Relative Solution Error (RES), defined by $\mathrm{RES} = \|x_k - x^\star\|_2^2 / \|x^\star\|_2^2$, satisfies $\mathrm{RES} < 0.5 \times 10^{-10}$ or the number of iteration steps exceeds 100,000. All experiments presented in this section were performed in MATLAB R2018b on a personal computer with a 2.00 GHz central processing unit (Intel(R) Core(TM) i5 CPU), 16.00 GB of memory, and the Windows 10 operating system.
Example One. In this example, we report the iteration numbers and CPU time of the CS-MWRKO, MWRKO and CS-MWRK methods for randomly generated matrices with entries in [0, 1], listed in Table 1. From this table, we see that CS-MWRKO performs better than the MWRKO and CS-MWRK methods in CPU time. The CPU speedup1 is at least 7.42 and at most 20.68, and the CPU speedup2 is at least 0.95 and at most 1.15 in our experiments. For the iteration numbers, the CS-MWRKO method needs more iterations than the MWRKO method but fewer than the CS-MWRK method.
Table 1. Numerical results for the CS-MWRKO, MWRKO, CS-MWRK methods with matrices generated by rand in [0, 1].
Example Two. In this example, we construct random coefficient matrices with correlated rows, with entries in [c, 1] for c from 0.1 to 0.9, to test the validity of the CS-MWRKO method with different sizes of the count sketch matrix S. This set of matrices was also used in [26] and [27]. From Table 2, we note that the CPU speedup1 is at least 6.14 and at most 12.89, and the CPU speedup2 is at least 1.08 and at most 1.61. That is, the CS-MWRKO method outperforms the MWRKO and CS-MWRK methods in terms of computing time. For the iteration numbers, we find that the iteration numbers of the count sketch MWRK-type methods (CS-MWRK, CS-MWRKO) increase as c grows from 0.1 to 0.9 and decrease as d increases.

Table 2. Numerical results for the CS-MWRKO, MWRKO, CS-MWRK methods with matrices generated by rand in [c, 1].

Table 3. Numerical results for the CS-MWRKO, MWRKO, CS-MWRK methods with sparse matrices.
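A common way to generate matrices with entries uniform on $[c, 1]$, and hence rows that grow more correlated as $c \to 1$, is the affine transform $c + (1-c)\,\texttt{rand}$. This construction is our assumption, matching the caption of Table 2 rather than code from [26] [27]; a short Python check of the growing row correlation:

```python
import numpy as np

def correlated_rand(m, n, c, rng):
    """Entries uniform on [c, 1]; rows become nearly parallel as c -> 1."""
    return c + (1.0 - c) * rng.random((m, n))

rng = np.random.default_rng(0)
for c in (0.1, 0.5, 0.9):
    A = correlated_rand(2_000, 50, c, rng)
    a0, a1 = A[0], A[1]
    # cosine of the angle between two rows as a crude correlation measure
    cos = a0 @ a1 / (np.linalg.norm(a0) * np.linalg.norm(a1))
    print(f"c = {c}: cosine between first two rows = {cos:.4f}")
```

As $c$ grows, neighboring rows of $A$ point in nearly the same direction, which is exactly the regime where the oblique projection step pays off.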
Example Three. In this example, we test the CS-MWRKO, MWRKO and CS-MWRK methods with coefficient matrices from real-world data [25]. The two matrices are shar_te2-b1 with 34,320 nonzero elements and ch6-6-b1 with 900 nonzero elements. From Table 3, we see again that the CS-MWRKO method outperforms the CS-MWRK and MWRKO methods in CPU time. The minimum of the CPU speedup1 is 2.00 and the maximum reaches 13.48. The minimum of the CPU speedup2 is 1.11 and the maximum is 1.24. For the iteration numbers, we reach the same conclusion as in Example One.
4. Conclusion
In this paper, we construct the count sketch maximal weighted residual Kaczmarz method with oblique projection for highly overdetermined linear systems. Numerical examples validate that our method needs less computing time than the MWRKO and CS-MWRK methods, especially for systems (1.1) with nearly linearly correlated rows. As is well known, there are many works on block versions of Kaczmarz-type methods [28] [29] [30] [31] [32]. We will consider combining the block technique with the oblique projection technique in future work; this topic is practically valuable and theoretically meaningful.
Acknowledgements
The authors are grateful to the anonymous referees and the Editor for their detailed and helpful comments, which led to a substantial improvement of the paper. The authors would also like to thank Prof. Hanyu Li and Dr. Yanjun Zhang for providing the MATLAB codes of [15].
Funding
Longyan Li is supported by the Research and Training Program for College Students (No. A2020-171).