1. Introduction
We focus on the linear least squares problem

(1) $\min_{x \in \mathbb{R}^{n}} \|b - Ax\|_{2}^{2},$

where $A \in \mathbb{R}^{m \times n}$ is of full column rank, $b \in \mathbb{R}^{m}$, and $\|\cdot\|_{2}$ denotes the Euclidean norm. The least squares problem arises in a wide range of applications, such as signal processing and image restoration. The coordinate descent method is an effective iteration method for (1). It applies the Gauss-Seidel method to the equivalent normal equation

(2) $A^{T}Ax = A^{T}b,$

leading to the iteration formula

$x_{k+1} = x_{k} + \dfrac{A_{j_k}^{T}(b - Ax_{k})}{\|A_{j_k}\|_{2}^{2}}\, e_{j_k},$

where $A_{j_k}$ denotes the $j_k$th column of $A$, $e_{j_k}$ is the $j_k$th unit coordinate vector, and the superscript $T$ denotes the transpose. The convergence of the Gauss-Seidel method depends strongly on the order in which the columns are selected at each step.
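For concreteness, a minimal MATLAB sketch of this column-wise update is given below; the fixed cyclic column order, the sweep count, and the function name are illustrative choices only.

```matlab
% Minimal sketch of coordinate descent (Gauss-Seidel on the normal equation)
% for min ||b - A*x||_2, cycling through the columns in their natural order.
function x = cd_sketch(A, b, sweeps)
    n = size(A, 2);
    x = zeros(n, 1);
    r = b;                               % residual r = b - A*x
    colnorm2 = sum(A.^2, 1);             % ||A_j||_2^2, precomputed once
    for s = 1:sweeps
        for j = 1:n
            alpha = (A(:, j)' * r) / colnorm2(j);
            x(j) = x(j) + alpha;         % update the j-th coordinate only
            r = r - alpha * A(:, j);     % keep the residual consistent with x
        end
    end
end
```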
Inspired by the seminal work of Strohmer and Vershynin [1], Leventhal and Lewis [2] proposed the randomized Gauss-Seidel (RGS) method and proved that it has an expected linear convergence rate. Bai and Wu [3] pointed out an obvious flaw of the RGS method: the column-selection probability reduces to uniform column sampling once the coefficient matrix is scaled with a diagonal matrix. To tackle this problem, they proposed the greedy randomized coordinate descent (GRCD) method, also called the greedy randomized Gauss-Seidel (GRGS) method, which adopts an effective probability criterion for selecting the working column so as to capture larger entries of the residual vector with respect to (2). They showed that the GRGS method is significantly superior to the RGS method in terms of both theoretical analysis and numerical experiments. In addition, Gauss-Seidel type methods have received considerable attention [4] [5] [6] [7]. However, the convergence rate of the RGS method deteriorates significantly when the columns of the coefficient matrix are highly correlated. To improve the convergence rate, Wang et al. [8] proposed the randomized Gauss-Seidel method with oblique direction (RGSO), which combines two successively selected unit coordinate directions into the search direction

(3) $d_{k} = e_{j_k} - \dfrac{A_{j_{k-1}}^{T}A_{j_k}}{\|A_{j_{k-1}}\|_{2}^{2}}\, e_{j_{k-1}}.$

They showed that, in terms of both theory and experiments, the RGSO method outperforms the RGS method. For more discussions about the oblique projection, we refer the reader to [9] [10] and the references therein.
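Assuming the form of (3) written above, a one-line computation shows why the oblique direction removes the component of the new column along the previously used one, which is precisely what helps when successive columns are strongly correlated:

$A_{j_{k-1}}^{T} A d_{k} = A_{j_{k-1}}^{T}\Big(A_{j_k} - \dfrac{A_{j_{k-1}}^{T}A_{j_k}}{\|A_{j_{k-1}}\|_{2}^{2}}\, A_{j_{k-1}}\Big) = A_{j_{k-1}}^{T}A_{j_k} - A_{j_{k-1}}^{T}A_{j_k} = 0.$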
However, the RGSO method still suffers from the same flaw as the RGS method: the column-selection probability reduces to uniform column sampling once the coefficient matrix is scaled with a diagonal matrix. In addition, it is worth noting that the convergence rate of the GRGS method decreases significantly when the columns of the coefficient matrix are close to linearly dependent. Motivated by these limitations, we present a greedy randomized Gauss-Seidel method with oblique direction (GRGSO) for solving (1), which combines the oblique direction with the GRGS method. In theory, we prove that the iterates generated by the GRGSO method converge to the least squares solution $x_{\star} = (A^{T}A)^{-1}A^{T}b$ when the coefficient matrix is of full column rank. Numerical results show that, compared with the RGSO and GRGS methods, the GRGSO method has a significant advantage in both iteration steps and computing time, especially when the columns of the coefficient matrix are highly correlated.
The organization of this paper is as follows. In Section 2, some notation and lemmas are introduced. In Section 3, we propose the GRGSO method for solving (1) and give its convergence analysis. Some examples are used to demonstrate the competitiveness of our proposed method in Section 4. Finally, we draw some brief conclusions in Section 5.
2. Notation and Preliminaries
At the beginning of this section, we give some notation. For a Hermitian positive definite matrix $G$ and a column vector $u$ of appropriate dimension, we denote by $\|u\|_{G} = \sqrt{u^{T}Gu}$ the energy norm of $u$ and by $u^{(i)}$ the $i$th entry of $u$. For a given matrix $A$, $\sigma_{\min}(A)$ and $\|A\|_{F}$ denote the smallest nonzero singular value and the Frobenius norm of $A$, respectively. Let $r_{k} = b - Ax_{k}$ and $s_{k} = A^{T}r_{k}$; then $s_{k}^{(j)}$ represents the $j$th entry of $s_{k}$. $x_{\star}$ is the optimal solution of the corresponding problem. We indicate by $\mathbb{E}_{k}$ the expected value conditional on the first $k$ iterations, that is, $\mathbb{E}_{k}[\,\cdot\,] = \mathbb{E}[\,\cdot \mid j_{0}, j_{1}, \ldots, j_{k-1}]$, where $j_{t}$ ($t = 0, 1, \ldots, k-1$) is the column selected at the $t$th iteration.
In the following, we give a basic lemma.
Lemma 1 (See Bai and Wu [11]) If the vector $u$ is in the column space of $A$, it holds that
$\|A^{T}u\|_{2}^{2} \geq \sigma_{\min}^{2}(A)\,\|u\|_{2}^{2}.$
3. GRGSO Method
In this section, we design the GRGSO method for solving (1) by combining the oblique direction with the GRGS method. The pseudo-code of the GRGSO method is listed in Table 1. The difference between the RGSO method and the GRGSO method lies in the column-selection strategy. The RGSO method utilizes a random selection strategy: in the numerical experiments it selects the $j_k$th column with probability $\|A_{j_k}\|_{2}^{2}/\|A\|_{F}^{2}$, which reduces to uniform sampling if the Euclidean norms of all the columns of the matrix $A$ are the same. The GRGSO method, by contrast, aims to capture the larger entries of the residual vector at each iteration. Compared with the GRGS method, our proposed method uses the oblique projection, which is expected to have better convergence performance when the columns of the coefficient matrix are highly correlated.
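As a rough illustration of such a step, the following MATLAB sketch combines a greedy index set in the spirit of the GRCD criterion of [3] with the oblique direction (3); the function name, interface, and implementation details are illustrative assumptions rather than the exact pseudo-code of Table 1.

```matlab
% One possible realization of a single GRGSO-type step (a sketch only).
% colnorm2 = sum(A.^2, 1)' and fro2 = norm(A, 'fro')^2 are computed once
% before the iteration; jprev is the column index used in the previous step
% (0 before the first step).
function [x, r, jprev] = grgso_step_sketch(A, x, r, jprev, colnorm2, fro2)
    s = A' * r;                                    % s_k = A^T (b - A*x_k)
    w = (s.^2) ./ colnorm2;                        % weighted residual entries
    eps_k = 0.5 * (max(w) / (s' * s) + 1 / fro2);  % greedy threshold as in [3]
    U = find(s.^2 >= eps_k * (s' * s) .* colnorm2);% greedy (nonempty) index set
    p = s(U).^2;                                   % probabilities proportional
    cp = cumsum(p);                                % to the residual entries on U
    j = U(find(cp >= rand * cp(end), 1, 'first')); % sample the working column
    d = zeros(size(x));  d(j) = 1;
    if jprev > 0 && jprev ~= j
        % oblique direction: remove the component of A_j along A_{jprev}
        d(jprev) = -(A(:, jprev)' * A(:, j)) / colnorm2(jprev);
    end
    Ad = A * d;
    alpha = (Ad' * r) / (Ad' * Ad);                % exact minimization along d
    x = x + alpha * d;
    r = r - alpha * Ad;
    jprev = j;
end
```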
Remark Let
, which implies
. Therefore, for every iteration index k, the index set
generated by the GRGSO method is nonempty.
Remark In the GRGSO method, it holds that
(4)
and
(5)
Therefore, the GRGSO method can be executed more effectively if
can be computed in an economical manner at the beginning.
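One common way to realize such savings in coordinate descent codes, taken here only as an assumption about what (4) and (5) express, is to form the Gram matrix (or at least the column norms) once at the beginning and then update the residual of the normal equation recursively instead of recomputing it:

```matlab
% Generic device (an assumption, not taken verbatim from Table 1):
G = A' * A;                      % computed once before the iteration
colnorm2 = diag(G);              % ||A_j||_2^2 for every column j
% ... inside the loop, after the direction d and step size alpha are chosen:
s = s - alpha * (G * d);         % s_{k+1} = s_k - alpha_k * A^T A d_k; since d
                                 % has at most two nonzero entries, a careful
                                 % implementation needs only two columns of G
```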
Next, we give some lemmas which are useful to analyze the convergence of the GRGSO method.
Lemma 2 For the GRGSO method, we have
(6)
(7)
Proof. For
, one has
(8)
For
, we have
which together with (8) proves (6).
By (6), it follows for
that
which leads to (7).
Remark From (6) and (7), it is obvious that in the kth iteration the GRGSO method does not select either of the two most recently used columns, which means that the newly selected index differs from both of them. Thus, the search direction is always a genuine combination of two unit coordinate directions. This is also an advantage of the GRGSO method over the RGSO method: since the RGSO method selects its working column randomly in the kth iteration, it cannot avoid re-selecting these columns, whereas the GRGSO method can.
Lemma 3 For
in the GRGSO method, it satisfies
where
.
Proof. Since
, it holds that
and
Hence, the proof is completed.
Lemma 4 The iteration sequence
generated by the GRGSO method satisfies
Proof. For
, we have
This means that the vector
is perpendicular to the vector
. Since
is parallel to
, the vector
is perpendicular to
.
For
, it follows from Lemma 3 and Lemma 2 that
which means that the vector
is perpendicular to the vector
. Since
is parallel to
, the vector
is perpendicular to
. For all
, it follows that
which together with the Pythagorean theorem leads to the desired result.
Next, the convergence theory of the GRGSO method is deduced.
Theorem 5 For the least squares problem (1), the iteration sequence
generated by the GRGSO method from any initial guess
satisfies
(9)
where
,
,
(
). Here
,
and
is given as in Lemma 3.
Proof. By Lemma 2, it follows for
that
(10)
For
, it follows from (6) that
(11)
By Lemma 4, Lemma 3 and Lemma 1, for
, we have
which together with (11) and (10) can lead to
(12)
and
(13)
respectively. For
, we can similarly get
Then, taking the full expectation on both sides of (12) and (13) and using induction on the iteration index k, we obtain (9). This completes the proof.
Remark Suppose that the upper bound on the convergence rate of the GRGS method [3] is defined as
Since
,
and
, we have
This implies that the upper bound on the convergence factor of the GRGSO method is smaller, uniformly with respect to the iteration step k, than that of the GRGS (GRCD) method.
4. Numerical Experiments
Some examples are designed in this section to verify the effectiveness of the GRGSO method. Specifically, the GRGSO method is compared with the GRGS method [3], the RGS method [2] and the RGSO method [8]. We report the results of these methods in terms of the number of iteration steps (denoted by “IT”) and the running time in seconds (denoted by “CPU”).
The coefficient matrix $A$ in the numerical experiments comes from two sources. One is the random matrix whose entries are drawn from the interval $(c, 1)$ by using the MATLAB function rand; a value of $c$ close to 1 implies that the matrix columns are highly correlated. The other is the set of sparse matrices from [12] listed in Table 2. We use $\operatorname{cond}(A)$ to represent the condition number of $A$, and define the density as
$\text{density} = \dfrac{\text{number of nonzeros of } A}{m \times n}.$
We randomly generate the exact solution $x_{\star}$ by utilizing the MATLAB function randn and construct the right-hand side as $b = Ax_{\star}$ when the linear system is consistent, while for the inconsistent linear systems the right-hand side is set as $b = Ax_{\star} + r_{0}$, where $r_{0}$ is a nonzero vector in the null space of $A^{T}$, so that $x_{\star}$ remains the least squares solution.
We take the zero vector as the initial approximation for each iteration process, and we terminate the iteration process once the stopping criterion also used in the work of Wang et al. [8] is satisfied.
We set “-” in the numerical tables if the number of iteration steps exceeds 300,000. All results are averaged over 20 repetitions. All experiments were implemented in MATLAB (R2021b) on a computer with a 2.30 GHz central processing unit (Intel(R) Core(TM) i7-10875H CPU) and 16 GB memory.
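As a concrete illustration of this setup, one possible MATLAB script is given below; the interval $(c, 1)$, the null-space construction of $r_{0}$, and the specific sizes are illustrative assumptions.

```matlab
% Sketch of the test-data construction described above (sizes are examples only).
m = 3000; n = 500; c = 0.5;              % the tables vary m and c
A = c + (1 - c) * rand(m, n);            % dense matrix with entries in (c, 1)
x_star = randn(n, 1);                    % exact solution
b_consistent = A * x_star;               % consistent right-hand side
r0 = randn(m, 1);
r0 = r0 - A * (A \ r0);                  % project onto null(A'), so that x_star
b_inconsistent = A * x_star + r0;        % remains the least squares solution
density = nnz(A) / (m * n);              % density as defined above (1 here, A dense)
kappa = cond(A);                         % condition number of A
```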
Table 2. Sparse matrix properties of realistic problems [12] .
For the randomly generated matrices, with
, the numerical results for consistent and inconsistent linear systems are listed in Table 3 and Table 4, respectively. It is easy to observe from Table 3 and Table 4 that the GRGSO method significantly outperforms the RGS, GRGS and RGSO methods in terms of both IT and CPU.
In the following, we compare these methods for solving (1) when the randomly generated matrix has different values of c. We list the numerical results for the consistent and inconsistent systems in Table 5 and Table 6, respectively. From Table 5 and Table 6, it is easy to observe that the IT and CPU of the RGS and GRGS methods increase significantly as c increases toward 1. When c increases to 0.8, the IT of the RGS method exceeds the maximal number of iteration steps, and when c increases to 0.9, the IT of the GRGS method exceeds the maximal number of iteration steps. For the different values of c, the methods with the oblique direction outperform the methods without it. In addition, compared with the other methods, the GRGSO method performs best in terms of both IT and CPU.
For the full-rank sparse matrices from [12], the numerical results for the consistent and inconsistent linear systems are listed in Table 7 and Table 8, respectively. It can be seen that for the matrix abtaha1 the performance of the GRGSO method is similar to that of the GRGS method in terms of both IT and CPU, whereas for the other matrices the GRGSO method performs significantly better than the other methods in terms of both IT and CPU.
Table 3. The consistent system with
: different m impacts on IT and CPU.
Table 4. The inconsistent system with
: different m impacts on IT and CPU.
Table 5. The consistent system with
: different c impacts on IT and CPU.
Table 6. The inconsistent system with
: different c impacts on IT and CPU.
Table 7. The consistent system: IT and CPU time of test methods for different sparse matrices.
Table 8. The inconsistent system: IT and CPU time of test methods for different sparse matrices.
5. Conclusion
In this manuscript, we have constructed the GRGSO method for the linear least squares problem and established its convergence analysis. Numerical experiments show that the GRGSO method is superior to the RGS, GRGS and RGSO methods in terms of both IT and CPU, especially when the columns of the coefficient matrix are highly correlated. It is natural to generalize the GRGSO method by introducing a relaxation parameter into its probability criterion. However, the theoretically optimal choice of the relaxation parameter remains an open problem and is worthy of further study.