A Scaled Conjugate Gradient Method Based on New BFGS Secant Equation with Modified Nonmonotone Line Search
1. Introduction
The conjugate gradient (CG) method and the quasi-Newton method are two popular iterative methods for solving smooth unconstrained optimization problems, and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is one of the most efficient quasi-Newton methods for small- and medium-sized unconstrained optimization problems [1] [2] [3] [4]. The iterates are computed by
$x_{k+1} = x_k + \alpha_k d_k,$ (1)
where $\alpha_k > 0$ is a step size and $d_k$ is a search direction. For a continuously differentiable function $h:\mathbb{R}^n \to \mathbb{R}$, the minimization problem
$\min_{x \in \mathbb{R}^n} h(x)$ (2)
has been well studied for several decades. Conjugate gradient methods are among the preferable methods for solving problem (2), with the search direction $d_k$ given by
$d_k = -g_k + \beta_k d_{k-1}, \quad d_0 = -g_0,$ (3)
where $g_k = \nabla h(x_k)$ is the gradient of the objective function $h$ at the $k$-th iterate and $\beta_k$ is a scalar that characterizes the particular CG method.
Some well-known formulas for the scalar $\beta_k$ are the Hestenes-Stiefel (HS) [5], Fletcher-Reeves (FR) [6], Polak-Ribière-Polyak (PRP) [7], and Dai-Yuan (DY) [8] formulas, given by
$\beta_k^{HS} = \dfrac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{FR} = \dfrac{\|g_k\|^2}{\|g_{k-1}\|^2}, \quad \beta_k^{PRP} = \dfrac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \quad \beta_k^{DY} = \dfrac{\|g_k\|^2}{d_{k-1}^T y_{k-1}},$
where $y_{k-1} = g_k - g_{k-1}$ and $\|\cdot\|$ denotes the Euclidean norm. Due to their simplicity and low memory requirements, CG methods are effective and desirable for large-scale unconstrained smooth problems [9] [10]. The global convergence properties of nonlinear CG methods have been analyzed under the weak Wolfe line search
$h(x_k + \alpha_k d_k) \le h(x_k) + \delta \alpha_k g_k^T d_k, \qquad g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k,$ (4)
and the strong Wolfe line search
$h(x_k + \alpha_k d_k) \le h(x_k) + \delta \alpha_k g_k^T d_k, \qquad |g(x_k + \alpha_k d_k)^T d_k| \le -\sigma g_k^T d_k,$ (5)
where $0 < \delta < \sigma < 1$. CG methods use relatively little memory for large-scale problems and require no numerical linear algebra, so each step is quite fast. However, they do not exploit second-order information of the objective function, and typically converge much more slowly than Newton or quasi-Newton methods.
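To make the generic scheme (1)-(5) concrete, the following sketch (a minimal illustration, not the method proposed in this paper) implements a nonlinear CG iteration in Python with the PRP+ choice of $\beta_k$ and a simple Armijo backtracking rule standing in for the Wolfe conditions (4)-(5); all function names and parameter values are illustrative assumptions.

import numpy as np

def cg_prp(h, grad, x0, tol=1e-6, max_iter=1000):
    # Sketch of the nonlinear CG iteration (1)-(3) with the PRP+ beta and
    # Armijo backtracking; an illustrative skeleton, not the paper's algorithm.
    x = x0.copy()
    g = grad(x)
    d = -g                                     # d_0 = -g_0
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha, delta = 1.0, 1e-4               # simple Armijo rule instead of (4)/(5)
        while h(x + alpha * d) > h(x) + delta * alpha * (g @ d):
            alpha *= 0.5
        x_new = x + alpha * d                  # iteration (1)
        g_new = grad(x_new)
        beta = max(g_new @ (g_new - g) / (g @ g), 0.0)   # PRP+ variant of beta_k
        d = -g_new + beta * d                  # direction update (3)
        x, g = x_new, g_new
    return x

# Example: minimize the smooth quadratic h(x) = 0.5 * ||x||^2.
x_star = cg_prp(lambda x: 0.5 * (x @ x), lambda x: x, np.ones(100))

In a full implementation the Armijo loop would be replaced by a Wolfe or strong Wolfe line search so that curvature-dependent formulas such as HS and DY remain well defined.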
The quasi-Newton method is an iterative method that uses second-order information of the objective function, and BFGS is an effective quasi-Newton method with the search direction
$d_k = -B_k^{-1} g_k,$ (6)
where $B_k$ is an approximation of the Hessian matrix of $h$ at $x_k$. The update formula for $B_k$ is defined by
$B_{k+1} = B_k - \dfrac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \dfrac{y_k y_k^T}{s_k^T y_k},$ (7)
where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$, and the Hessian approximation $B_{k+1}$ of (7) satisfies the standard secant equation
$B_{k+1} s_k = y_k$ (8)
if $s_k^T y_k > 0$, which is known as the curvature condition. The BFGS method has very interesting properties and remains one of the most respected quasi-Newton methods for unconstrained optimization [11]. The theory of the BFGS method and its global convergence have been established by many researchers (see [12]). For convex objective functions, it has been proved that the BFGS method with certain special inexact line searches is globally convergent (see [13] [14] [15]). However, when the objective function is nonconvex, the BFGS method under the exact line search may fail to converge [16]. Moreover, Dai [17] proved that the BFGS method may fail for nonconvex functions with the Wolfe line search techniques given in (4) and (5) [18]. The Wolfe line search is the most common monotone line search technique, and it may lead to small steps without making significant progress toward the minimum when the contours of the objective function form a family of curves with large curvature (see [19] [20] [21] [22]). To overcome this drawback, the first nonmonotone line search technique was proposed by Grippo et al. [19] for Newton's method. Following this initiative, many nonmonotone line search techniques have been proposed in recent years [23]. Yuan et al. [24] developed a modified limited-memory BFGS method whose update formula gives a higher-order approximation to the exact Hessian, and its convergence was analyzed under a nonmonotone line search; however, the method converges only for uniformly convex functions. Li et al. [25] proposed a new BFGS algorithm with a modified secant equation which achieves both global and superlinear convergence for generally convex functions under the nonmonotone line search of [19]. Su and Rong [26] introduced and established a new spectral CG method and its implementation under a modified nonmonotone line search technique. They introduced a new spectral conjugate gradient direction
$d_k = -\theta_k g_k + \beta_k d_{k-1}, \quad d_0 = -g_0,$ (9)
where the spectral parameter $\theta_k$ and the conjugate parameter $\beta_k$ are given by
(10)
(11)
respectively. It is not difficult to notice that the denominator of (11) is a convex combination of the denominators of the HS and PRP conjugate parameters. The choice of the spectral parameter in (10) ensures the sufficient descent property of the search direction independently of the line search. The convergence of their method is analyzed under a new modified nonmonotone line search with some mild conditions. However, this spectral CG method uses only first-order information and excludes second-order information. When the dimension is large, CG methods are more effective than BFGS methods in terms of CPU time, but in terms of the number of iterations and the number of function evaluations, BFGS methods are better. In order to combine the remarkable properties of the CG and BFGS methods and to overcome their drawbacks, many hybrids of CG and BFGS methods have been introduced for unconstrained smooth optimization [27] [28] [29]. However, these methods are mainly restricted to smooth optimization problems. Recently, Yuan et al. [30] [31] [32] [33] introduced CG approaches that solve large-scale nonsmooth convex problems via smoothing regularization, and under some assumptions, the global convergence properties of these approaches were analyzed. Yuan and Wei [34] proposed the Barzilai-Borwein (BB) gradient method with a nonmonotone line search for nonsmooth convex optimization problems. Some implementable quasi-Newton methods have also been introduced for the same problem (see [35] [36] [37] [38]). More recently, Ou and Zhou [39] introduced a modified scaled BFGS preconditioned CG algorithm and, under appropriate assumptions, proved its global convergence for nonsmooth convex functions.
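For reference, and because the secant machinery reappears in Section 3, the following minimal sketch applies the classical BFGS update (7) with the curvature safeguard (8); the skipping rule and tolerance are illustrative assumptions rather than part of any of the cited methods.

import numpy as np

def bfgs_update(B, s, y, tol=1e-12):
    # Classical BFGS update (7): build B_{k+1} from B_k, s_k = x_{k+1} - x_k
    # and y_k = g_{k+1} - g_k.  The update is skipped when the curvature
    # condition s_k^T y_k > 0 of (8) fails (a common safeguard, assumed here).
    sy = s @ y
    if sy <= tol:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

# After the update the secant equation (8), B_{k+1} s_k = y_k, holds:
B = np.eye(3)
s = np.array([1.0, 0.0, 1.0])
y = np.array([2.0, 0.5, 1.0])
assert np.allclose(bfgs_update(B, s, y) @ s, y)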
Motivated by the work of Ou and Zhou [39], in this paper we propose a hybrid approach of a scaled CG method and a modified BFGS method that combines the simplicity of the CG method with the Hessian approximation of the BFGS method. Our work is mainly focused on developing a scaled conjugate search direction that includes second-order information of the objective function by incorporating a modified secant equation of the BFGS method. In contrast to the work of Ou and Zhou [39], our method uses both the function value and gradient information of the objective function. Moreover, our method leads to a better descent direction than the CG methods proposed so far. To the best of our knowledge, this is the first work to combine a scaled CG algorithm with a BFGS secant equation that contains both function value and gradient information of the objective function for solving large-scale nonsmooth optimization problems. Under the new modified nonmonotone line search technique, the global convergence of the algorithm is analyzed for nonsmooth convex problems.
The paper is organized as follows. In the next section, we consider nonsmooth convex problems and review their basic results. In Section 3, we propose a new scaled CG algorithm that incorporates, via the smoothing regularization, the BFGS secant equation containing both function value and gradient information of the objective function. Using the new modified nonmonotone line search technique, we prove the global convergence of our new algorithm for nonsmooth convex problems. Numerical results and related comparisons are reported in Section 4. Finally, Section 5 concludes our work.
2. Nonsmooth Convex Problems and Their Basic Results
In this section, we consider the unconstrained optimization problem
$\min_{x \in \mathbb{R}^n} f(x),$ (12)
where $f: \mathbb{R}^n \to \mathbb{R}$ is a possibly nonsmooth convex function. This problem is equivalent to the following problem
$\min_{x \in \mathbb{R}^n} F(x),$ (13)
where $F$ is the Moreau-Yosida regularization of $f$ [40], which is defined by
$F(x) = \min_{z \in \mathbb{R}^n} \left\{ f(z) + \dfrac{1}{2\lambda} \|z - x\|^2 \right\},$ (14)
where $\lambda$ is a positive parameter. The function $F$ is a finite-valued, continuously differentiable convex function even though the function $f$ is nondifferentiable (see [40]). Let $p(x)$ be the unique solution of (14). In what follows, we can express $F(x)$ as
$F(x) = f(p(x)) + \dfrac{1}{2\lambda} \|p(x) - x\|^2.$ (15)
Moreover, the gradient of $F$ is globally Lipschitz continuous, i.e.,
$\|\nabla F(x) - \nabla F(y)\| \le \dfrac{1}{\lambda} \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^n,$ (16)
where
$\nabla F(x) = \dfrac{x - p(x)}{\lambda}.$ (17)
The point $x$ is an optimal solution to (12) if and only if $\nabla F(x) = 0$ (see [40]). Furthermore, under reasonable conditions the gradient of $F$ is semismooth, and some of its remarkable properties are given in [41] [42].
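As a concrete one-dimensional illustration (a standard textbook example, not taken from this paper), the Moreau-Yosida regularization of $f(x) = |x|$ can be computed in closed form and is exactly the Huber function:

$p(x) = \operatorname{sign}(x)\,\max\{|x| - \lambda, 0\}, \qquad
F(x) = \begin{cases} \dfrac{x^2}{2\lambda}, & |x| \le \lambda, \\[4pt] |x| - \dfrac{\lambda}{2}, & |x| > \lambda, \end{cases} \qquad
\nabla F(x) = \dfrac{x - p(x)}{\lambda} = \begin{cases} x/\lambda, & |x| \le \lambda, \\ \operatorname{sign}(x), & |x| > \lambda. \end{cases}$

Thus $F$ is everywhere differentiable with a $1/\lambda$-Lipschitz gradient, exactly as stated in (16)-(17), even though $f$ is nondifferentiable at the origin.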
Several methods have been proposed to solve (13) by incorporating ideas from bundle methods and quasi-Newton methods [43] [44] [45], but it is burdensome to evaluate the exact value of $F(x)$ at any given point $x$ [46]. Fortunately, for each $x \in \mathbb{R}^n$ and any $\varepsilon > 0$, we can find a vector $p^a(x, \varepsilon) \in \mathbb{R}^n$ such that
$f(p^a(x, \varepsilon)) + \dfrac{1}{2\lambda} \|p^a(x, \varepsilon) - x\|^2 \le F(x) + \varepsilon.$ (18)
Therefore, we can approximate $F(x)$ and $\nabla F(x)$ by
$F^a(x, \varepsilon) = f(p^a(x, \varepsilon)) + \dfrac{1}{2\lambda} \|p^a(x, \varepsilon) - x\|^2$ (19)
and
$g^a(x, \varepsilon) = \dfrac{x - p^a(x, \varepsilon)}{\lambda},$ (20)
respectively. Implementable algorithms to define such a $p^a(x, \varepsilon)$ for nonsmooth convex models can be found in [47]. The noticeable attributes of $F^a(x, \varepsilon)$ and $g^a(x, \varepsilon)$ are given by the following proposition [48].
Proposition 1. Let $p^a(x, \varepsilon)$ be a vector that satisfies (18), and let $F^a(x, \varepsilon)$ and $g^a(x, \varepsilon)$ be defined by (19) and (20), respectively. Then we obtain
$F(x) \le F^a(x, \varepsilon) \le F(x) + \varepsilon,$ (21)
$\|p^a(x, \varepsilon) - p(x)\| \le \sqrt{2\lambda\varepsilon},$ (22)
and
$\|g^a(x, \varepsilon) - \nabla F(x)\| \le \sqrt{\dfrac{2\varepsilon}{\lambda}}.$ (23)
Proposition 1 shows that the approximations $F^a(x, \varepsilon)$ and $g^a(x, \varepsilon)$ can be made arbitrarily close to the exact values $F(x)$ and $\nabla F(x)$, respectively, by choosing $\varepsilon$ sufficiently small.
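As an illustration of how (18)-(20) can be realized numerically, the sketch below approximates $F(x)$ and $\nabla F(x)$ by solving the inner problem (14) inexactly with a general-purpose derivative-free solver; the solver choice, tolerances, and function names are assumptions made only for illustration and do not reproduce the implementable schemes of [47].

import numpy as np
from scipy.optimize import minimize

def moreau_yosida_approx(f, x, lam=1.0):
    # Approximately solve the inner problem (14); the computed minimizer plays
    # the role of p^a(x, eps) in (18), giving the approximations (19) and (20).
    phi = lambda z: f(z) + np.sum((z - x) ** 2) / (2.0 * lam)
    res = minimize(phi, x, method="Nelder-Mead")   # crude inner solver (assumption)
    p_a = res.x
    F_a = phi(p_a)               # approximation (19) of F(x)
    g_a = (x - p_a) / lam        # approximation (20) of grad F(x)
    return F_a, g_a

# Example with the nonsmooth convex function f(x) = ||x||_1, whose exact
# proximal point is soft thresholding, so the accuracy can be checked directly.
F_a, g_a = moreau_yosida_approx(lambda z: np.abs(z).sum(), np.array([2.0, -0.3]))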
3. A Scaled CG Method Based on New BFGS Secant Equation
In this section, we introduce the new scaled CG search direction that incorporates the modified BFGS secant equation, and then describe the new algorithm for solving nonsmooth problems. We make use of the modified nonmonotone line search technique introduced in [23] to compute a step size. Based on the above approximations, we redefine the search direction of the CG method (3) to solve problem (13) as follows:
(24)
where $\varepsilon_k$ is an appropriately chosen positive number. Ou and Zhou [39] provided a search direction defined by
(25)
where the scaling parameter is defined by
(26)
with
(27)
The two vectors appearing in (27) are defined as
(28)
and
(29)
respectively. It is easy to observe that (27) uses only gradient information. In order to use both gradient and function value information, we replace (27) and (29) by
(30)
and
(31)
respectively. Thus, the BFGS method with the secant equation
(32)
and the update formula
(33)
has both gradient and function value information, and the matrix $B_{k+1}$ inherits the positive definiteness of $B_k$ for generally convex functions. Using the secant Equation (32), we propose the new search direction defined by
(34)
where
(35)
(36)
and
(37)
Now, based on the above search direction, we describe our new scaled CG algorithm with a modified nonmonotone line search for solving problem (13) as follows.
Algorithm 1
Step 0. Given the required parameters and an initial point $x_0 \in \mathbb{R}^n$, set $k := 0$.
Step 1. If the stopping criterion holds, stop; otherwise go to the next step.
Step 2. Compute the search direction $d_k$ by using (34)-(37).
Step 3. Set the trial step size.
Step 4. Set the trial point and choose a scalar satisfying the stated condition.
Step 5. Let the nonmonotone memory length be a positive integer, define the nonmonotone reference value, and choose a sequence that is bounded above and satisfies
(38)
If (38) does not hold, reduce the trial step size and go to Step 5.
Step 6. Set $k := k + 1$ and go to Step 1.
It can be observed that the line search technique in Step 5 of Algorithm 1 is a nonmonotone line search technique with some modifications.
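The overall control flow of Algorithm 1 can be summarized by the following structural sketch, in which the paper-specific ingredients, namely the direction (34)-(37), the accuracy parameters, and the nonmonotone acceptance test (38), are abstracted as user-supplied callables; every identifier and default value below is a hypothetical placeholder rather than the published formulas.

import numpy as np

def scg_mbfgs_skeleton(x0, approx_grad, direction, accepts_step,
                       rho=0.5, tol=1e-6, max_iter=500):
    # Structural sketch of Algorithm 1 (names and defaults are assumptions).
    #   approx_grad(x, eps)          -> g^a(x, eps), the approximate gradient (20)
    #   direction(g, d_prev, k)      -> search direction d_k, standing in for (34)-(37)
    #   accepts_step(x, d, alpha, k) -> True if alpha passes the nonmonotone test (38)
    x, eps = x0.copy(), 1.0
    g = approx_grad(x, eps)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:      # Step 1: stopping criterion
            return x
        d = direction(g, d, k)            # Step 2: direction via (34)-(37)
        alpha = 1.0                       # Step 3: trial step size
        while not accepts_step(x, d, alpha, k):
            alpha *= rho                  # Step 5: backtrack until (38) holds
        x = x + alpha * d                 # accept the step
        eps *= 0.5                        # Step 4: tighten the accuracy parameter (assumed rule)
        g = approx_grad(x, eps)           # Step 6: k := k + 1 via the loop counter
    return x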
Convergence Analysis
In this subsection, we establish the global convergence of our method for the nonsmooth convex problem (12). To prove the global convergence of Algorithm 1, the following lemmas are needed.
Lemma 1. Assume that the search direction $d_k$ is generated by Algorithm 1. Then for all $k$ we have
(39)
and
(40)
Proof. If $k = 0$, both relations follow immediately from the definition of $d_0$. For $k \ge 1$, (34) yields (39), and once more, (34) yields (40). Thus, the proof is completed.
Lemma 1 shows that the search direction developed in (34)-(37) satisfies the sufficient descent condition (39) and belongs to the trust region described by (40).
Lemma 2. Let the step size $\alpha_k$ satisfy (38). Then there exists a positive constant such that
(41)
Proof. If the initial trial step size satisfies (38), the bound is immediate. Otherwise, there exists a trial step size for which (38) fails, so that
(42)
Using the mean value theorem and combining the resulting inequality with (42), we obtain (41). Thus, the proof is completed.
Lemma 3. Assume that the sequence $\{x_k\}$ is generated by Algorithm 1. Then the stated relation holds for all $k$.
Proof. We prove this lemma by induction. For $k = 1$, the relation follows from (38) and the initial choices. Assume the relation holds for $k$; we show it for $k + 1$. We consider two cases.
Case 1: the relation follows directly from (38).
Case 2: again from (38), and by combining the resulting inequalities, the relation follows for $k + 1$. Thus, the proof is completed.
Theorem 1. Assume that the sequences $\{x_k\}$ and $\{d_k\}$ are generated by Algorithm 1, let $F$ be bounded below on the level set, and let the parameters of Algorithm 1 satisfy the stated conditions. Then
(43)
Proof. Suppose that (43) is not true. Then there exist positive constants such that
(44)
From Lemma 3, we have
(45)
By (40), (41) and (44), we obtain a uniform bound on the decrease at each iteration. Letting $k \to \infty$, this contradicts our assumption on $F$. Hence the theorem is proved.
Theorem 2. Let the conditions in Lemma 1 and Theorem 1 hold. Then Algorithm 1 converges for the nonsmooth problem (12).
Proof. From Lemma 1 and Theorem 1, it follows that
(46)
Thus, (23) and the convergence of the corresponding sequence yield
(47)
Let $\bar{x}$ be an accumulation point of $\{x_k\}$. Then there exists a subsequence satisfying
(48)
Thus, (17), (43) and (47) yield $\nabla F(\bar{x}) = 0$. Therefore, $\bar{x}$ is an optimal solution of the nonsmooth problem (12).
4. Numerical Experiments for Large Scale Nonsmooth Problems
In this section, we present some numerical experiments to examine the efficiency of Algorithm 1 on large-scale nonsmooth academic test problems introduced in [49]. The details of these test problems, together with their initial points and minimum values, are listed as follows:
Problem 1 - Problem 10: the ten large-scale nonsmooth academic test problems of [49]; their explicit formulations, initial points, and minimum values can be found in [49].
Problems 1-5 are convex functions, and the others are nonconvex functions. We tested the above problems with ten different dimensions for each problem. For convenience, we denote Algorithm 1 by SCG-MBFGS (scaled conjugate gradient method based on the modified secant equation of the BFGS method), and in order to demonstrate the validity of our algorithm, we also list the results of three other algorithms: MPRP in [30], MHS in [31], and MSBFGS-CG in [39]. All algorithms were implemented in Fortran 90 and run on a PC with an Intel(R) Core(TM) i3-3110M CPU at 2.40 GHz, 4.00 GB of RAM, and the Windows 7 operating system. We stopped the iteration when the stopping condition was satisfied, and all parameters for the other three methods were chosen as in [30] [31] and [39], respectively. Table 1 shows the numerical results of SCG-MBFGS, MPRP, MHS, and MSBFGS-CG on the given test problems. The columns in Table 1 have the following meanings:
Dim: the dimension of the problem.
NI: the total number of iterations.
NF: the number of function evaluations.
TIME: the CPU time in seconds.
F: the value of the objective function at the final iteration.
Table 1. Numerical results for the 10 problems with the given initial points and dimensions.
From the numerical results in Table 1, it is not difficult to see that SCG-MBFGS is superior or competitive to the other three methods in solving the given problems in terms of the number of iterations, the number of function evaluations, and the CPU time. Furthermore, to illustrate the performance of our method more directly, we employed the tool provided by Dolan and Moré [50] to analyze and compare the efficiency of the methods in terms of the number of iterations, the number of function evaluations, and the CPU time. Figures 1-3 show the performance profiles of the above algorithms with respect to the number of iterations, the number of function evaluations, and the CPU time, respectively. From the three figures, we can observe that, for the given test problems, SCG-MBFGS is competitive with or superior to the other three methods in each of these measures.
From Figure 1 and Figure 2, we also notice that SCG-MBFGS performs better than the other methods in terms of the numbers of iterations and function evaluations. Figure 3 indicates that MHS is comparable to SCG-MBFGS in terms of CPU time; since the search direction of MHS is developed with only first-order information while SCG-MBFGS, MPRP, and MSBFGS-CG use second-order information, it is reasonable that MHS needs less CPU time.
Figure 1. Performance profiles of the four methods based on NI.
Figure 2. Performance profiles of the four methods based on NF.
Figure 3. Performance profiles of the four methods based on CPU time.
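For readers who wish to reproduce comparisons such as Figures 1-3, the following sketch computes Dolan-Moré performance profiles [50] from a matrix of raw measurements (NI, NF, or CPU time); the solver labels and the sample data are hypothetical placeholders, not results from Table 1.

import numpy as np
import matplotlib.pyplot as plt

def performance_profile(T, labels, tau_max=10.0):
    # Dolan-More performance profiles [50].  T is an (n_problems, n_solvers)
    # array of a positive performance measure; np.inf marks a failure.
    # rho_s(tau) is the fraction of problems on which solver s is within a
    # factor tau of the best solver for that problem.
    ratios = T / T.min(axis=1, keepdims=True)
    taus = np.linspace(1.0, tau_max, 200)
    for s, name in enumerate(labels):
        rho = [(ratios[:, s] <= t).mean() for t in taus]
        plt.step(taus, rho, where="post", label=name)
    plt.xlabel("tau"); plt.ylabel("rho_s(tau)"); plt.legend()
    plt.show()

# Hypothetical CPU times (seconds) for four solvers on three problems.
T = np.array([[1.2, 1.5, 0.9, 2.0],
              [3.0, 2.5, 2.8, np.inf],
              [0.5, 0.7, 0.6, 0.8]])
performance_profile(T, ["SCG-MBFGS", "MPRP", "MHS", "MSBFGS-CG"])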
5. Conclusion
In this paper, we propose a new scaled conjugate gradient method which incorporates a modified secant equation of the BFGS method. This modified secant equation contains both function value and gradient information of the objective function, and its Hessian approximation update generates positive definite matrices. Under a modified nonmonotone line search and some mild conditions, the strong global convergence of the proposed method is established for nonsmooth convex problems. The search direction of our new method satisfies the sufficient descent condition and belongs to a trust region. Compared with existing nonsmooth CG methods, our approach produces a better descent direction. Numerical results and related comparisons show that the proposed method is effective for solving large-scale nonsmooth optimization problems.
Acknowledgements
The authors would like to thank the reviewers and the editor for their valuable comments, which greatly improved our paper. This work is supported by the National Natural Science Foundation of China [Grant No. 11771003].