Global Convergence Property with Inexact Line Search for a New Hybrid Conjugate Gradient Method

In this study, we derive a new scale parameter φ for the CG method, for solving large scale unconstrained optimization algorithms. The new scale parameter φ satisfies the sufficient descent condition, global convergence analysis proved under Strong Wolfe line search conditions. Our numerical results show that the proposed method is effective and robust against some known algorithms.

Keywords

Share and Cite:

Al-Namat, F. and Al-Naemi, G. (2020) Global Convergence Property with Inexact Line Search for a New Hybrid Conjugate Gradient Method. Open Access Library Journal, 7, 1-14. doi: 10.4236/oalib.1106048. 1. Introduction

In unconstrained optimization, we minimize an objective function that depends on real variables with no restrictions at all on the value of these variables. The unconstrained optimization problem is stated by:

${\mathrm{min}}_{x\in {R}^{n}}f\left(x\right)$ (1)

where $x\in {R}^{n}$ is a real vector with $n\ge 1$ component and $f:{R}^{n}\to R$ is a smooth function and its gradient g is available . A nonlinear conjugate gradient method generates a sequence ${x}_{k}$ Starting from an initial guess ${x}_{0}\in {R}^{n}$ Using the recurrence

${x}_{k+1}={x}_{k}+{\alpha }_{k}{d}_{k},\text{\hspace{0.17em}}\text{\hspace{0.17em}}k=0,1,2,\cdots$ (2)

where ${\alpha }_{k}$ is the positive step size obtained by carrying out a one dimensional search, known as the line searches . Among them, the so-called strong wolf line search conditions require that  .

$f\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)\le f\left({x}_{k}\right)+\sigma {\alpha }_{k}{d}_{k}$, (3)

$|g\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)|\le \delta |{g}_{k}^{\text{T}}{d}_{k}|$ (4)

where $0<\sigma <\delta <\text{1}$, is to find an approximation of ${\alpha }_{k}$ where the descent property must be satisfied and no longer searching in the direction when ${x}_{k}$ is far from the solution. Thus by strong Wolfe line search conditions we in herit the advantages of exact line search with inexpensive and low computational cost .

The search direction ${d}_{k}$ is generated by:

${d}_{k}=\left\{\begin{array}{l}-{g}_{k},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k=1\\ -{g}_{k}+{\beta }_{k}{d}_{k},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k>1\end{array}$ (5)

where ${g}_{k}$ and ${\beta }_{k}$ is the gradient and conjugate gradient coefficient of f(x) respectively at the point ${x}_{k}$. The different choices for the parameter ${\beta }_{k}$ correspond to different conjugate gradient methods. The most popular formulas for ${\beta }_{k}$ is Hestenes Stiefel method (HS), Fletcher-Reeves method (FR), Polak-Ribiere- Polyak method (PR), conjugate―Descent method (CD), Liu―Storey method (LS), and Dai-Yuan method (DY), etc

These methods are identical when f is a strongly convex quadratic function and the line search is exact, since the gradient are mutually orthogonal, and the parameters ${\beta }_{k}$ in these methods are equal. When applied to general nonlinear function with inexact line searches, however, the behavior of these methods is marked different . We are going to summarize some well known conjugate gradient method in Table 1.

An important class of conjugate gradient methods is the hybrid conjugate gradient algorithms. The hybrid computational schemes perform better than the classical conjugate gradient methods. They are defined by (2) and (5) where the parameter ${\beta }_{k}$ is computed as projections or as convex combinations of different conjugate gradient methods .

We are going to summarize some well known hybrid conjugate gradient method in Table 2.

We propose a new hybrid CG method based on combination of MMWU  and RMAR  conjugate gradient methods for solving unconstrained optimization method with suitable conditions. The corresponding conjugate gradient parameters are

${B}_{k}^{MMWU}=\frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}$ (6)

and

${\beta }_{k}^{RMAR}=\frac{{‖{g}_{k+1}‖}^{2}-\frac{‖{g}_{k+1}‖}{‖{d}_{k}‖}{g}_{k+1}^{\text{T}}{d}_{k}}{{‖{d}_{k}‖}^{2}}$ (7)

Table 1. Some well known conjugate gradient coefficients.

Table 2. Hybrid conjugate gradient methods.

We defined the parameter ${\beta }_{k}$ in the proposed method by:

${\beta }_{k}^{FG}=\left(1-{\phi }_{k}\right){\beta }_{k}^{MMWU}+{\phi }_{k}{\beta }_{k}^{RMAR}$ (8)

Observe that if ${\phi }_{k}=0$, then ${\beta }_{k}^{FG}={\beta }_{k}^{MMWU}$, and if ${\phi }_{k}=1$, then ${\beta }_{k}^{FG}={\beta }_{k}^{RMAR}$.

By choosing the appropriate value of the parameter ${\phi }_{k}$ In the convex combination, the search direction ${d}_{k}$ of our algorithm not only is the Newton direction, but also satisfies the famous DL conjugate condition proposed by Dai and Liao . Under the strong Wolfe line search conditions, we prove the global convergence of our algorithm. The numerical results also show the feasibility and effectiveness of our algorithm.

This paper is organized as follows. Section 2 we introduce our new hybrid conjugate gradient method (HFG), and we obtain the parameter ${\phi }_{k}$ using some approaches and give us a specific algorithm. Section 3, we prove that it generates direction satisfying the sufficient descent condition under strong Wolfe line search conditions. The global convergence property of the proposed method is established in Section 4. Some numerical results are reported in Section 5.

2. A New Hybrid Conjugate Gradient Method

In this section, we will describe a new proposed hybrid conjugate gradient method. In order to obtain the sufficient descent direction, we will compute ${\phi }_{k}$ as follows. We combine ${\beta }_{k}^{MMWU}$ and ${\beta }_{k}^{RMAR}$ in a convex combination in order to have a good algorithm for unconstrained optimization.

The direction ${d}_{k+1}$ is generated by the rule

${d}_{k+1}=-{g}_{k+1}+{\beta }_{k}^{HFG}{d}_{k}$ (9)

where ${\beta }_{k}^{HFG}$ defined in (8), the iterates ${x}_{1},{x}_{2},{x}_{3},\cdots$ of our method are computed by means of the recurrence (2), where the step size ${\alpha }_{k}$ Is determined according to the strong Wolf conditions (3) and (4).

The scale parameter ${\phi }_{k}$ satisfying $0\le {\phi }_{k}\le 1$, which will be determined in a specific way to be described later. Observe that if ${\phi }_{k}=0$, then ${\beta }_{k}^{HFG}={\beta }_{k}^{MMWU}$, and

If ${\phi }_{k}=1$, then ${\beta }_{k}^{HFG}={\beta }_{k}^{RMAR}$. On the other hand, if $0<{\phi }_{k}<1$, then ${\beta }_{k}^{HFG}$ is a convex combination of ${\beta }_{k}^{MMWU}$ and ${\beta }_{k}^{RMAR}$.

From (8) and (9) it is obvious that:

${d}_{k+1}=\left\{\begin{array}{l}-{g}_{k+1},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k=1\\ -{g}_{k+1}+\left(1-{\phi }_{k}\right)\frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}{d}_{k}+{\phi }_{k}\frac{{‖{g}_{k+1}‖}^{2}-\frac{‖{g}_{k+1}‖}{‖{d}_{k}‖}{g}_{k+1}^{\text{T}}{d}_{k}}{{‖{d}_{k}‖}^{2}}{d}_{k},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{ }k>1\end{array}$, (10)

Our motivation to select the parameter ${\phi }_{k}$ in such a manner that the defection ${d}_{k+1}$ given in (10) is equal to the Newton direction ${d}_{k+1}^{N}=-{\nabla }^{2}f{\left({x}_{k+1}\right)}^{-1}{g}_{k+1}$. There for

$\begin{array}{l}-{\nabla }^{2}f{\left({x}_{k+1}\right)}^{-1}{g}_{k+1}\\ =-{g}_{k+1}+\left(1-{\phi }_{k}\right)\frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}{d}_{k}+{\phi }_{k}\frac{{‖{g}_{k+1}‖}^{2}-\frac{‖{g}_{k+1}‖}{‖{d}_{k}‖}{g}_{k+1}^{\text{T}}{d}_{k}}{{‖{d}_{k}‖}^{2}}{d}_{k}\end{array}$ (11)

Now multiplying (11) by ${s}_{k}^{\text{T}}{\nabla }^{2}f\left({x}_{k+1}\right)$ from the left, we get

$\begin{array}{c}-{s}_{k}^{\text{T}}{g}_{k+1}=-{s}_{K}^{\text{T}}{\nabla }^{2}f\left({x}_{k+1}\right){g}_{k+1}+\left(1-{\phi }_{k}\right)\frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}{s}_{K}^{\text{T}}{\nabla }^{2}f\left({x}_{k+1}\right){d}_{k}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\phi }_{k}\frac{{‖{g}_{k+1}‖}^{2}-\frac{‖{g}_{k+1}‖}{‖{d}_{k}‖}{g}_{k+1}^{\text{T}}{d}_{k}}{{‖{d}_{k}‖}^{2}}{s}_{k}^{\text{T}}{\nabla }^{2}f\left({x}_{k+1}\right){d}_{k}\end{array}$

Therefore, in order to have an algorithm for solving large scale problems we assume that pair $\left({s}_{k},{y}_{k}\right)$ satisfies the secant equation

${\nabla }^{2}f\left({x}_{k+1}\right){s}_{k}={y}_{k}.$ (12)

From (12), we get

${s}_{k}^{\text{T}}{\nabla }^{2}f\left({x}_{k+1}\right)={y}_{k}^{\text{T}}.$

Denoting ${\phi }_{k}^{FG}={\phi }_{k}$ we get

$-{s}_{k}^{\text{T}}{g}_{k+1}=-{y}_{K}^{\text{T}}{g}_{k+1}+\frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}{y}_{k}^{\text{T}}{d}_{k}+{\phi }_{k}^{FG}\left(\frac{‖{g}_{k+1}‖\left({g}_{k+1}^{\text{T}}{d}_{k}\right)}{{‖{d}_{k}‖}^{2}}\right)\left({y}_{k}^{\text{T}}{d}_{k}\right)$

after some algebra, we get

${\phi }_{k}^{FG}=\frac{\left({s}_{k}^{\text{T}}{g}_{k+1}-{y}_{k}^{\text{T}}{g}_{k+1}\right)\cdot {‖{d}_{k}‖}^{3}+{‖{g}_{k+1}‖}^{2}\cdot ‖{d}_{k}‖\left({y}_{k}^{\text{T}}{d}_{k}\right)}{‖{g}_{k+1}‖\cdot \left({g}_{k+1}^{\text{T}}{d}_{k}\right)\cdot \left({y}_{k}^{\text{T}}{d}_{k}\right)}$ (13)

Now, we specify a complete hybrid conjugate gradient method (HFG) which posses some nice properties of conjugate gradient and Newton method.

Algorithm HFG

Step 1: Select ${x}_{0}\in {R}^{n}$, $\in \text{\hspace{0.17em}}>0$, set $k=0$. Compute $f\left({x}_{0}\right)$ and ${g}_{0}=-\nabla f\left({x}_{0}\right)$, set ${d}_{0}=-{g}_{0}$.

Step 2: Test the stopping criteria, i.e. if $‖{g}_{k}‖\le \text{\hspace{0.17em}}\in$, then stop.

Step 3: Compute ${\alpha }_{k}$ by strong Wolfe line search conditions in (3) & (4).

Step 4: Compute ${x}_{k+1}={x}_{k}+{\alpha }_{k}{d}_{k}$, ${g}_{k+1}=g\left({x}_{k+1}\right)$. Compute ${s}_{k}={x}_{k+1}-{x}_{k}$ And ${y}_{k}={g}_{k+1}-{g}_{k}$

Step 5: If ${\phi }_{k}\ge 1$ then set ${\phi }_{k}=1$. If ${\phi }_{k}\le 0$, then set ${\phi }_{k}=0$, otherwise compute ${\phi }_{k}$ as (13).

Step 6: Compute ${\beta }_{k}^{FG}$ by (8).

Step 7: Generate $d=-{g}_{k+1}+{\beta }_{k}^{FG}{d}_{k}$

Step 8: If the restart criteria of Powell $|{g}_{k}^{\text{T}}{g}_{k}|\ge 0.2{‖{g}_{k+1}‖}^{2}$, is satisfied, then set ${d}_{k}=-{g}_{k+1}$, otherwise define ${d}_{k+1}=d$

Step 9: Set $k=k+1$, and continue with step 2.

3. The Sufficient Descent Condition

In this section, we are going to apply the following theorem to illustrate that the search direction ${d}_{k}$ Obtained by hybrid FG satisfies the sufficient descent condition which plays Avit of role in analyzing the global convergence.

For further considerations we need the following assumptions

3.1. Assumption

The level sets $S=\left\{x\in {R}^{n},f\left({x}_{n}\right)\right\}$ are bounded.

3.2. Assumption

In a neighborhood N of S, the function f is continuously differentiable and its gradient is Lipschitz continuous, i.e., there exists a constant $L>0$, such that

$‖\nabla f\left(x\right)-\nabla f\left(y\right)‖\le L‖x-y‖,\text{\hspace{0.17em}}\forall x,y\in N$

Under these assumptions of if there exists a positive constant ( $\gamma ,\stackrel{¯}{\gamma },\omega$ & $\stackrel{¯}{\omega }$ ) & such that

$\stackrel{¯}{\gamma }\le ‖{g}_{k+1}‖\le \gamma \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{¯}{\omega }\le ‖{g}_{k}‖\le \omega ,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall x\in S$ .

Theorem.

Let the sequences $\left\{{g}_{k}\right\}$ and $\left\{{d}_{k}\right\}$ be generated by a hybrid FG method. Then the search direction ${d}_{k}$ satisfies the sufficient descent condition:

${g}_{k+1}^{\text{T}}{d}_{k+1}\le -\mu {‖{g}_{k+1}‖}^{2},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall \mu \ge 0$ (14)

where $\mu =1-\left({E}_{4}-{E}_{3}\right)$, with $0<\left({E}_{4}-{E}_{3}\right)<1$.

Proof. We shall show that ${d}_{k}$ satisfies the sufficient descent condition holds for $k=0$, the proof is a trivial one, i.e. ${d}_{0}=-{g}_{0}$ and so ${g}_{0}^{\text{T}}{d}_{0}=-{‖{g}_{0}‖}^{2}$. Now we have

${d}_{k+1}=-{g}_{k+1}+{\beta }_{k}^{FG}{d}_{k},$

i.e.

${d}_{k+1}=-{g}_{k+1}+\left[\left(1-{\phi }_{k}\right){\beta }_{k}^{MMWU}+{\phi }_{k}{\beta }_{k}^{RMAR}\right]{d}_{k}$

We can rewrite the above direction by the following manner:

${d}_{k+1}=-\left({\phi }_{k}{g}_{k+1}+\left(1-{\phi }_{k}\right){g}_{k+1}\right)+\left(\left(1-{\phi }_{k}\right){\beta }_{k}^{MMWU}+{\phi }_{k}{\beta }_{k}^{RMAR}\right){d}_{k}$.

So,

${d}_{k+1}={\phi }_{k}\left(-{g}_{k+1}+{\beta }_{k}^{RMAR}{d}_{k}\right)+\left(1-{\phi }_{k}\right)\left(-{g}_{k}+{\beta }_{k}^{MMWU}{d}_{k}\right)$,

After some arrangement, we get

${d}_{k+1}={\phi }_{k}{d}_{k+1}^{RMAR}+\left(1-{\phi }_{k}\right){d}_{k+1}^{MMWU}$ (15)

Multiplying (15) by ${g}_{k+1}^{\text{T}}$ from the left, we get

${g}_{k+1}^{\text{T}}{d}_{k+1}={\phi }_{k}{g}_{k+1}^{\text{T}}{d}_{k+1}^{RMAR}+\left(1-{\phi }_{k}\right){g}_{k+1}^{\text{T}}{d}_{k}^{MMWU}$

Firstly, if ${\phi }_{k}=0$, then ${d}_{k+1}={d}_{k+1}^{MMWU}$, we are going to prove that the sufficient descent condition holds for MMWU method in the presence of the strong Wolfe line search condition, because in  they proved this method satisfied the sufficient descent condition with exact line search.

i.e.

${g}_{k+1}^{\text{T}}{d}_{k+1}^{MMWU}=-{‖{g}_{k+1}‖}^{2}+\frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}{g}_{k+1}^{\text{T}}{d}_{k}$ (16)

Since,

${g}_{k+1}^{\text{T}}{d}_{k}\le {y}_{k}^{\text{T}}{d}_{k}$ and ${y}_{k}^{\text{T}}{d}_{k}\le {\alpha }_{k}L{‖{d}_{k}‖}^{2}$ (17)

Applications (17) in (16), we get

$\begin{array}{c}{g}_{k+1}^{\text{T}}{d}_{k+1}^{MMWU}\le -{‖{g}_{k+1}‖}^{2}+\frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}{\alpha }_{k}L{‖{d}_{k}‖}^{2}\\ =-\left(1-{\alpha }_{k}L\right){‖{g}_{k+1}‖}^{2}\\ =-{E}_{1}{‖{g}_{k+1}‖}^{2}\end{array}$ (18)

where ${E}_{1}=\left(1-{\alpha }_{k}L\right)>0$, with $0<{\alpha }_{k}L<1$.

So, it is proved that ${d}_{k+1}^{MMWU}$ satisfies the sufficient descent condition.

Now let ${\phi }_{k}=1$ then ${d}_{k}={d}_{k}^{RMAR}$, we are going to prove that the sufficient descent condition holds for RMAR method in the presence of the strong Wolfe line search condition because in  they proved this method satisfied the sufficient descent condition with exact line search.

${d}_{k+1}^{RMAR}=-{g}_{k+1}+{\beta }_{k}^{RMAR}{d}_{k}$

Multiplying the above equation from left by ${g}_{k+1}^{\text{T}}$ we get

${g}_{k+1}^{\text{T}}{d}_{k+1}^{RMAR}=-{‖{g}_{k+1}‖}^{2}+\frac{{‖{g}_{k+1}‖}^{2}-\frac{‖{g}_{k+1}‖}{‖{d}_{k}‖}{g}_{k+1}^{\text{T}}{d}_{k}}{{‖{d}_{k}‖}^{2}}{g}_{k+1}^{\text{T}}{d}_{k}$.

In , they proved that

$0\le \frac{{‖{g}_{k+1}‖}^{2}-\frac{‖{g}_{k+1}‖}{‖{d}_{k}‖}{g}_{k+1}^{\text{T}}{d}_{k}}{{‖{d}_{k}‖}^{2}}\le 2\frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}$ (19)

Used (17), and (19) the direction become

$\begin{array}{c}{g}_{k+1}^{\text{T}}{d}_{k+1}\le -{‖{g}_{k+1}‖}^{2}+2{\alpha }_{k}L{‖{g}_{k+1}‖}^{2}\\ =-\left(1-2{\alpha }_{k}L\right)\cdot {‖{g}_{k+1}‖}^{2}\\ =-{E}_{2}\cdot {‖{g}_{k+1}‖}^{2}\end{array}$ (20)

where ${E}_{2}=\left(1-2{\alpha }_{k}L\right)>0$ with $0<2{\alpha }_{k}L<1$ and $0.

So, it is proved that ${d}_{k+1}^{RMAR}$ satisfied the sufficient descent condition.

Now, we are going to prove the direction satisfy the sufficient descent condition when $0<{\phi }_{k}<1$, firstly for

$\begin{array}{l}\left(1-{\phi }_{k}\right){\beta }_{K}^{MMWU}{g}_{k+1}^{\text{T}}{d}_{k}\\ =\frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}{g}_{k}^{\text{T}}{d}_{k}-\left[\frac{{s}_{k}^{\text{T}}{g}_{k+1}{‖{d}_{k}‖}^{3}-{y}_{k}^{\text{T}}{g}_{k+1}{‖{d}_{k}‖}^{3}+{‖{g}_{k+1}‖}^{2}‖{d}_{k}‖{y}_{k}^{\text{T}}{d}_{k}}{‖{g}_{k+1}‖\left({g}_{k+1}^{\text{T}}{d}_{k}\right){y}_{k}^{\text{T}}{d}_{k}}\right]\ast \frac{{‖{g}_{k+1}‖}^{2}}{{‖{d}_{k}‖}^{2}}{g}_{k+1}^{\text{T}}{d}_{k}\end{array}$

We have from Lipschitz condition ${g}_{k+1}^{\text{T}}{d}_{k}<{y}_{k}^{\text{T}}{d}_{k}$ and

$-\left(1-\sigma \right)‖{g}_{k}‖\le {y}_{k}^{\text{T}}{d}_{k}\le {\alpha }_{k}L{‖{d}_{k}‖}^{2}$

with a mathematical calculation, we get

$\begin{array}{l}\left(1-{\phi }_{k}\right){\beta }_{k}^{MMWU}{g}_{k+1}^{\text{T}}{d}_{k}\\ \le \left[\frac{{\alpha }_{k}L{‖{d}_{k}‖}^{2}}{{‖{d}_{k}‖}^{2}}-\frac{‖{s}_{k}‖‖{g}_{k+1}‖‖{d}_{k}‖-{\alpha }_{k}L{‖{d}_{k}‖}^{2}+{‖{g}_{k+1}‖}^{2}{\alpha }_{k}L‖{d}_{k}‖}{‖{g}_{k+1}‖\cdot \left(-\left(1-\sigma \right)‖{g}_{k}‖\right)}\right]{‖{g}_{k+1}‖}^{2}\\ \le \left[{\alpha }_{k}L+\frac{L}{\left(1-\sigma \right)}\frac{{‖{s}_{k}‖}^{2}‖{d}_{k}‖-{\alpha }_{k}L{‖{d}_{k}‖}^{3}+{‖{g}_{k+1}‖}^{2}‖{d}_{k}‖}{‖{g}_{k+1}‖{‖{g}_{k}‖}^{2}}\right]{‖{g}_{k+1}‖}^{2}\\ \le \left[{\alpha }_{k}L+\frac{A\gamma B-{\alpha }_{k}L{B}^{2}+{Y}^{2}{\alpha }_{k}LB}{\left(1-\sigma \right)\stackrel{¯}{Y}{\stackrel{¯}{W}}^{2}}\right]{‖{g}_{k+1}‖}^{2}\end{array}$

Let ${E}_{1}={\alpha }_{k}L+\frac{LAB-{\alpha }_{k}L{B}^{3}+{\gamma }^{2}{\alpha }_{k}LB}{\left(1-\sigma \right)\stackrel{¯}{\gamma }\text{ }{\stackrel{¯}{\omega }}^{2}}$

$\therefore \text{\hspace{0.17em}}\left(1-{\phi }_{k}\right){\beta }^{MMWU}{g}_{k+1}^{\text{T}}{d}_{k}\le {E}_{3}{‖{g}_{k+1}‖}^{2}$ (21)

Now, secondly for

$\begin{array}{l}{\phi }_{k}{\beta }_{k}^{RMAR}{g}_{k+1}^{\text{T}}{d}_{k}\\ =\left[\frac{{s}_{k}^{\text{T}}{g}_{k+1}{‖{d}_{k}‖}^{3}-{y}_{k}^{\text{T}}{g}_{k+1}{‖{d}_{k}‖}^{3}+{‖{g}_{k+1}‖}^{2}‖{d}_{k}‖{y}_{k}^{\text{T}}‖{d}_{k}‖}{‖{g}_{k+1}‖\left({g}_{k+1}^{\text{T}}{d}_{k}\right)\left({y}_{k}^{\text{T}}{d}_{k}\right)}\right]\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{ }\cdot \left[\frac{{‖{g}_{k+1}‖}^{2}-\frac{‖{g}_{k+1}‖}{‖{d}_{k}‖}{g}_{k+1}^{\text{T}}{d}_{k}}{{‖{d}_{k}‖}^{2}}\right]{g}_{k+1}^{\text{T}}{d}_{k}\end{array}$

From (19), Lipschitz condition ${s}_{k}^{\text{T}}{g}_{k+1}\le {y}_{k}^{\text{T}}{s}_{k}\le L{‖{s}_{k}‖}^{2}$ and ${s}_{k}={\alpha }_{k}{d}_{k}$, we get

${\phi }_{k}{\beta }_{k}^{RMAR}{g}_{k+1}^{\text{T}}{d}_{k}=2\left[\frac{L{‖{s}_{k}‖}^{2}‖{d}_{k}‖-‖{y}_{k}‖‖{g}_{k+1}‖{‖{d}_{k}‖}^{3}+{\alpha }_{k}L{‖{g}_{k+1}‖}^{2}{‖{d}_{k}‖}^{2}}{{‖{g}_{k+1}‖}^{2}\left(-\left(1-\sigma \right)\right){‖{g}_{k}‖}^{2}{‖{d}_{k}‖}^{2}}\right]\cdot {‖{g}_{k+1}‖}^{2}$

Since $‖{y}_{k}‖\le ‖{g}_{k+1}‖+‖{g}_{k}‖$, so

$\begin{array}{l}{\phi }_{k}{\beta }_{k}^{RMAR}{g}_{k+1}^{\text{T}}{d}_{k}\\ \le \frac{-2}{1-\sigma }\left[\frac{L{‖{s}_{k}‖}^{2}‖{d}_{k}‖-0.8{‖{g}_{k+1}‖}^{2}‖{d}_{k}‖+{\alpha }_{k}L{‖{g}_{k+1}‖}^{2}‖{d}_{k}‖}{{‖{g}_{k+1}‖}^{2}{‖{g}_{k}‖}^{2}}\right]\cdot {‖{g}_{k+1}‖}^{2}\\ \le \frac{-2B}{1-\sigma }\left[\frac{LA-0.8{\gamma }^{2}+{\alpha }_{k}L{\omega }^{2}}{\stackrel{¯}{\gamma }\text{ }{\stackrel{¯}{\omega }}^{2}}\right]\cdot {‖{g}_{k+1}‖}^{2}\end{array}$

where ${E}_{4}=\frac{2B}{1-\sigma }\left[\frac{LA-0.8{\gamma }^{2}+{\alpha }_{k}L{\omega }^{2}}{\stackrel{¯}{\gamma }\text{ }{\stackrel{¯}{\omega }}^{2}}\right]$

$\therefore \text{\hspace{0.17em}}{\phi }_{k}{\beta }_{k}^{RMAR}{g}_{k+1}^{\text{T}}{d}_{k}\le -{E}_{4}{‖{g}_{k+1}‖}^{2}$ (22)

From (18), (20), (21) and (22) we get

$\begin{array}{c}{g}_{k+1}^{\text{T}}{d}_{k+1}\le -{‖{g}_{k+1}‖}^{2}+{E}_{3}{‖{g}_{k+1}‖}^{2}-{E}_{4}{‖{g}_{k+1}‖}^{2}\\ =-\left[1-\left({E}_{4}-{E}_{3}\right)\right]{‖{g}_{k+1}‖}^{2}\\ =-E{‖{g}_{k+1}‖}^{2}\end{array}$

with $E=1-\left({E}_{4}-{E}_{3}\right)$ and $0<{E}_{4}-{E}_{3}<1$.

So, it is proved that ${d}_{k+1}$ Satisfied the sufficient descent condition.

4. Converge Analysis

Let Assumption 2.1 and 2.2 hold. In  it is proved that for any conjugate gradient method with strong Wolfe line search conditions, it holds:

4.1. Lemma

Let Assumption 2.1 and 2.2 holds. Consider the method (2) and (5) where the ${d}_{k}$ Is a descent direction and αk is received from the strong wolf line search. If

${\sum }_{k\ge 1}\frac{1}{{‖{d}_{k}‖}^{2}}=\infty$.

Then

${\mathrm{lim}}_{k\to \infty }\mathrm{inf}‖{g}_{k}‖=0$.

4.2. Theorem

Suppose that assumption 2.1 and 2.2 holds. Consider the algorithm HFG were $0\le {\phi }_{k}\le 1$ and ${\alpha }_{k}$ is obtained by the strong Wolfe line search and ${d}_{k+1}$ is the descent direction. Then

${\mathrm{lim}}_{k\to \infty }\mathrm{inf}‖{g}_{k}‖=0$.

Proof. Because the descent condition holds, we have . So using lemma 3.1, it is sufficient to prove that is bounded above. From (10). They proved that in  and . ,

And .

Now for ,

By (4), we have .

Since and ,

with some mathematical calculation, we get    5. Numerical Experiments

In this section we selected some of test functions in Table 3 from CUTE library, along with other large scale optimization problems presented in Andrei   and Bongartz et al. .

All codes are written in double precision FORTRAN Language and compiled Visual F90 (default compiler settings) on a Workstation Intel Pentium 4. The value of is always computed by cubic fitting procedure.

We selected 26 large scale unconstrained optimization problems in the extended

Table 3. It gives the comparison depending in the NOI and NOF between, and the proposed method.

Table 4. The percentage performance of the proposed methods.

Figure 1. The comparison between the three methods.

or generalized form. Each problem was tested three times for a gradually increasing number of variables: N = 1000, 5000 and 10,000, all algorithms implemented the strong Wolfe line search (3) and (4) conditions with and and the same stopping criterion is used.

In some cases, the computation stopped due to the failure of the line search to find the positive step size, and thus it was considered as a failure denoted by (F).

We record the number of iteration calls (NOI), the number of function evaluations calls (NOF), and the dimensions of test problems calls (N), for the purpose of our comparisons.

Table 3 gives the comparison depending in the NOI and NOF between, and the proposed method.

Table 4 gives the percentage performance of the proposed methods against and. We have seen that. Method saves (NOI 0.8%), (NOF 7.6%), and method saves (NOI 28.7%), (NOF 40.0%) compared with method.

While Figure 1 gives the comparison between, and, using a well-known Wood test function.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper. 