A Line Search Algorithm for Unconstrained Optimization*

It is well known that line search methods play a very important role in solving optimization problems. In this paper a new line search method is proposed for unconstrained optimization. Under weak conditions, the method is globally convergent for nonconvex functions and R-linearly convergent for convex functions. Moreover, the search direction has the sufficient descent property and lies in a trust region regardless of the line search rule used. Numerical results show that the new method is effective.


Introduction
Consider the unconstrained optimization problem $\min_{x \in R^n} f(x)$ (1), where $f: R^n \to R$ is continuously differentiable. A line search algorithm for (1) generates a sequence of iterates $\{x_k\}$ by letting $x_{k+1} = x_k + \alpha_k d_k$, $k = 0, 1, 2, \ldots$ (2).
One simple line search method is the steepest descent method, which takes $d_k = -g_k$ (the negative gradient of $f$ at $x_k$) as the search direction at every iteration and has wide applications in solving large-scale minimization problems [4]. However, the steepest descent method often zigzags on practical problems, which makes the algorithm converge to an optimal solution very slowly, or even fail to converge [5,6]. Hence the steepest descent method is not the fastest of the line search methods. If $d_k = -H_k g_k$ is the search direction at each iteration, where $H_k$ is an $n \times n$ matrix approximating $[\nabla^2 f(x_k)]^{-1}$, then the corresponding line search method is called a Newton-like method [4][5][6], such as the Newton method, quasi-Newton methods, variable metric methods, etc. Many papers [7][8][9][10] have applied such methods to optimization problems. However, one drawback of Newton-like methods is that the matrix $H_k$ must be stored and computed at each iteration, which adds to the cost of storage and computation. Accordingly, these methods are often unsuitable for large-scale optimization problems.
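As a small illustration of the zigzag behaviour described above, the following Python sketch runs steepest descent with an exact step on the two-dimensional quadratic $f(x) = \frac{1}{2}(x_1^2 + a x_2^2)$ and compares it with a single Newton-like step. The test function, the conditioning parameter `a`, the tolerance, and all function names are assumptions chosen for this example; they are not taken from the paper.

```python
def grad(x, a):
    """Gradient of the example function f(x) = 0.5 * (x[0]**2 + a * x[1]**2)."""
    return [x[0], a * x[1]]


def steepest_descent(x0, a, tol=1e-8, max_iter=10_000):
    """Steepest descent d_k = -g_k with the exact step for this quadratic.

    Returns the final point and the number of iterations performed.
    """
    x = list(x0)
    for k in range(max_iter):
        g = grad(x, a)
        gg = g[0] ** 2 + g[1] ** 2
        if gg ** 0.5 <= tol:
            return x, k
        # Exact minimiser of f along -g: alpha = g^T g / (g^T A g), A = diag(1, a).
        alpha = gg / (g[0] ** 2 + a * g[1] ** 2)
        x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    return x, max_iter


def newton_direction(x, a):
    """Newton-like direction d = -H g with H = diag(1, 1/a), the inverse Hessian."""
    g = grad(x, a)
    return [-g[0], -g[1] / a]


if __name__ == "__main__":
    # Well-conditioned case (a = 1): one step suffices.
    print(steepest_descent([1.0, 1.0], a=1.0))
    # Ill-conditioned case (a = 100): the iterates zigzag and need many steps.
    print(steepest_descent([100.0, 1.0], a=100.0)[1])
    # The Newton-like step lands on the minimiser directly,
    # at the price of storing and applying the matrix H.
    x = [100.0, 1.0]
    d = newton_direction(x, 100.0)
    print([x[0] + d[0], x[1] + d[1]])
```

On the ill-conditioned example the sign of the second coordinate flips at every iteration, which is the zigzag pattern; the Newton-like step avoids it but needs the matrix $H_k$.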
The conjugate gradient method is a powerful line search method for solving large-scale optimization problems because of its simplicity and its very low memory requirement. The search direction of the conjugate gradient method often takes the form $d_{k+1} = -g_{k+1} + \beta_k d_k$ with $d_0 = -g_0$ (3), where $\beta_k \in R$ is a scalar which determines the different conjugate gradient methods [11][12][13] etc. The convergence behavior of the different conjugate gradient methods with some line search conditions [14] has been widely studied by many authors for many years (see [4,15]). At present, one of the most efficient formulas for $\beta_k$ from the computational point of view is the following PRP method: $\beta_k^{PRP} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{\|g_k\|^2}$ (4), where $\|\cdot\|$ denotes the Euclidean norm. When the next iteration point is close to the current point, $g_{k+1} - g_k$ is small and $\beta_k^{PRP}$ is close to zero, which implies that the direction of the PRP method automatically turns into the steepest descent direction and thus acts as a restart. This property is very important for the efficiency of the PRP conjugate gradient method (see [4,15] etc.). Regarding the convergence of the PRP conjugate gradient method, Polak and Ribière [16] proved that the PRP method with the exact line search is globally convergent when the objective function is convex, and Powell [17] gave a counterexample showing that there exist nonconvex functions on which the PRP method does not converge globally even if the exact line search is used.
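The PRP scalar and the conjugate gradient direction update can be sketched in a few lines of Python. This is only an illustration under the standard PRP definition; the function names and the plain-list vector representation are assumptions made for the example.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def beta_prp(g_new, g_old):
    """PRP scalar: g_new^T (g_new - g_old) / ||g_old||^2."""
    y = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, y) / dot(g_old, g_old)


def cg_direction(g_new, d_old, beta):
    """Conjugate gradient direction: d_new = -g_new + beta * d_old."""
    return [-g + beta * d for g, d in zip(g_new, d_old)]


if __name__ == "__main__":
    # If the gradient does not change between iterates, beta is zero
    # and the new direction is the steepest descent direction (a restart).
    g = [1.0, 2.0]
    b = beta_prp(g, g)
    print(b, cg_direction(g, [5.0, 5.0], b))
```

The last lines show the automatic restart: with an unchanged gradient the previous direction is dropped entirely.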
The sufficient descent condition $g_k^T d_k \le -c \|g_k\|^2$ (5), for all $k \ge 0$ and some constant $c > 0$, is very important for ensuring the global convergence of nonlinear conjugate gradient methods, and it may be crucial for conjugate gradient methods [14]. It has been shown that the PRP method with the strong Wolfe-Powell (SWP) line search rule, which finds a step size $\alpha_k$ satisfying $f(x_k + \alpha_k d_k) \le f(x_k) + \delta_1 \alpha_k g_k^T d_k$ (6) and $|g(x_k + \alpha_k d_k)^T d_k| \le \delta_2 |g_k^T d_k|$ (7), where $0 < \delta_1 < 1/2$ and $\delta_1 < \delta_2 < 1$, does not ensure condition (5) at each iteration. Grippo and Lucidi [18] then presented a new line search rule which ensures the sufficient descent condition, and established the convergence of the PRP method with their line search technique.
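To make the Wolfe-Powell conditions concrete, here is a hedged Python sketch that tests whether a given step size satisfies the sufficient decrease condition together with either the strong or the weak curvature condition. The function name, the default value `delta2=0.9`, and the one-dimensional test function are assumptions for illustration only.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def wolfe_check(f, grad, x, d, alpha, delta1=0.1, delta2=0.9, strong=True):
    """Return True if alpha passes sufficient decrease and the curvature test.

    strong=True  uses |g(x + alpha d)^T d| <= delta2 * |g(x)^T d|   (strong form)
    strong=False uses  g(x + alpha d)^T d  >= delta2 *  g(x)^T d    (weak form)
    """
    slope = dot(grad(x), d)  # negative when d is a descent direction
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    decrease_ok = f(x_new) <= f(x) + delta1 * alpha * slope
    new_slope = dot(grad(x_new), d)
    if strong:
        curvature_ok = abs(new_slope) <= delta2 * abs(slope)
    else:
        curvature_ok = new_slope >= delta2 * slope
    return decrease_ok and curvature_ok


if __name__ == "__main__":
    f = lambda x: x[0] ** 2          # example objective (assumption)
    g = lambda x: [2.0 * x[0]]
    x, d = [1.0], [-2.0]             # steepest descent direction at x = 1
    for alpha in (0.01, 0.5, 1.0):
        print(alpha, wolfe_check(f, g, x, d, alpha))
```

In this example the step 0.01 is rejected by the curvature test (too short), the step 1.0 is rejected by the sufficient decrease test (too long), and the step 0.5 is accepted.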
Powell [17] suggested that $\beta_k$ should not be less than zero. Considering this idea, Gilbert and Nocedal [14] proved the global convergence of the modified PRP method $\beta_k^{+} = \max\{0, \beta_k^{PRP}\}$. Over the past few years, much effort has been devoted to finding new formulas for conjugate gradient methods that have not only the global convergence property for general functions but also good numerical performance (see [14,15]). In recent years, some good results on the nonlinear conjugate gradient method have been given [19][20][21][22][23][24][25].
These observations motivate us to propose a new method which possesses not only simplicity and low memory requirements but also desirable theoretical features. In this paper, we design a new line search method whose search direction possesses not only the sufficient descent property (5) but also the following property: [...] (9), whatever line search rule is used. Property (9) implies that the search direction $d_k$ lies in a trust region automatically.
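The two properties of the search direction can be checked numerically for any candidate direction. The sketch below assumes that the sufficient descent condition has the usual form $g_k^T d_k \le -c\|g_k\|^2$ and that the trust region property is a bound of the form $\|d_k\| \le c_1 \|g_k\|$; the second form is an assumption, since the formula for (9) is not legible in this text. The constants `c` and `c1` and the function names are likewise hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Euclidean norm."""
    return math.sqrt(dot(v, v))


def has_sufficient_descent(g, d, c):
    """Check g^T d <= -c * ||g||^2 for a chosen constant c > 0."""
    return dot(g, d) <= -c * dot(g, g)


def in_trust_region(g, d, c1):
    """Check the assumed bound ||d|| <= c1 * ||g|| for a chosen constant c1 > 0."""
    return norm(d) <= c1 * norm(g)


if __name__ == "__main__":
    g = [3.0, 4.0]
    # The steepest descent direction passes both checks with c = c1 = 1.
    print(has_sufficient_descent(g, [-3.0, -4.0], c=1.0),
          in_trust_region(g, [-3.0, -4.0], c1=1.0))
    # A direction orthogonal to the gradient is not a sufficient descent direction.
    print(has_sufficient_descent(g, [4.0, -3.0], c=0.1))
```

Checks of this kind are useful for monitoring an implementation; they do not replace the proof that the direction satisfies both properties for every iterate.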
This paper is organized as follows. In the next section, the algorithms and other line search rules are stated. The global convergence and the R-linear convergence of the new method are established in Section 3. Numerical results and a conclusion are presented in Section 4 and Section 5, respectively.

The Algorithms
Besides the inexact line search techniques WWP and SWP, there exist other line search rules which are often used to analyze the convergence of line search methods:
1) The exact minimization rule. The step size $\alpha_k$ is chosen such that $f(x_k + \alpha_k d_k) = \min_{\alpha \ge 0} f(x_k + \alpha d_k)$. (10)
2) Goldstein rule. The step size $\alpha_k$ is chosen to satisfy (6) and $f(x_k + \alpha_k d_k) \ge f(x_k) + \delta_2 \alpha_k g_k^T d_k$. (11)
Now we give our algorithm as follows.
1) Algorithm 1 (New Algorithm)
Step 0: Choose an initial point $x_0 \in R^n$ [...].
Step 1: If [...], then stop; otherwise go to step 2.
Step 2: Compute the steplength $\alpha_k$ by a line search technique, and let $x_{k+1} = x_k + \alpha_k d_k$.
Step 3: If [...], then stop; otherwise go to step 4.
Step 4: Calculate the search direction [...], where $\beta_k$ is defined by (4).
Step 5: Let [...].
Step 6: Let $k := k + 1$, and go to step 2.
Remark. In Step 5 of Algorithm 1, we have [...], which can increase the convergence speed of the algorithm from the computational point of view.
Here we give the normal PRP conjugate gradient algorithm and one modified PRP conjugate gradient algorithm [14] as follows.
Step 2: Compute the steplength $\alpha_k$ by a line search technique, and let $x_{k+1} = x_k + \alpha_k d_k$.
Step 3: If [...], then stop; otherwise go to step 4.
Step 4: Calculate the search direction [...] and go to step 2.
We will concentrate on the convergence results of Algorithm 1 in the following section.
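For orientation, the normal PRP algorithm and the modified (PRP+) variant can be sketched in Python as below. This is a minimal sketch under several assumptions that are not taken from the paper: a simple backtracking line search enforcing only the sufficient decrease condition is swapped in for the Wolfe-Powell rules, the stopping test is a gradient-norm tolerance `eps`, and an uphill direction is replaced by the steepest descent direction. It does not implement the new search direction of Algorithm 1.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def backtracking(f, x, d, g, delta1=0.1, shrink=0.5, max_halvings=60):
    """Backtracking search: try alpha = 1, 1/2, 1/4, ... until sufficient decrease holds."""
    alpha, fx, slope = 1.0, f(x), dot(g, d)
    for _ in range(max_halvings):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) <= fx + delta1 * alpha * slope:
            break
        alpha *= shrink
    return alpha


def prp_cg(f, grad, x0, plus=False, eps=1e-6, max_iter=1000):
    """PRP conjugate gradient iteration; plus=True uses beta = max(0, beta_PRP)."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                      # first direction: steepest descent
    for k in range(max_iter):
        if dot(g, g) ** 0.5 <= eps:            # assumed stopping test
            return x, k
        if dot(g, d) >= 0:                     # uphill direction: restart (assumption)
            d = [-gi for gi in g]
        alpha = backtracking(f, x, d, g)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        y = [a - b for a, b in zip(g_new, g)]
        beta = dot(g_new, y) / dot(g, g)       # PRP formula
        if plus:
            beta = max(0.0, beta)              # modified PRP of [14]
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, max_iter


if __name__ == "__main__":
    f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2   # example objective
    g = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
    print(prp_cg(f, g, [0.0, 0.0]))
    print(prp_cg(f, g, [0.0, 0.0], plus=True))
```

The only difference between the two variants is the single line that clips $\beta_k$ at zero; everything else, including the line search, is shared.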

Convergence Analysis
The following assumptions are often needed to analyze the convergence of the line search method (see [15,26]).
Assumption A. (i) $f$ is bounded below on the level set $\Omega = \{x \in R^n : f(x) \le f(x_0)\}$. [...] Now we consider the vector product [...] in the following two cases. Case 1. If [...], then we get [...]. [...] Then we obtain [...] and, using Step 6 of Algorithm 1, (5) and (9) hold, respectively. The proof is completed.
The above lemma shows that the search direction $d_k$ satisfies the sufficient descent condition (5) and the condition (9) without any line search rule.
Based on Lemma 3.1 and Assumption A (i) and (ii), we give the global convergence theorem of Algorithm 1.
Theorem 3.1 Let $\{x_k\}$ be generated by Algorithm 1 with the exact minimization rule, the Goldstein line search rule, the SWP line search rule, or the WWP line search rule, and let Assumption A (i) and (ii) hold. Then $\lim_{k \to \infty} \|g_k\| = 0$ (12) holds.
Proof.We will prove the result of this theorem with the exact minimization rule, the Goldstein line search rule, the SWP line search rule, and the WWP line search rule, respectively.
1) For the exact minimization rule. Let the step size $\alpha_k$ be the solution of (10). By the mean value theorem, $g_k^T d_k < 0$, and Assumption (ii), for any [...], which together with Assumption (i) gives [...]. This implies that [...] holds. By Lemma 3.1, we get (12).
2) For the Goldstein rule. Let the step size $\alpha_k$ satisfy (6) and (11). By (11) and the mean value theorem, we have [...]. Using Assumption (ii) again, we get [...], which, combined with (6) and Assumption (i), gives (14) and (15), respectively. By Lemma 3.1, (12) holds.
3) For the strong Wolfe-Powell rule. Let the step size $\alpha_k$ satisfy (6) and (7). By (7), we have [...]. The rest is similar to the proof of the above case, and we obtain (12) immediately.
4) For the weak Wolfe-Powell rule. Let the step size $\alpha_k$ satisfy (6) and (8). Similar to the proof of case 3), we can also get (12). This completes the proof of the theorem. By Lemma 3.1, there exists a constant $c_0 > 0$ [...]. By the proof of Lemma 3.1, we can deduce that there exists a positive number $c_1$ [...]. Similar to the proof of Theorem 4.1 in [27], it is not difficult to prove the linear convergence rate of Algorithm 1. We state the theorem as follows but omit the proof.
Theorem 3.2 (see [27]) Suppose that (16) and (17) hold and that the function $f$ is twice continuously differentiable and uniformly convex on $R^n$. Let $\{x_k\}$ be generated by Algorithm 1 with the exact minimization rule, the Goldstein line search rule, the SWP line search rule, or the WWP line search rule. Then $\{x_k\}$ converges to $x^*$ at least linearly, where $x^*$ is the unique minimal point of $f(x)$.

Numerical Results
In this section, we report some numerical experiments with Algorithm 1, Algorithm 2, and Algorithm 3. We test these algorithms on some problems [28] taken from MATLAB with given initial points. The parameters common to these methods were set identically, with $\delta_1 = 0.1$ [...]. In this experiment, the following Himmelblau stop rule is used: [...]. Under the performance profile comparison, a solver whose performance profile plot is on the top right wins over the rest of the solvers.
In Figures 1-3, NA denotes Algorithm 1, PRP denotes Algorithm 2, and PRP+ denotes Algorithm 3. Figures 1-3 show the performance of these methods relative to $NT = NF + m \cdot NG$, where NF and NG denote the numbers of function evaluations and gradient evaluations, respectively, and $m$ is an integer. According to the results on automatic differentiation [30], the value of $m$ can be set to $m = 5$. That is to say, one gradient evaluation is equivalent to $m$ function evaluations if automatic differentiation is used. From these three figures it is clear that the given method has the most wins (has the highest probability of being the optimal solver).
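The cost measure $NT = NF + m \cdot NG$ and the performance profile used for the comparison can be computed as in the following Python sketch. It assumes the usual definition of a performance profile, in which the ratio of a solver's cost to the best cost on each problem is compared with a factor $t$; the function names and the cost values in the example are hypothetical and are not results from the paper.

```python
import math


def total_cost(nf, ng, m=5):
    """NT = NF + m * NG: function evaluations plus weighted gradient evaluations."""
    return nf + m * ng


def performance_profile(costs, t):
    """Fraction of problems on which each solver is within a factor t of the best.

    costs maps a solver name to a list of positive costs, one per problem;
    math.inf marks a failure on that problem.
    """
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        within = sum(1 for p in range(n_problems) if costs[s][p] <= t * best[p])
        profile[s] = within / n_problems
    return profile


if __name__ == "__main__":
    # Hypothetical NT values for three problems (illustration only).
    costs = {
        "NA": [total_cost(10, 4), 40, 90],
        "PRP": [60, 40, 45],
        "PRP+": [45, math.inf, 50],
    }
    for t in (1, 2, 4):
        print(t, performance_profile(costs, t))
```

The value of the profile at $t = 1$ is the fraction of problems on which a solver has the lowest cost, which is the sense in which a solver "wins".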
In summary, the presented numerical results reveal that the new method, compared with the normal PRP method and the modified PRP method [14], has potential advantages.

Conclusions
This paper gives a new line search method for unconstrained optimization. The global and R-linear convergence are established under weaker assumptions on the search direction $d_k$. In particular, the direction $d_k$ satisfies the sufficient descent condition (5) and the condition (9) without carrying out any line search technique, whereas some papers [14,27,30] obtain these two conditions by assumption. The comparison of the numerical results shows that the search direction of the new algorithm is a good search direction at every iteration.
In iteration (2), $x_k$ is the current iterate point, $d_k$ is a search direction, and $\alpha_k > 0$ is a steplength. Different choices of $d_k$ and $\alpha_k$ determine different line search methods [1-3]. The method is divided into two stages at each iteration: 1) choose a descent search direction $d_k$; 2) choose a step size $\alpha_k$ along the search direction $d_k$. Throughout this paper, we denote [...].
The modified PRP method [14] is globally convergent under the sufficient descent assumption and the following weak Wolfe-Powell (WWP) line search technique: find the steplength $\alpha_k$ such that (6) and $g(x_k + \alpha_k d_k)^T d_k \ge \delta_2 g_k^T d_k$ (8) hold.
The program is also stopped if the iteration number is more than one thousand. Since the line search cannot always ensure the descent condition $d_k^T g_k < 0$, an uphill search direction may occur in the numerical experiments. In this case, the line search rule may fail. In order to avoid this case, the stepsize $\alpha_k$ will be accepted if the [...].
Thus $\rho_s(t)$ is the probability for solver $s \in S$ that a performance ratio $r_{p,s}$ is within a factor $t \in R$ of the best possible ratio. The function $\rho_s$ is then the (cumulative) distribution function for the performance ratio. The performance profile is a nondecreasing, piecewise constant function, continuous from the right at each breakpoint. The value of $\rho_s(1)$ is the probability that the solver will win over the rest of the solvers.