Higher Order Iteration Schemes for Unconstrained Optimization

Using a predictor-corrector strategy, this paper derives new iteration schemes for unconstrained optimization. Each iteration first produces a point (the predictor) by a line search from the current point; with these two points it then constructs a quadratic interpolation curve approximating an ODE trajectory; finally, it determines the new point (the corrector) by searching along the quadratic curve. In particular, this paper gives a global convergence analysis for the schemes associated with quasi-Newton updates. In our computational experiments, the new schemes using the DFP and BFGS updates outperformed their conventional counterparts on a set of standard test problems.


Introduction
Consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable. Let $x_k$ be the $k$-th iteration point. We will denote the values of $f(x)$ and its gradient at $x_k$ by $f_k$ and $\nabla f_k$, respectively.
Optimization problems are usually solved by iterative methods. The line search widely used in unconstrained optimization is one such scheme for updating the iterates. Such a scheme, by which one obtains the next iterate $x_{k+1}$ from the current iterate $x_k$, has the form
$$x_{k+1} = x_k + \alpha_k p_k,$$
where $p_k$ and $\alpha_k$ are termed the search direction and the stepsize, respectively. The direction $p_k$ is usually determined as a descent direction with respect to the objective $f(x)$, and $\alpha_k$ by an exact or inexact line search, so that the objective value decreases after the iteration. For instance, the famous Newton method uses the scheme with search direction $p_k = -(\nabla^2 f_k)^{-1} \nabla f_k$, where $\nabla^2 f_k$ is the Hessian matrix of $f(x)$ at $x_k$, and stepsize $\alpha_k = 1$. The quasi-Newton methods are reliable and efficient in solving unconstrained optimization problems. By avoiding explicit calculation of second-order derivatives and the solution of a system of linear equations, quasi-Newton methods have achieved great popularity since the first paper of Davidon [1,2]. He used $p_k = -H_k \nabla f_k$, where $H_k$ is some approximation to the inverse Hessian matrix $(\nabla^2 f_k)^{-1}$, updated at each iteration by a rank-one or rank-two matrix; all quasi-Newton updates require $H_{k+1}$ to satisfy the so-called quasi-Newton equation $H_{k+1} y_k = s_k$, where $s_k = x_{k+1} - x_k$ and $y_k = \nabla f_{k+1} - \nabla f_k$. Various quasi-Newton updates were proposed in the past. The important modification of Davidon's work by Fletcher and Powell [3] (the DFP algorithm) was the first and a successful one. It was then surpassed by the BFGS update (generally accepted as the best quasi-Newton method) [4-8], proposed independently by Broyden, Fletcher, Goldfarb and Shanno. These updates theoretically guarantee that every $H_k$ is positive definite; therefore the associated $p_k$ is a descent direction, and the objective decreases if $\alpha_k$ is determined by some line search.
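As a concrete illustration of the rank-two updates named above, here is a minimal NumPy sketch of the textbook DFP and BFGS inverse-Hessian formulas (with `s = x_{k+1} - x_k` and `y = grad_{k+1} - grad_k`; both satisfy the quasi-Newton equation `H_new @ y == s`):

```python
import numpy as np

def dfp_update(H, s, y):
    """DFP update of the inverse-Hessian approximation H:
    H+ = H + s s^T / (s^T y) - (H y)(H y)^T / (y^T H y)."""
    Hy = H @ y
    return H + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)

def bfgs_update(H, s, y):
    """BFGS update of the inverse Hessian:
    H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, rho = 1/(s^T y)."""
    rho = 1.0 / (s @ y)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```

Either update keeps the approximation positive definite as long as the curvature condition $s^\top y > 0$ holds.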
There are other iteration schemes that look quite different from the conventional ones. The so-called ODE methods use the following initial value problem:
$$\frac{dx(t)}{dt} = h(x(t)), \qquad x(0) = x_0. \qquad (4)$$
Assume that $h$ satisfies certain conditions, and hence the preceding problem defines a trajectory. Typical choices are
$$h(x) = -\nabla f(x) \qquad \text{and} \qquad h(x) = -(\nabla^2 f(x))^{-1} \nabla f(x);$$
the associated trajectories might be called the steepest descent curve and the Newton curve, respectively [10]. In this way, in fact, one can obtain many curves corresponding to existing unconstrained optimization methods.
Pan [11-13] generalized the steepest descent curve and the Newton curve by setting $h(x) = -\eta(x) A(x) \nabla f(x)$, where $\eta(x)$ is called the ratio factor and $A(x)$ the direction matrix. He suggested some concrete ratio factors and direction matrices, and showed that under certain conditions the objective value decreases strictly along the associated trajectory, whose limit point is an optimum.
ODE methods view the optimization problem through its trajectory: they use numerical methods to approximate the associated trajectory, and thereby approach its limit point. When Euler's method is applied in the ODE method, the standard iteration schemes are recovered (although historically the standard schemes were derived by decreasing the objective function value rather than by following a trajectory). Euler's method has only first-order precision, so it is natural to apply a higher-order approximation of the trajectory to obtain an iteration scheme of higher order than the standard one.
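The equivalence between one forward-Euler step on the steepest-descent ODE and one fixed-step gradient iteration can be seen in a few lines (a sketch; the stepsize `h` and iteration count here are illustrative):

```python
import numpy as np

def euler_descent(grad, x0, h=0.1, iters=100):
    """Forward-Euler integration of the steepest-descent ODE dx/dt = -grad f(x).
    One Euler step, x <- x - h * grad(x), is exactly a fixed-step gradient
    iteration, showing the standard scheme is a first-order approximation
    of the trajectory."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - h * grad(x)
    return x
```

For example, on $f(x) = \tfrac{1}{2}\|x\|^2$ (so that $\nabla f(x) = x$) the iterates contract geometrically toward the minimizer at the origin.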
In this paper, we derive new iteration schemes along this line.In view of the importance of DFP and BFGS methods, we will focus on iteration schemes with respect to these methods.
The paper is organized as follows. Section 2 derives the new iteration schemes. Section 3 offers the convergence analysis. Section 4 reports encouraging computational results on a set of test problems.

Higher Order Iteration Scheme
Assume that $x_k$ is the current iterate. The next iterate $x_{k+1}$ will be determined by approximately following the trajectory defined by (4). Let $\bar{x}_k$ denote the predictor obtained from $x_k$. We construct a quadratic interpolation curve
$$x(t) = a t^2 + b t + c, \qquad (5)$$
locally approximating the trajectory, where the coefficient vectors $a$, $b$, $c$ satisfy the interpolation conditions (6a)-(6d): the curve passes through $x_k$ and $\bar{x}_k$, and its tangent matches the right-hand side of (4) at these points. Since a quadratic curve cannot satisfy all four conditions exactly, pre-multiplying both sides of (7a) yields an approximation, and hence an approximate solution of (6a)-(6d).
where $a$, $b$, $c$ are given by (8) and (9). An approximate solution of the unconstrained optimization problem (1) can then be obtained by solving the one-dimensional minimization problem $\min_t f(x(t))$. To solve this problem, we apply an inexact line search rule; furthermore, we modify the sufficient descent condition (11) into the form (12), where $\sigma \in (0,1)$.
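Since the coefficient formulas (8) and (9) are specific to the paper, the following sketch fits the quadratic under one common set of interpolation conditions (value and tangent at $x_k$, value at the predictor; the inputs `hk` and `t_pred` are hypothetical names for the tangent vector and the predictor parameter) and performs a crude search along the curve:

```python
import numpy as np

def quadratic_curve(xk, hk, x_pred, t_pred):
    """Quadratic x(t) = xk + hk*t + a*t**2 with x(0) = xk, x'(0) = hk,
    and x(t_pred) = x_pred.  A sketch: the paper's coefficients (8)-(9)
    may differ."""
    a = (x_pred - xk - hk * t_pred) / t_pred**2
    return lambda t: xk + hk * t + a * t**2

def search_along_curve(f, curve, ts):
    """Crude corrector: return the sampled point on the curve minimizing f."""
    return min((curve(t) for t in ts), key=f)
```

For instance, with $f(x) = \|x\|^2$, $x_k = (2,2)$, tangent $-\nabla f(x_k)$, and a predictor halfway toward the origin, a coarse grid search along the curve reaches the minimizer.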

Modified Inexact Line Search Algorithm
The conventional backtracking inexact line search [14] operates as follows: start with an initial trial stepsize $t$; the algorithm stops if $t$ satisfies the sufficient descent condition, and otherwise continues with the reduced trial $t := \beta t$, $\beta \in (0,1)$. A modified backtracking inexact line search algorithm is obtained by applying expression (12) as the sufficient descent condition in the backtracking line search algorithm.
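A minimal sketch of the conventional backtracking rule, using the standard Armijo acceptance test (the paper's modified condition (12) would replace this test; parameter names are illustrative):

```python
import numpy as np

def backtracking(f, grad, x, p, sigma=1e-4, beta=0.5, t0=1.0, max_iter=50):
    """Backtracking line search: start with t = t0 and shrink t <- beta*t
    until the sufficient-descent (Armijo) condition
        f(x + t*p) <= f(x) + sigma * t * grad(x)^T p
    holds for the descent direction p."""
    fx = f(x)
    gp = grad(x) @ p
    t = t0
    for _ in range(max_iter):
        if f(x + t * p) <= fx + sigma * t * gp:
            return t
        t *= beta
    return t
```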

Higher Order Iteration Schemes
The higher order iteration schemes first obtain the predictor $\bar{x}$ satisfying the modified inexact line search rule (12). The overall steps of the higher order iteration schemes are organized as follows.
Step 5. Set $k := k+1$ and go to Step 1.
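Putting the pieces together, a minimal sketch of the predictor-corrector idea with $h(x) = -\nabla f(x)$ (the paper's exact Steps 1-5, coefficient formulas (8)-(9), and stopping rules are not reproduced; parameter names here are illustrative):

```python
import numpy as np

def higher_order_descent(f, grad, x0, sigma=1e-4, beta=0.5,
                         tol=1e-6, max_iter=200):
    """Predictor-corrector sketch: backtracking predictor along -grad f,
    quadratic curve through the current point and the predictor, then a
    corrector chosen by a coarse search along the curve."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = -g
        # predictor: backtracking (Armijo) line search along p
        t = 1.0
        while f(x + t * p) > f(x) + sigma * t * (g @ p):
            t *= beta
        x_pred = x + t * p
        # quadratic curve x(s) = x + p*s + a*s**2 through the predictor
        a = (x_pred - x - p * t) / t**2
        curve = lambda s, x=x, p=p, a=a: x + p * s + a * s**2
        # corrector: coarse minimization of f along the curve
        s_best = min(np.linspace(0.0, 2.0 * t, 41), key=lambda s: f(curve(s)))
        x = curve(s_best)
    return x
```

On a simple quadratic objective this sketch reaches the minimizer; the point of the corrector step is that the curve can extend beyond the predictor, so a longer effective step is possible than the backtracked one.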

An Extension of the Higher Order Iteration Scheme
The higher order iteration schemes vary with different choices of $h(x)$. In this paper we extend the higher order iteration schemes to the BFGS method by setting $h(x) = -H_k \nabla f(x)$. We first obtain the predictor; then, by searching along the curve, we obtain the new point $\bar{x}$. The overall steps of this variant of the iteration schemes are organized as follows.
Step 3. Compute $a$, $b$, $c$ by (8) and (9). Step 7. Set $k := k+1$ and go to Step 1.
Note: 1) In this paper, $I$ denotes the identity matrix. 2) In Step 3, we adopt a strategy to ensure the curvature condition $y_k^T s_k > 0$. Since the BFGS method is a kind of conjugate direction method, the restart technique can reduce the accumulation of roundoff errors.
5) We report only the variant using the BFGS update; a variant of the higher order iteration schemes using the DFP updating formula instead of the BFGS updating formula in Steps 3 and 6 is derived in the same way.
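One common way to realize the curvature safeguard mentioned in note 2) is to skip the update when $y^\top s$ is not sufficiently positive, so the approximation stays positive definite (a sketch; the paper's Step 3 strategy may differ):

```python
import numpy as np

def safeguarded_bfgs_update(H, s, y, eps=1e-10):
    """Apply the BFGS inverse-Hessian update only when the curvature
    condition y^T s > 0 holds (up to a small relative tolerance);
    otherwise reject the pair and keep the old approximation."""
    sy = s @ y
    if sy <= eps * np.linalg.norm(s) * np.linalg.norm(y):
        return H                      # reject: preserve positive definiteness
    rho = 1.0 / sy
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```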

The Global Convergence of the Higher Order Iteration Schemes

Definition 3.1 [11] If the curve $x(t)$ is contained in the domain of $f$ and $f(x(t))$ is strictly monotonically decreasing in $t$, then $x(t)$ is a descent trajectory of $f$ at $x(0)$. If, in addition, $\lim_{t\to\infty} x(t)$ exists and equals the minimization point of $f$, then the curve is a normal descent curve.

Pan proved the global convergence of the ODE methods with ratio factor and direction matrix [11]. In this paper we consider only the situation where the ratio factor is 1 and the direction matrix is the identity matrix, which gives the following theorem.

Theorem 3.1 [11] If the level set is a bounded closed set, $f$ is twice continuously differentiable on it, and $\nabla f(x_0) \ne 0$, then the right segment of the trajectory of the ordinary differential equation (4) is a descent curve of $f$ at $x_0$, and the limit point of the trajectory is a stationary point of $f(x)$. If $f$ is convex, then the right segment of the trajectory is a normal descent curve of $f$ at $x_0$.

We use the quadratic interpolation curve (5) to approximate the trajectory. However, the iteration may fail to descend in a local region of the predictor, so we apply the strategy of Step 1 in subalgorithm (2.1) to keep the iteration descending.

Theorem 3.2 Given constants $c_1 > 0$ and $c_2 > 0$, let $p_k$ and $\bar{p}_k$ be descent directions satisfying the stated conditions. Suppose that $f(x)$ is bounded below and continuously differentiable in the level set determined by the starting point $x_0$, and that the gradient $\nabla f$ is Lipschitz continuous there; namely, there exists $L > 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y$ in the level set. Then $\lim_{k\to\infty} \|\nabla f_k\| = 0$.

Proof. Consider the situation that holds for all $k$. From algorithm (2.1) we obtain (21) and (22). By (21) and (22) we have (23) and (24), and with (23) and (24) we obtain (25). Summing this expression over all indices less than or equal to $k$ gives (26). Since $f$ is bounded below, the sum in (26) is bounded above by a positive constant for all $k$; hence, taking limits in (26), we obtain (27). In the standard inexact line search algorithm, if the initial trial does not satisfy condition (11), then the previous trial $t_k/\beta$ violates the condition, which bounds $t_k$ from below. By the Lipschitz condition we obtain (29), and it follows from (28) and (29) that (30) holds. If the initial trial satisfies condition (11), then $t_k$ equals the initial trial; furthermore, from (16) and (17) we obtain (31) and (32). By (33) we conclude that $\lim_{k\to\infty} \|\nabla f_k\| = 0$. □

Theorem 3.3 analyzes the global convergence of the iteration scheme based on the ODE; similarly, we obtain the global convergence of the variant iteration scheme using the BFGS update formula.
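The summation step of the proof follows the standard Zoutendijk pattern. Under the stated assumptions (an Armijo-type decrease on accepted steps, Lipschitz-continuous gradient, $f$ bounded below), it can be summarized as follows, where $c > 0$ is a constant determined by $\sigma$, $\beta$ and $L$:

```latex
\begin{align*}
  f_k - f_{k+1} &\ \ge\ c \,\cos^2\theta_k \,\|\nabla f_k\|^2,
  \qquad \cos\theta_k \;=\; \frac{-\,\nabla f_k^{\top} p_k}{\|\nabla f_k\|\,\|p_k\|},\\
  \sum_{j=0}^{k} \cos^2\theta_j \,\|\nabla f_j\|^2
  &\ \le\ \frac{f_0 - f_{k+1}}{c}\ \le\ \frac{f_0 - \inf f}{c} \;<\; \infty,\\
  \text{hence}\qquad
  \lim_{k\to\infty} \cos^2\theta_k \,\|\nabla f_k\|^2 &= 0.
\end{align*}
```

If the directions are uniformly descent, i.e. $\cos\theta_k$ is bounded away from zero (as the conditions involving $c_1$ and $c_2$ guarantee), the last limit forces $\lim_{k\to\infty}\|\nabla f_k\| = 0$.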

Theorem 3.4 Consider algorithm (2.2), and suppose that $f$ is bounded below and continuously differentiable in the level set determined by the starting point $x_0$. Suppose further that the gradient $\nabla f$ is Lipschitz continuous on this set; namely, there exists $L > 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$. Then $\lim_{k\to\infty} \|\nabla f_k\| = 0$.

Proof. Consider the situation that holds for all $k$. From algorithm (2.2) we obtain (39). Step 3 implies (40) and (41). Combining conditions (39), (40) and (41), we obtain (42). From Step 8 of algorithm (2.2) we obtain (43) and (44). From Theorem 3.3 we conclude that $\lim_{k\to\infty} \|\nabla f_k\| = 0$ holds. □

Computational Results
In this section, we report computational results showing that the variant iteration schemes using the BFGS and DFP update formulas outperformed the BFGS and DFP methods on two sets of test functions. The first set of 20 functions is from [15], and the second from [16]; the latter can be obtained from http://www.ici.ro/camo/neculai/ansoft.htm.

Test Codes

In this section, the following four codes are tested:
 BFGS: the conventional BFGS method.
 DFP: the conventional DFP method.
 HDFP: the higher order iteration scheme using the DFP update.
 HBFGS: the higher order iteration scheme using the BFGS update.
To keep the comparisons fair and simple, all the codes were implemented with the same parameters. Compiled in Matlab 7.0.4, the four codes were run under Windows XP Home Edition Version 2002 on an Asus PC with a Genuine Intel(R) Centrino Duo T2300 1.66 GHz processor and 1.00 GB of memory.

Result for 20 Small Scale Functions

The first set of test problems included 20 problems. The numerical results obtained are listed in Table 1, where the numbers of function value computations and gradient computations are listed in the columns labeled "f" and "∇f", respectively, and the CPU-time (in seconds) required for solving each problem is listed in the column labeled "Time". "-" denotes that the algorithm did not reach a correct solution within the iteration limit.
Table 1 serves as a comparison between BFGS and HBFGS. It shows that the numbers of function value and gradient computations of HBFGS are fewer than those of BFGS. However, HBFGS costs 0.11 seconds more than BFGS in total, because HBFGS must additionally compute the coefficients of the quadratic curve. Although this computation is much cheaper than a function evaluation, it affects the CPU-time, especially for small scale problems. So HBFGS is competitive with BFGS on the 20 small scale problems.

Result for 50 Middle Scale Functions
The second test set of 50 problems consists of 43 functions with 100 variables, 3 functions with 200 variables, and 4 functions with 300 variables. The problems marked "*" have 300 independent variables, and those marked "**" have 200. Table 2 shows that, compared with BFGS, the numbers of function value computations and gradient computations and the CPU-time of HBFGS decrease by 52.65%, 52.08% and 36.01%, respectively. In summary, the HBFGS method is faster and requires less computation than the BFGS method.

Result for 50 Large Scale Functions
The third test set of 50 problems consists of 46 functions with 500 variables and 4 functions with 300 variables. The problems marked "*" have 300 independent variables.
Table 3 shows that HBFGS's CPU-time and its numbers of function value and gradient computations are less than those of BFGS by 949.10 seconds, 38808 and 3957, respectively.

Statistics of the Ratio
Table 4 gives an overall comparison of HDFP and HBFGS with DFP and BFGS. In Table 4, "Time" denotes the run time ratio, "f" the function value computation number ratio, and "∇f" the gradient computation number ratio. Table 4 shows that HDFP outperforms DFP with an average CPU-time ratio of 1.58, function computation ratio of 1.67, and gradient computation ratio of 1.71; and HBFGS outperforms BFGS with an average CPU-time ratio of 1.23, function computation ratio of 1.57, and gradient computation ratio of 1.56.

Summary of the Tests
As the tests show, although the higher order iteration schemes add the computation of the quadratic curve coefficients, they require fewer function value and gradient computations. For large scale problems, the cost of computing the curve coefficients is much less than that of a function evaluation.

Concluding Remarks
We gave a new iteration scheme based on an ODE and proved the global convergence of this scheme and of its variant methods using the DFP and BFGS update formulas. In particular, this iteration admits a class of variant methods using different directions as the right-hand side of (4). From our experiments, we can safely conclude that this iteration scheme improves on the BFGS and DFP methods on the test data sets.


3) In Step 4, we call subalgorithm (2.1), in which we set the parameter $c_1$. 4) In Step 6, we adopt a restart technique: the iteration is restarted whenever $k+1$ is an integral multiple of $n$. The results of BFGS and HBFGS are shown below, and the performance of DFP and HDFP is demonstrated only in the overall results table.