Solving Large Scale Unconstrained Minimization Problems by a New ODE Numerical Integration Method

In reference [1], a new ODE numerical integration method was given for solving large scale nonlinear equations $F(X) = 0$. In this paper the method is applied to the special case in which $F(X)$ is the gradient of a scalar function $f(X)$. Since the Hessian of $f(X)$ is symmetric, its eigenvalues are all real numbers, so the new method is very suitable for this structure. For a quadratic function the convergence is proved, and the spectral radius of the iteration matrix is given and compared with that of the traditional method. Examples show that for large scale problems (dimension N = 100, 1000, 10000) the new method is very efficient.


Introduction
This work is a continuation of [1]. In [1] we solved general nonlinear equations by a new ODE method, and the numerical results were very encouraging: for instance, a very tough problem, Brown's equation with dimension N = 100, can be solved easily by the new method. In this paper we turn our attention to a special class of functions, those said to have a gradient structure. This structure comes from seeking a local minimizer in the optimization area: to seek a point $X^*$ such that $f(X^*) \le f(X)$ for all $X$ in some neighbourhood of $X^*$, where $X = (x_1, x_2, \dots, x_N)^T$ is a vector. It is well known that the conditions $\nabla f(X^*) = 0$ and $\nabla^2 f(X^*)$ positive semidefinite are necessary for $X^*$ to be a local minimizer, while $\nabla f(X^*) = 0$ and $\nabla^2 f(X^*)$ positive definite are sufficient; the matrix $\nabla^2 f(X)$ is called the Hessian matrix (or Hessian, for short). In terms of ODE numerical integration, we consider the differential equation

$$\frac{dX}{dt} = -\nabla f(X), \qquad X(0) = X_0. \qquad (1)$$

For a symmetric matrix the eigenvalues are all real numbers. If the Hessian is positive definite, the eigenvalues of $-\nabla^2 f(X)$ are all negative real numbers. That is to say, the Jacobian of differential Equation (1) possesses negative real eigenvalues, so the new ODE method in [1] is very suitable for this case (see the stability regions in Figure 1 and Figure 2 of [1]).
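As a concrete illustration of this structure, here is a minimal Python sketch (the quadratic example, names, and numbers are our own, not from the paper): for a strictly convex quadratic $f(X) = \tfrac{1}{2}X^TAX - b^TX$, the Jacobian of the right-hand side of (1) is $-A$, whose eigenvalues are real and negative.

```python
import numpy as np

# Hypothetical illustration: a strictly convex quadratic
#   f(X) = 0.5 * X^T A X - b^T X,  A symmetric positive definite,
# whose gradient system (1) is dX/dt = -grad f(X) = -(A X - b).
rng = np.random.default_rng(0)
N = 5
Q = rng.standard_normal((N, N))
A = Q @ Q.T + N * np.eye(N)          # symmetric positive definite
b = rng.standard_normal(N)

hessian = A                          # for a quadratic, the Hessian of f is A
jacobian = -hessian                  # Jacobian of the right-hand side of (1)

eigs = np.linalg.eigvalsh(jacobian)  # symmetric matrix => real eigenvalues
print(eigs)                          # all strictly negative
assert np.all(eigs < 0)
```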

The Method
In [1], for the initial value problem

$$\frac{dX}{dt} = F(X), \qquad X(0) = X_0,$$

a new ODE integration method, scheme (3), was given. Although formally it looks implicit, it is actually explicit. Applying it to the minimization problem leads to the following topic: what is a good choice of the parameters $\alpha$ and $h$? To study this we consider solving the linear equations

$$AX = b, \qquad (6)$$

where $A$ is a symmetric positive definite matrix and $b$ is a vector; in order to simplify the writing, Equation (6) is turned into the form (7). For a $2N$-order vector $u_n$ built from two consecutive iterates, the method (3) can be expressed as $u_{n+1} = M u_n$. Equality (10) shows that the eigenvectors of $M$ are built from those of $A$; letting the corresponding eigenvalues of $A$ be $\lambda$, the modulus of the associated eigenvalue $\mu$ of $M$ is less than unity if and only if inequality (11) holds. Writing that inequality in the form (12), we complete the proof of convergence, which holds provided the parameter combination in (12) is less than $4/3$.
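The reduction to a $2N \times 2N$ iteration matrix can be illustrated with a stand-in scheme. Scheme (3) itself is given in [1] and is not reproduced here; purely for illustration we use the classical two-step heavy-ball iteration $X_{n+1} = X_n - \alpha h\,\nabla f(X_n) + \beta(X_n - X_{n-1})$, which has the same two-step structure, and form its iteration matrix $M$ for the quadratic $f(X) = \tfrac12 X^TAX - b^TX$.

```python
import numpy as np

def iteration_matrix(A, alpha_h, beta):
    """2N x 2N matrix M of a two-step scheme written as u_{n+1} = M u_n,
    with u_n = (X_n, X_{n-1}).  Stand-in scheme (heavy ball), NOT scheme
    (3) of [1]:
        X_{n+1} = X_n - alpha_h * (A X_n - b) + beta * (X_n - X_{n-1}).
    The constant b drops out of the error recurrence, so M below governs
    the convergence of the error e_n = X_n - X*."""
    N = A.shape[0]
    I = np.eye(N)
    top = np.hstack([(1 + beta) * I - alpha_h * A, -beta * I])
    bot = np.hstack([I, np.zeros((N, N))])
    return np.vstack([top, bot])

A = np.diag([1e-3, 1.0, 10.0])       # eigenvalues lambda_i of the Hessian
M = iteration_matrix(A, alpha_h=0.18, beta=0.5)
rho = max(abs(np.linalg.eigvals(M)))
print("spectral radius:", rho)       # < 1  =>  the iteration converges
```

For the true scheme (3) the blocks of $M$ are different, but the $2N$-dimensional reformulation and the convergence test $\rho(M) < 1$ work in the same way.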


Let $\rho(M)$ denote the spectral radius of the matrix $M$, defined by $\rho(M) = \max_i |\mu_i|$ over the eigenvalues $\mu_i$ of $M$; it governs the asymptotic rate of convergence of the iteration. We take a guess about the location of the dominant eigenvalue; though we cannot prove it, we consider this guess to be very reasonable. If it is not satisfied, we can choose an even smaller $\alpha$ to make sure that $\rho(M) < 1$. The problem we are now investigating is: for a fixed $h > 0$, how should $\alpha$ be chosen to minimize $\rho(M)$?

 
We now observe how $\mu_1$ varies with $\alpha$. In Equation (12) the quantities involved are real; considering $\mu_1$ as a function of $\alpha$ and differentiating it, we find that $\mu_1$ is a decreasing function of $\alpha$. Equation (14) has one and only one root in the interval $(0, 1)$, so we solve it and obtain the optimal value $\alpha^*$ (in the numerator we take the sign "$-$", since otherwise $\alpha^*$ would be greater than 1); this value increases with $\lambda_i$. From the definition of the condition number and (15) we obtain the expression (16) used in the computations below.
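The qualitative picture described above, a spectral radius that decreases in the parameter and attains its minimum at an interior value, can be checked numerically. The sketch below continues the previous one (same matrix `A` and function `iteration_matrix`, i.e. still the stand-in heavy-ball scheme, not scheme (3)):

```python
import numpy as np
# Continues the previous sketch: iteration_matrix() and A as defined there.
alphas = np.linspace(0.01, 0.35, 400)
rhos = [max(abs(np.linalg.eigvals(iteration_matrix(A, a, 0.5))))
        for a in alphas]
k = int(np.argmin(rhos))
print("empirical optimum:", alphas[k], "with spectral radius", rhos[k])
# rhos decreases up to the optimum and grows afterwards, matching the
# one-interior-minimum picture described in the text.
```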
From Table 1 we can see that the theoretically expected NFE ratios are basically consistent with the actual calculation results, and the higher the condition number is, the more efficient our method EPS becomes.
There is another thing that needs to be mentioned. For $\lambda_1 = 10^{-6}$ the EPS method takes $h = 570.0877$; is such a large step size possible? As shown in the stability analysis of [1], even though we take so large a step size, the products $h\lambda$ are still located inside the stability region (for instance, $h\lambda_1 = 570.0877 \times 10^{-6} \approx 5.7 \times 10^{-4}$).

Numerical Experiment

The outline of our algorithm EPS is the same as described in [1]. The differential equations to solve are (23) and (24). Usually we prefer using (24), especially when $\nabla^2 f(X)$ is a diagonally dominant matrix; in this case we simply take $\alpha = 0.5$.
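A hedged sketch of the scaled flow follows, under our reading of (24) as the diagonally scaled system $dX/dt = -D^{-1}\nabla f(X)$ with $D = \operatorname{diag}(\nabla^2 f)$; this reading, and the quadratic example below, are our assumptions, not the paper's statement.

```python
import numpy as np

def scaled_rhs(grad, hess_diag):
    """Right-hand side of a diagonally scaled gradient flow,
    dX/dt = -D^{-1} grad f(X) with D = diag(Hessian) -- our reading of
    Equation (24); the exact form is given in the paper and in [1]."""
    return -grad / hess_diag

# Hypothetical quadratic example: f(X) = 0.5 X^T A X - b^T X
A = np.diag([1e-3, 1.0, 10.0]); b = np.ones(3)
X = np.zeros(3)
h = 0.5                       # step 0.5, loosely echoing the choice above
for _ in range(200):
    X = X + h * scaled_rhs(A @ X - b, np.diag(A))
print(X, np.linalg.solve(A, b))   # the iterate approaches the minimizer
```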


The ODE (1) is said to have a gradient structure. From the chain rule we have ([3])

$$\frac{df}{dt} = \nabla f(X)^T\frac{dX}{dt} = -\|\nabla f(X)\|^2 \le 0. \qquad (25)$$

From (25) we see that along any analytic solution of the ODE the quantity $-f$ increases, i.e. $f(X(t))$ decreases monotonically.
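The monotonicity in (25) is easy to verify numerically: along an accurately integrated trajectory of (1), $f$ never increases. A small-step explicit Euler integration suffices for this illustration; the function and step size are our own choices, not the paper's.

```python
import numpy as np

A = np.diag([0.1, 1.0, 4.0]); b = np.ones(3)

def f(X):    return 0.5 * X @ A @ X - b @ X
def grad(X): return A @ X - b

X = np.array([3.0, -2.0, 5.0])
values = [f(X)]
for _ in range(1000):
    X = X - 0.01 * grad(X)        # small-step Euler on dX/dt = -grad f
    values.append(f(X))
print(np.all(np.diff(values) <= 1e-12))  # f(X(t)) decreases monotonically
```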
A different situation occurs with the present method: because our method takes very large step sizes, it produces large local errors, and the numerical solution $X_n$ may go far from the analytic solution, especially at the beginning of the calculation.
In some earlier literature, for example [6], $\|X_n - X^*\| \le \mathrm{TOL}$ was used as the convergence criterion, but this rule applies only to test problems for which $X^*$ is known already. For real problems this criterion is not suitable, so we take $\|\nabla f(X_n)\| \le \mathrm{TOL}$ as our stopping criterion.
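In code, this stopping rule reads as follows (a minimal sketch; the function name, the `step` callback, and the default TOL are placeholders of our own):

```python
import numpy as np

def minimize_until_small_gradient(X, grad, step, TOL=1e-6, max_iter=100000):
    """Iterate until ||grad f(X)|| <= TOL -- the stopping criterion used
    in the text -- rather than ||X - X*|| <= TOL, which needs the unknown
    minimizer X*."""
    for n in range(max_iter):
        g = grad(X)
        if np.linalg.norm(g) <= TOL:
            return X, n               # converged by the gradient test
        X = step(X, g)                # one step of the chosen method
    raise RuntimeError("no convergence within max_iter iterations")
```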
As we did in [1], we divided the calculation process into three stages and took the corresponding parameter values for each stage; for the test problem the computed value was $0.1174 \times 10$.
For large scale problems, ([8], pp. 1-15) proposed a subspace trust region method, STIR. For Example 1 that method gave two results: one was STIR with exact Newton steps, the other STIR with inexact Newton steps; the numbers of iterations were 25 and 21 respectively. These results show that STIR needs to solve 25 or 21 large scale systems of linear equations [8]. Compared with this, our method EPS needs just 228 gradient function evaluations, and no linear equations have to be solved.
It is well known that the ordinary Newton method is very sensitive to the initial value.
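As a standard one-dimensional illustration of this sensitivity (our own textbook example, not taken from the paper), Newton's method for solving $\arctan(x) = 0$ converges from $x_0 = 1$ but diverges once $|x_0|$ exceeds roughly $1.39$:

```python
import numpy as np

def newton_arctan(x0, iters=8):
    """Newton iteration x <- x - f(x)/f'(x) for f(x) = arctan(x)."""
    x = x0
    for _ in range(iters):
        x = x - np.arctan(x) * (1 + x * x)   # since f'(x) = 1/(1+x^2)
    return x

print(newton_arctan(1.0))   # converges toward the root 0
print(newton_arctan(2.0))   # |x| grows: Newton diverges from this start
```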
In the test problems below, N is a multiple of 4 and 5, and the diagonal of the Hessian and the gradient function are given accordingly.

Figure 1 and Figure 2 give the computed results. When $\|G(X_i)\|$ is small enough, the algorithm turns to the Newton method. For the present $X_i$, getting the convergent result needs 229 Newton iterations; if we further improve the initial value, making $\|G(X)\| < 0.5$ (after 128 function evaluations), only two Newton iterations are needed. We are interested in large scale problems. For the unconstrained (ChainWood) problem, with three different dimensions $N = 100, 1000, 10000$ and the corresponding start points $X_0$, we get three different results: