PRP-Type Direct Search Methods for Unconstrained Optimization

Three PRP-type direct search methods for unconstrained optimization are presented. The methods combine three recently developed descent conjugate gradient methods with the idea of the frame-based direct search method. Global convergence is established for continuously differentiable functions. Data profiles and performance profiles are used to analyze the numerical experiments, and the results show that the proposed methods are effective.


Introduction
Direct search methods form an important class of numerical methods for solving optimization problems. They are particularly useful for problems where derivatives are not available or are difficult to compute. Early direct search methods can be traced back to the compass search method [1] and the pattern search method [2]. In the 1960s, direct search methods were widely applied. Due to the lack of convergence theory, however, they fell out of favor with the mathematical optimization community by the early 1970s.
In the past twenty years, direct search methods have seen a revival, with renewed interest driven by new convergence results. For a direct search method to achieve convergence, it typically employs one of three techniques: line searches, trust regions, or discrete grids [3]. In this paper we adopt the discrete grid strategy to develop globally convergent direct search methods. Alternative approaches based on line searches and trust regions can be found in [3-6] and the references therein.
Early convergence theory based on discrete grids was established with the framework of the GPS (generalized pattern search) method, which contains many earlier algorithms, such as compass search and Hooke and Jeeves' pattern search, as special cases. The GPS methods have been extended to bound and linearly constrained problems [10,11], nonsmooth problems [12] and general constrained problems [13].
The grids play an important role in the GPS methods and their extensions. In the GPS methods, each grid is often a subset of some member of a sequence of nested grids.
Coope and Price [14] have shown that the grids can be chosen more freely. The grid-based methods proposed in [14] incorporate two arbitrary finite processes which may reorient and reshape the grids. Partly motivated by Coope and Price's ideas, the review paper [15] presented a unified framework, called the GSS (generating set search) method, that covers a large number of direct search methods including the GPS methods. The GSS methods have also been extended to linearly constrained optimization [16,17].
The great freedom in the choice of grids makes it possible to choose grids that reflect information gathered during previous iterations, especially possible gradient information. In [18], a concept called a frame was proposed to gather this gradient information. Here a frame (see Definition 2.2) is a fragment of a grid. Loosely speaking, a frame is a set of points which surround a central point called the frame center. Suppose that we already have a frame around the iterate x_k; the set of search directions must contain all the frame points and may also contain an arbitrary finite process. If there is no search direction at which the objective function value is sufficiently less than f(x_k), then x_k is called a quasi-minimal point (see Definition 2.3). After a quasi-minimal point is found, the frame size must be reduced for convergence.
A direct search method that conforms to the frame-based template in [18] was proposed in [19], where the PRP+ method was used to exploit the gradient information gathered during previous iterations. The numerical experiments presented in [19] show that this frame-based conjugate gradient method is effective.
In this paper, we consider the unconstrained optimization problem

min f(x), x ∈ R^n,  (1)

where the objective function f(x) is continuously differentiable but the gradient of f is not available or is computationally expensive.
Our main purpose is to construct efficient direct search methods for solving (1). We also use the frame-based template in [18], but replace the PRP+ method with some descent conjugate gradient methods to exploit the gradient information. These descent conjugate gradient methods enjoy some nice properties and good numerical behavior when applied to unconstrained optimization. A common property of these methods is that they generate sufficient descent directions for the objective function. When exact gradients are available, descent conjugate gradient methods are computationally more efficient than the PRP+ method [20-23]. We incorporate the descent conjugate gradient methods into the frame-based direct search framework to develop more efficient direct search methods for solving (1).
We will consider three descent conjugate gradient methods: the TTPRP (three-term PRP) method proposed in [23], the TMPRP (two-term modified PRP) method proposed in [4], and the PRP-DC method proposed in [24].
Let us briefly review these descent conjugate gradient methods. Each method generates a sequence of iterates

x_{k+1} = x_k + α_k d_k,

where the step-length α_k is determined by some line search, and the initial direction is set to d_0 = -g_0, with g_k denoting the gradient ∇f(x_k). In the TTPRP method, the direction is

d_k = -g_k + β_k d_{k-1} - θ_k y_{k-1},  (2)

where y_{k-1} = g_k - g_{k-1}, β_k = g_k^T y_{k-1} / ||g_{k-1}||^2 and θ_k = g_k^T d_{k-1} / ||g_{k-1}||^2. In the TMPRP method, the direction is

d_k = -g_k + β_k (I - g_k g_k^T / ||g_k||^2) d_{k-1},  (3)

where I denotes the identity matrix. In the PRP-DC method, the direction d_k is a PRP direction modified in a similar spirit so that a sufficient descent condition g_k^T d_k ≤ -c ||g_k||^2 holds; we refer to [24] for its precise form, which we label (4). As the gradient ∇f is not available in the direct search setting, we need to find some estimate g_k of the gradient to use in place of ∇f(x_k) in (2), (3) and (4). Based on these conjugate gradient directions, we then design direct search methods. Under mild conditions, we will prove that these methods are globally convergent. Our numerical results show that the proposed methods are efficient.
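As a concrete illustration, the following sketch implements the TTPRP and TMPRP updates in the forms commonly given in the literature; the function names are ours, and the PRP-DC update is omitted since its exact form is specified in [24]. A useful property of both updates is that g^T d = -||g||^2 holds by construction.

```python
import numpy as np

def ttprp_direction(g, g_prev, d_prev):
    """Three-term PRP direction (2), as commonly stated:
    d = -g + beta * d_prev - theta * y,  with  y = g - g_prev,
    beta = g.y / ||g_prev||^2  and  theta = g.d_prev / ||g_prev||^2."""
    y = g - g_prev
    denom = np.dot(g_prev, g_prev)
    beta = np.dot(g, y) / denom
    theta = np.dot(g, d_prev) / denom
    return -g + beta * d_prev - theta * y

def tmprp_direction(g, g_prev, d_prev):
    """Two-term modified PRP direction (3): the previous direction is
    projected orthogonal to g, so g.d = -||g||^2 holds automatically."""
    y = g - g_prev
    beta = np.dot(g, y) / np.dot(g_prev, g_prev)
    proj = d_prev - g * (np.dot(g, d_prev) / np.dot(g, g))
    return -g + beta * proj
```

In the direct search methods below, the exact gradient g in these formulas is replaced by an estimate built from frame function values.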
In the next section, we describe our direct search algorithm framework.In Section 3 we establish a global convergence theorem for this algorithm framework.In Section 4, we compare the performance of the proposed methods with some existing methods.

The Algorithm
In this section, we first introduce some concepts and then describe our algorithms.
Positive bases were first proposed in [25] and have become an important concept in direct search methods. Definition 2.1. A set of vectors V = {v_1, ..., v_m} ⊂ R^n is called a positive basis of R^n if every vector in R^n can be written as a nonnegative linear combination of members of V, and no proper subset of V has this property. A frequently used example is the maximal positive basis {e_1, ..., e_n, -e_1, ..., -e_n}, where e_i is the i-th unit vector. There are many other important positive bases; see [25] for more details.
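For illustration, here is a minimal sketch (the helper names are ours) that builds the maximal positive basis and expresses an arbitrary vector as a nonnegative combination of its members:

```python
import numpy as np

def maximal_positive_basis(n):
    """Rows are the 2n vectors e_1, ..., e_n, -e_1, ..., -e_n."""
    I = np.eye(n)
    return np.vstack([I, -I])

def nonneg_coefficients(x):
    """Nonnegative coefficients expressing x over the maximal positive
    basis: max(x_i, 0) on e_i and max(-x_i, 0) on -e_i."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])
```

Because every vector has such a nonnegative expansion, at least one basis direction always makes an acute angle with any nonzero descent direction, which is what makes positive bases useful for direct search.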
The following two definitions give the frame and the quasi-minimal frame proposed by Coope and Price in [18] and [19], respectively.

Definition 2.2. A frame around x with size h > 0 is defined by

Φ(x, h, V) = {x + h v : v ∈ V},

where V is a positive basis.

Definition 2.3. Given a frame Φ(x, h, V), if there is a frame point y such that

f(y) ≤ f(x) - ε,  (6)

where ε > 0 is a small quantity depending on the frame size h, then the frame is not quasi-minimal, and we say that the function value at y is sufficiently less than that of the frame center, or that the objective function obtains sufficient decrease at the frame point y. Otherwise the frame is called quasi-minimal. In a frame which is not quasi-minimal, there is at least one frame point at which the objective function obtains sufficient decrease.
In this paper, we use the well-known maximal positive basis {e_1, ..., e_n, -e_1, ..., -e_n} to define a quasi-minimal frame, where e_i is the i-th unit vector. Firstly, a frame is constructed centered at the current iterate x_k. Using the function values at the frame points, we compute g_k as an approximation to the gradient ∇f(x_k). The next search direction d_k is then calculated according to (2) for TTPRP (or (3) for TMPRP, or (4) for PRP-DC), in which the exact gradient is replaced by g_k. We use the line search method proposed in [19] to get a step-length α_k. Specifically, we solve the following one-dimensional optimization problem:

min over α of f(x_k + α d_k),  (7)

where the current iterate x_k, the frame size h_k and the conjugate direction d_k are given. The next iterate is then set to x_{k+1} = x_k + α_k d_k. If the current frame is quasi-minimal, the frame size is decreased (h_{k+1} < h_k); otherwise, it is increased (h_{k+1} ≥ h_k). As the search direction d_k may not be a descent direction, the step-length α_k determined by (7) may be negative.
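The frame construction and the gradient estimation can be sketched as follows. The central-difference formula is one plausible choice of estimate over the maximal positive basis, since the paper only states that g_k is formed from the frame function values; the helper names are ours.

```python
import numpy as np

def frame_points(x, h):
    """Frame on the maximal positive basis: the 2n points x ± h e_i,
    stored in the order x + h e_1, x - h e_1, x + h e_2, x - h e_2, ..."""
    n = len(x)
    pts = []
    for i in range(n):
        e = np.zeros(n); e[i] = h
        pts.append(x + e)
        pts.append(x - e)
    return pts

def estimate_gradient(f_plus, f_minus, h):
    """Central-difference estimate g_i ≈ (f(x + h e_i) - f(x - h e_i)) / (2h)
    from the function values already computed at the frame points."""
    return (np.asarray(f_plus) - np.asarray(f_minus)) / (2.0 * h)
```

Note that the estimate reuses the frame evaluations, so it costs no extra function evaluations beyond those required to test quasi-minimality.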
In our method, we adopt the n-step restart strategy, because the n-step restart conjugate gradient method possesses an n-step quadratic convergence property when applied to minimizing a twice continuously differentiable function. Specifically, at every restart iterate we use d_k = -g_k as the search direction. For the purpose of convergence, we also reset the iterate at each restart by the following rule:

before each restart, set the next iterate equal to the lowest known point found so far.  (8)

We summarize the above process as the algorithm below.

Algorithm 2.1 (PRP-type direct search method)
Initialization: choose an initial point x_0 and an initial frame size h_0 > 0; set k = 0 and j = n.
While (stopping conditions do not hold) do
a) Construct a frame centered at the current iterate x_k. Calculate the function values at the frame points, and form the gradient estimate g_k.
b) Check the stopping conditions.
c) Calculate the new search direction d_k: if j > 0, use (2) (or (3), or (4)); otherwise set d_k = -g_k and set x_k equal to the lowest known point according to (8).
d) Execute the line search (7) to find α_k, and set x_{k+1} = x_k + α_k d_k. If the current frame is quasi-minimal, decrease the frame size; otherwise, increase it.
e) If a restart occurred, set j = n; otherwise, decrease j by one. Increase k by one.
end.

In the above algorithm, the variable j counts the number of iterates until the next restart.
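A simplified, self-contained sketch of the overall loop is given below. For brevity it replaces the PRP-type direction by the negative estimated gradient, omits the n-step restart bookkeeping, and uses an elementary sampling line search; the quasi-minimality quantity ε = h² and the constants are illustrative assumptions, not the paper's values.

```python
import numpy as np

def frame_direct_search(f, x0, h0=1.0, h_min=1e-6, shrink=0.25, grow=2.0,
                        max_feval=2000):
    """Simplified frame-based direct search sketch: build a frame on the
    maximal positive basis, estimate the gradient by central differences,
    shrink the frame when it is quasi-minimal, otherwise take a crude
    line-search step and grow the frame."""
    x, h = np.asarray(x0, dtype=float), h0
    fx, nfev = f(x), 1
    n = len(x)
    while h > h_min and nfev < max_feval:
        # evaluate the frame x ± h e_i and estimate the gradient
        fp, fm = np.empty(n), np.empty(n)
        for i in range(n):
            e = np.zeros(n); e[i] = h
            fp[i], fm[i] = f(x + e), f(x - e)
        nfev += 2 * n
        g = (fp - fm) / (2.0 * h)
        best = min(fp.min(), fm.min())
        if best >= fx - h ** 2:       # quasi-minimal: shrink the frame
            h *= shrink
            continue
        # crude line search along d = -g over a few trial step lengths
        d = -g / (np.linalg.norm(g) + 1e-16)
        trials = [h, 2.0 * h, 4.0 * h]
        cands = [(f(x + t * d), x + t * d) for t in trials]
        nfev += len(trials)
        fbest, xbest = min(cands, key=lambda c: c[0])
        if fbest < fx:
            x, fx = xbest, fbest
            h *= grow                 # successful step: grow the frame
        else:
            h *= shrink
    return x, fx
```

Even this stripped-down variant already exhibits the frame-based behavior on a smooth quadratic: the frame size oscillates between growth phases after successful steps and geometric shrinking at quasi-minimal frames.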

Convergence Analysis
If we regard all the conjugate gradient steps as part of the finite process, then the general convergence result of the frame-based method still holds [18]. In this section, we describe the convergence result specialized to our algorithm.
Suppose the sequence of quasi-minimal points generated by the algorithm is denoted by {z_m}. We then have the following convergence theorem.

Theorem 3.1. Suppose that 1) the sequence of function values generated by the algorithm is bounded below; and 2) f is continuously differentiable. Then {z_m} is an infinite sequence whose cluster points are stationary points of f.

Proof. Firstly, we prove that the algorithm generates an infinite sequence of quasi-minimal points. Suppose on the contrary that the sequence of quasi-minimal points is finite, and let the final quasi-minimal point be z_m. After the final quasi-minimal iterate is found, the frame size h will never decrease, and the sufficient decrease condition (6) will hold at every subsequent iteration. This implies that the sequence of function values decreases without bound, which contradicts condition 1). Consequently, the sequence of quasi-minimal iterates is infinite.

Now let z be an arbitrary cluster point of {z_m}. The frame sizes h_m corresponding to the quasi-minimal points z_m tend to zero, and each frame is built on a positive basis, so the standard convergence argument for frame-based methods [18] yields ∇f(z) = 0. □

Copyright © 2011 SciRes. AM

Numerical Experiments
In this section, we report some numerical results. We compare the proposed direct search methods (denoted TTPRP, TMPRP and PRP-DC, respectively) with the preconditioned PRP+ direct search method (denoted PRP+(pre)) proposed in [19].
We adopt both the performance profile [26] and the data profile [27] to compare the performance of the different methods.
The performance profile seeks to capture how well one solver performs relative to the other selected solvers on the set of test problems, while the data profile displays the raw data. In particular, the data profile tells the percentage of problems that can be solved (to a given tolerance τ) within any given number of function evaluations. The data profile is therefore especially suited to situations where function evaluations are expensive [27].
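Under the convention of [27], a data profile can be computed as follows (the helper names are ours). The budget is measured in simplex gradients, i.e. groups of n_p + 1 function evaluations for a problem with n_p variables.

```python
import numpy as np

def data_profile(evals, n_vars, max_budget):
    """Data profile d_s(k): for each solver s, the fraction of problems it
    solves within a budget of k simplex gradients.  evals[s][p] is the number
    of function evaluations solver s needed on problem p (np.inf on failure);
    n_vars[p] is the dimension of problem p."""
    profiles = {}
    for s, counts in evals.items():
        ratios = [c / (n_vars[p] + 1) for p, c in enumerate(counts)]
        profiles[s] = [float(np.mean([r <= k for r in ratios]))
                       for k in range(max_budget + 1)]
    return profiles
```

Plotting profiles[s] against the budget k reproduces curves like those in Figure 1.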
We use the 53 smooth test problems proposed in [27]. The maximum dimension of these problems is 12. All methods halt when the following convergence test proposed in [27] is satisfied:

f(x_0) - f(x) ≥ (1 - τ)(f(x_0) - f_L),

where τ > 0 is a tolerance, x_0 is the starting point for the test problem, and f_L is computed for each solver as the smallest value of f obtained by any solver within 1300 function evaluations. Because we are interested in the short-term behavior of these methods as the accuracy level changes, we present the data profiles and the performance profiles for τ = 10^{-k} with k ∈ {1, 3, 5, 7}. A lower bound h_min is also imposed on the frame size; its role is to ensure that h does not get too close to the machine precision, since if h gets too close to the machine precision the gradient estimates may become completely inaccurate.
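The convergence test above can be coded directly:

```python
def converged(f_x, f_x0, f_L, tau):
    """Convergence test of [27]: accept x once the achieved reduction
    f(x0) - f(x) reaches the fraction (1 - tau) of the best possible
    reduction f(x0) - f_L."""
    return f_x0 - f_x >= (1.0 - tau) * (f_x0 - f_L)
```

Smaller values of τ therefore demand that a solver realize nearly all of the attainable decrease before it is counted as successful.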

The codes were written in Fortran 90, and the program was run on a PC with a Genuine Intel(R) CPU (T1350 @ 1.86 GHz) and 504 MB of memory.
Figure 1 shows the data profiles for tolerances τ = 10^{-1}, 10^{-3}, 10^{-5} and 10^{-7}. It can be seen that TTPRP solves the largest percentage of problems for almost all sizes of computational budget and levels of accuracy τ. TMPRP and PRP-DC are also comparable with PRP+(pre). It is noteworthy that the performance differences between TTPRP and the other solvers tend to increase as the tolerance τ decreases and the computational budget increases. On the other hand, Figure 1 also shows that within about 50 simplex gradients the performance differences among the four solvers are small.
Figure 2 shows the performance profiles based on the number of function evaluations. The left side of each plot gives the percentage of test problems that a solver can solve with the greatest efficiency; the right side gives the percentage of test problems that a solver can solve successfully. Thus, the right side mainly measures the robustness of a solver.
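A performance profile can likewise be computed from the raw evaluation counts (the helper name is ours):

```python
import numpy as np

def performance_profile(evals, taus):
    """Performance profile rho_s(tau): for each solver s, the fraction of
    problems on which its evaluation count is within a factor tau of the
    best count achieved by any solver on that problem."""
    solvers = list(evals)
    n_prob = len(evals[solvers[0]])
    best = [min(evals[s][p] for s in solvers) for p in range(n_prob)]
    return {s: [float(np.mean([evals[s][p] / best[p] <= t
                               for p in range(n_prob)]))
                for t in taus]
            for s in solvers}
```

At τ = 1 the profile gives the fraction of problems on which a solver is the most efficient; as τ grows it approaches the solver's overall success rate, which is the robustness reading described above.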
It can be seen from Figure 2 that TTPRP solves the largest percentage of problems for almost all sizes of performance ratio and levels of accuracy τ. TMPRP and PRP-DC are also comparable with PRP+(pre). It is noteworthy that the performance differences between TTPRP and the other solvers again tend to increase as the tolerance τ decreases.

Discussion
Based on the Coope-Price direct search framework, we proposed three PRP-type direct search methods. All of them employ a kind of descent conjugate gradient direction. When exact gradients are available, descent conjugate gradient methods can generate sufficient descent directions for the objective function, and numerical results have shown that they are often computationally more efficient than the PRP+ method.
In this paper, the gradient information of the objective function is not available. We estimate the gradients from the function values obtained on the maximal positive basis. These estimated gradients may not be very accurate, but global convergence can still be ensured under the Coope-Price direct search framework. In other words, convergence is guaranteed by the frame-based nature of the algorithms, not by the fact that they mimic a conjugate gradient method.
The accuracy of the estimated gradients may also affect the descent property of the conjugate gradient directions. However, the numerical results show that the proposed PRP-type direct search methods are promising and competitive, especially the TTPRP method.


Figure 1. Data profiles show the percentage of problems solved as a function of a computational budget of simplex gradients.
In our numerical experiments, if the current frame is quasi-minimal, then the frame size is reduced as described in Section 2.