On Cost Based Algorithm Selection for Problem Solving

This work proposes a novel framework that enables one to compare distinct iterative procedures with known rates of convergence, in terms of the computational effort required to reach some prescribed vicinity of the optimal solution to a given problem of interest. An algorithm is introduced that decides which of two competing algorithms makes the best use of the computational resources for some prescribed error. Several examples are presented that illustrate the trade-offs involved in such a choice and demonstrate that choosing an algorithm over another with a higher rate of convergence can be perfectly justifiable in terms of the overall computational effort.


Introduction
It is not automatic that the runner with the longest stride will win a marathon, although it can be ascertained that she will complete the distance with the fewest number of strides. The same analysis can be applied to numerical algorithms: it is certain that the algorithm with the faster convergence rate will take fewer steps to converge, but it does not necessarily follow that its convergence is the fastest. This happens because convergence rate is often defined in terms of the iteration counter. Hence, an analysis based solely on such a rate is equivalent to picking as the marathon winner the runner with the longest stride.
This paper is concerned with proposing new measures for algorithm efficiency based on the computational effort. Such a measure can be more appropriate than the usual convergence rate with respect to the iteration counter. It permits us to assess how efficiently an algorithm employs the computational resources at hand, while searching for a solution to a given problem. That, in turn, allows one to compare algorithms not in terms of how many steps they employ to reach a solution, but based on how much computational effort (time) they apply to reach the solution. Alternatively, one can compare algorithms based on their usage of limited available computational effort, i.e. given a fixed amount of computational effort, one would wish to identify which algorithms get closer to the solution while applying at most the prescribed effort. In terms of our initial analogy, that would be equivalent to letting an athlete run for a fixed amount of time, declaring the winner as the athlete who covers more ground in that prescribed time.
We stress that this paper strives to classify algorithms for a given specific application. Hence, the rationale is to compare competing algorithms that can be used to solve a given problem up to some prescribed error tolerance, from a given starting point, with a view to identifying the alternative that makes the best use of the computational resources available. One could equivalently state that the proposed approach is instance-based, i.e. it focuses on identifying the best algorithm for a specific instance of a given problem. That can be viewed as a complement to the theory of computational complexity [1][2][3], which attempts to establish bounds (typically worst-case bounds) for the convergence time for general classes of algorithms, typically based on the dimension of the solution domain. Such an approach can be described as algorithm-based, since it strives to classify algorithms based on their performance bounds, and individual problem instances may have very little to do with these bounds, see for example [4,8], and [1,27]. It is worth pointing out that, under the proposed approach, the computational effort of an algorithm is no longer determined by its convergence rate. Rather, it is determined by the number of elementary operations (floating point operations, or FLOPs, for example) per iteration times the total number of iterations [5][6][7].
The idea of identifying iterative procedures that enhance the efficiency of the search for the solution of a given problem is hardly new. Multigrid methods [8,9], in which a system is first discretized on a grid, following which the grid resolution is systematically refined, are commonly used in the numerical solution of partial differential equations [8,9]. The objective is to save computational resources in the process of reaching a vicinity of the optimal solution. Heuristic procedures, such as genetic algorithms [10], also rely on an efficient use of the computational resources, aiming at getting within a vicinity of the optimal solution in reduced time. However, when advocating for a given method over another one, one often has to rely on extensive empirical data or domain specific mathematics, e.g. [11]. The novel feature in this paper is that it proposes a general framework to compare distinct iterative procedures with known rates of convergence based on the overall computational effort prior to convergence.
The main contributions of this paper are twofold. Firstly, considering that the convergence time of an algorithm is not based on its iteration count, but rather on the overall computational effort it employs, we propose a framework for algorithm comparison based on how much of the (limited) computational resources each algorithm applies to reach a desired precision. Secondly, given two algorithms starting from the same initial point, we propose a routine that identifies which algorithm requires less computational effort to converge, for any given error tolerance. The proposed routine requires prior knowledge of the properties of both algorithms: order of convergence and rate of convergence; as well as the ratio of the computational efforts per iteration of the algorithms under consideration.
This approach directly addresses a trade-off commonly encountered in the design of iterative algorithms. A higher rate or order of convergence is usually achieved at the cost of an increased computation time per iteration. If we specify a desired tolerance $\epsilon$, we can expect the number of iterations required to achieve this tolerance to be smaller for a faster converging algorithm for all small enough $\epsilon$. If we instead adopt total computation time as a metric, the question becomes whether or not faster convergence with respect to iteration will always overcome the disadvantage of an increased computation time per iteration, promising greater efficiency for all small enough $\epsilon$. We show that this is not the case, that is, an algorithm may have both higher rate and higher order of convergence than an alternative and still require greater computation time to achieve tolerance $\epsilon$ for all $\epsilon > 0$, provided the computation effort per iteration exceeds that of the alternative by a large enough factor.
This paper is organized as follows. Section 2 addresses the properties of iterative algorithms. Section 3 derives bounds on the number of iterations to reach a desired precision. Section 4 makes use of these bounds to derive a framework for algorithm comparison based on the overall computational effort, and shows that, under some circumstances, algorithms with lower order of convergence can always converge faster than higher order ones, provided the computation time per iteration of the latter exceeds that of the alternative by a large enough factor. Numerical experiments that illustrate the proposed approach are presented in Section 5. Finally, Section 6 concludes the paper.

Numerical Formulation
This paper deals with numerical algorithms which take the form of a convergent iterative sequence

$$V_{k+1} = T V_k, \quad k = 0, 1, \ldots, \tag{1}$$

given a starting element $V_0 \in \mathcal{V}$, where $T: \mathcal{V} \to \mathcal{V}$ is an operator on a normed linear space $\mathcal{V}$, and $\mathcal{V}$ is the solution domain. The objective is to converge to a fixed point $V^* = T V^*$, which provides the solution to a given problem of interest.
When assessing how many evaluations of mapping $T$ in Equation (1) are necessary until we find ourselves within a prescribed vicinity of the fixed point $V^* = T V^*$, two important attributes stand out, namely the order of convergence and the convergence rate, which are defined below:
Definition 1 (Order of Convergence) The algorithm converges with order $d \ge 1$ if

$$\|V_{k+1} - V^*\| \le M \|V_k - V^*\|^d \tag{2}$$

for some scalar $M > 0$, e.g. [12].
Definition 2 (Convergence Rate) An algorithm is said to have convergence rate $M$, if $M$ satisfies Equation (2).
If 1 d  and 1 M  , the algorithm is said to be linearly convergent.Observe that both the order of convergence and the convergence rate d M are defined with respect to the iteration counter.
Remark 1 Note that both order of convergence and convergence rate are indicative of how fast an algorithm converges to the solution in terms of the number of iterations. While they may be used to obtain an estimate of how many iterations are needed for the algorithm to converge, they are not sufficient to determine the convergence time, for the latter also depends on how much time each iteration takes to complete.
Typically, one lets Algorithm (1) run for a finite number of iterations, until a prescribed vicinity of the solution is reached. Let $\epsilon > 0$ be a prescribed error tolerance. We can define the total number of iterations needed for convergence as

$$K(\epsilon) = \min\{k \ge 0 : \|V_k - V^*\| \le \epsilon\}. \tag{3}$$
Let $g_k$ represent the computational effort of iteration $k$ of Algorithm (1). Then, the overall computational effort of Algorithm (1) to attain a desired precision $\epsilon > 0$ is defined as

$$G(\epsilon) = \sum_{k=1}^{K(\epsilon)} g_k. \tag{4}$$

In the present analysis, we assume that the overall computational time to attain precision $\epsilon$ is proportional to the overall computational effort $G(\epsilon)$. Hence, the following analysis can be solely based on the overall computational effort. We also stress that computational time and computational cost are used interchangeably in this paper.
Let

$$e_k = \|V_k - V^*\|, \quad k = 0, 1, \ldots, \tag{5}$$

be a sequence of iteration errors with respect to the solution. Without loss of generality, we employ a normalized error sequence

$$\epsilon_k = \frac{e_k}{e_0}, \quad k = 0, 1, \ldots \tag{6}$$

Note that, regardless of the value of $e_0$, $\epsilon_0 = 1$. That greatly simplifies our subsequent analysis. Moreover, $\epsilon_k$ can be seen as the ratio of improvement at iteration $k$ with respect to the initial solution. For the sake of simplicity, we assume that the bound in Equation (2) holds for all $k \ge 0$. That can be accomplished by having an arbitrarily high index relabeled as zero. Hence, dividing (2) by $e_0$, we have

$$\epsilon_{k+1} \le \bar{M} \epsilon_k^d, \quad \bar{M} = M e_0^{d-1}. \tag{7}$$

Observe that a sufficient condition for the convergence of Algorithm (1) is $\bar{M} < 1$. The quantity $\bar{M}$, defined in Equation (7), is a renormalization of the convergence rate $M$ that takes into account the initial error $e_0$, defined in (5). Such a renormalization is intended to simplify the subsequent analysis. Moreover, it enables us to assess the performance of the algorithm at iteration $k$ by evaluating the attained relative improvement with respect to the initial solution, $\epsilon_k$, which is bounded as in (7).
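The normalization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `normalized_errors` and `renormalized_rate` are ours, the renormalized rate is taken to be $M e_0^{d-1}$ (what one obtains by dividing the order-of-convergence bound by the initial error), and the error sequence is synthetic data generated with the bound holding with equality.

```python
def normalized_errors(errors):
    """Return eps_k = e_k / e_0 for a sequence of absolute errors e_k."""
    e0 = errors[0]
    if e0 <= 0:
        raise ValueError("the initial error e_0 must be positive")
    return [e / e0 for e in errors]


def renormalized_rate(M, d, e0):
    """Return M_bar = M * e0**(d - 1), the rate seen by the normalized errors (assumed form)."""
    return M * e0 ** (d - 1)


# Hypothetical data: errors generated by e_{k+1} = M * e_k**d with equality.
M, d, e0 = 0.5, 2, 0.5
errors = [e0]
for _ in range(2):
    errors.append(M * errors[-1] ** d)

eps = normalized_errors(errors)
M_bar = renormalized_rate(M, d, e0)

# The normalized errors should satisfy eps_{k+1} <= M_bar * eps_k**d.
bound_holds = all(eps[k + 1] <= M_bar * eps[k] ** d for k in range(len(eps) - 1))
```

The values were chosen as powers of two so that the floating point arithmetic is exact; with real error data a small tolerance in the comparison would be advisable.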

On the Definition of Computational Effort and Its Relation with the Computation Time
It must be acknowledged that the actual computation time of an algorithm does depend on the platform running the algorithm. Indeed, complexity theory acknowledges this issue and typically addresses it in two distinct ways:
• Assuming that the analysis is carried out for a single platform, see for example [4].
• By defining computational complexity (effort) in terms of elementary operations performed by the algorithm, e.g. [13].
In this text we assume that the computational effort is defined in terms of elementary operations. We also assume that the elementary operations are defined in such a way that they do not depend on the platform. Additionally, the overall computational time (cost) is assumed to be proportional to the overall computational effort, with the platform determining only what the constant of proportionality is.
The definition of elementary operation is left to the user. Since our analysis is focused on the problem, the function $G(\epsilon)$ could be tailor-made for the problem. Alternatively, it could be a general function, such as a counter of floating point operations.

An Upper Bound on the Iteration Counter Prior to Convergence
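In this paper, we represent an iterative algorithm of the form in (1) by the pair $A = (M, d)$, which describes a convergence rate $M$ with respect to the iteration count, with an iterative algorithm of order $d$. We start this section with an upper bound on the error achieved by an iterative Algorithm $A = (M, d)$ after $k$ iterations.
Theorem 1 For all $k \ge 1$,

$$\epsilon_k \le \bar{M}^{\sum_{j=0}^{k-1} d^j}, \tag{8}$$

where $\bar{M}$ is the quantity defined in (7).
Proof. It follows from (7) that Equation (8) holds for $k = 1$. Assume it also holds for $k = n$. Then, Equation (2) implies

$$\epsilon_{n+1} \le \bar{M} \epsilon_n^d \le \bar{M} \, \bar{M}^{d \sum_{j=0}^{n-1} d^j} = \bar{M}^{\sum_{j=0}^{n} d^j}.$$

Hence, Equation (8) also holds for $k = n + 1$ and that completes the proof.
Let $\epsilon \in (0, 1)$ be a prescribed error and assume $\bar{M} < 1$. Suppose also that after $k$ iterations we have $\bar{M}^{\sum_{j=0}^{k-1} d^j} = \epsilon$. Since the exponent equals $k$ if $d = 1$ and $(d^k - 1)/(d - 1)$ if $d > 1$, the expression above yields:

$$k(\epsilon) = \begin{cases} \dfrac{\log \epsilon}{\log \bar{M}}, & d = 1, \\[2ex] \dfrac{1}{\log d} \log\left(1 + (d - 1) \dfrac{\log \epsilon}{\log \bar{M}}\right), & d > 1, \end{cases} \tag{9}$$

which we refer to as the iteration cost.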
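As a minimal sketch of the iteration cost, the following Python function evaluates the closed form obtained by solving the error bound for the iteration counter. The function name `iteration_cost` and its argument names are hypothetical, and the formula assumes the bound takes the geometric-sum form $\bar{M}^{(d^k-1)/(d-1)}$ for orders above one and $\bar{M}^k$ for linear convergence.

```python
import math


def iteration_cost(m_bar, d, eps):
    """Estimated number of iterations to reach normalized error eps.

    m_bar: renormalized convergence rate, assumed to lie in (0, 1)
    d:     order of convergence, d >= 1
    eps:   prescribed normalized error, assumed to lie in (0, 1]
    """
    if not 0.0 < m_bar < 1.0:
        raise ValueError("m_bar must lie in (0, 1)")
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    if d < 1:
        raise ValueError("the order of convergence must be at least 1")
    ratio = math.log(eps) / math.log(m_bar)  # non-negative since both logs are <= 0
    if d == 1:
        return ratio  # linear case: m_bar**k = eps
    # Superlinear case: m_bar**((d**k - 1) / (d - 1)) = eps
    return math.log(1.0 + (d - 1) * ratio) / math.log(d)


# Linear example: 0.5**k = 0.25 gives k = 2.
k_linear = iteration_cost(0.5, 1, 0.25)
# Quadratic example: 0.5**(2**k - 1) = 0.125 gives k = 2.
k_quadratic = iteration_cost(0.5, 2, 0.125)
```

The returned value is real-valued; rounding it up with `math.ceil` gives a whole number of iterations when one is needed.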

Algorithm Comparison Based on the Overall Computational Effort
For the analysis in this section, we assume that the computational effort of Algorithm (1) does not change with the iteration counter, i.e. $g_k = g$, $k \ge 1$. In the analysis that follows, we use $k(\epsilon)$, defined in (9), as an estimate for the quantity defined in (3), which indicates the number of iterations required for a given Algorithm to reach a prescribed normalized error $\epsilon$.
The objective is to assess the efficiency of the algorithm based on the overall computational effort prior to reaching the prescribed error $\epsilon$. Such an effort is defined as

$$G(\epsilon) = g \, k(\epsilon), \tag{10}$$

where $g$ is the per iteration effort (PIE): the computational effort of a single iteration of Algorithm $A$, $d$ is the order of convergence of $A$, which enters through $k(\epsilon)$, and function $G$ is defined in (4).
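Consider two algorithms: $A = (M, d)$, with PIE $g$, and $\hat{A} = (\hat{M}, \hat{d})$, with PIE $\hat{g}$. Let $k(\epsilon)$ and $\hat{k}(\epsilon)$ denote their respective iteration costs, given by (9) with renormalized rates $\bar{M}$ and $\bar{\hat{M}}$. Then, in order to choose the best algorithm for a given problem, one can seek an interval of values of the ratio $\hat{g}/g$ for which $\hat{A}$ requires less overall effort than $A$, that is,

$$\hat{g} \, \hat{k}(\epsilon) < g \, k(\epsilon) \iff \frac{\hat{g}}{g} < \gamma(\epsilon), \quad \gamma(\epsilon) = \frac{k(\epsilon)}{\hat{k}(\epsilon)}. \tag{11}$$

If both algorithms have convergence orders higher than one,

$$\gamma(\epsilon) = \frac{\log \hat{d}}{\log d} \cdot \frac{\log\left(1 + (d - 1) \log \epsilon / \log \bar{M}\right)}{\log\left(1 + (\hat{d} - 1) \log \epsilon / \log \bar{\hat{M}}\right)} \tag{12}$$

is a threshold that indicates a situation when both algorithms are equivalent in terms of computational cost for a prescribed error $\epsilon$. Whenever $\hat{g}/g < \gamma(\epsilon)$, algorithm $\hat{A}$ is more economical in terms of computational effort. Otherwise, $A$ is the more efficient algorithm to reach the prescribed error.
When both $A = (M, 1)$ and $\hat{A} = (\hat{M}, 1)$ are linearly convergent, the threshold becomes

$$\gamma = \frac{\log \bar{\hat{M}}}{\log \bar{M}}. \tag{13}$$

Observe that, in such case, the threshold is independent of the tolerance, and it depends on the initial solution only through the renormalized rates defined in (7). If only one of the algorithms is linearly convergent, say $d = 1$ and $\hat{d} > 1$, the threshold is

$$\gamma(\epsilon) = \frac{\log \epsilon}{\log \bar{M}} \cdot \frac{\log \hat{d}}{\log\left(1 + (\hat{d} - 1) \log \epsilon / \log \bar{\hat{M}}\right)}, \tag{14}$$

with the analogous expression when $d > 1$ and $\hat{d} = 1$.
With the results above, one can define a general procedure for selecting, between any two competing algorithms $A$ and $\hat{A}$, the one with the fastest convergence with respect to the overall computational effort. Such a procedure is centered on the per iteration effort ratio $\hat{g}/g$, with $g$ and $\hat{g}$ denoting, respectively, the PIE of Algorithms $A$ and $\hat{A}$. The procedure is summarized in Algorithm 1 below.
Algorithm 1 (Algorithm Selection Procedure)
Input: $A = (M, d)$, $\hat{A} = (\hat{M}, \hat{d})$, $\epsilon$, $g$ and $\hat{g}$.
1. Compute the threshold $\gamma(\epsilon)$ from (12)-(14).
2. If $\hat{g}/g < \gamma(\epsilon)$, select Algorithm $\hat{A}$ and terminate.
3. Otherwise, select Algorithm $A$ and terminate.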
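The selection procedure can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes the threshold is the ratio of the two iteration costs, that the challenger is preferred when the per iteration effort ratio (challenger over incumbent) falls below that threshold, and that each algorithm is described by a hypothetical `(m_bar, d)` pair. All function names are ours.

```python
import math


def iteration_cost(m_bar, d, eps):
    """Iteration cost for renormalized rate m_bar in (0, 1), order d >= 1, eps in (0, 1)."""
    if not 0.0 < m_bar < 1.0 or not 0.0 < eps < 1.0 or d < 1:
        raise ValueError("expected 0 < m_bar < 1, 0 < eps < 1 and d >= 1")
    ratio = math.log(eps) / math.log(m_bar)
    if d == 1:
        return ratio
    return math.log(1.0 + (d - 1) * ratio) / math.log(d)


def effort_threshold(incumbent, challenger, eps):
    """PIE ratio (challenger over incumbent) at which both algorithms cost the same."""
    return iteration_cost(*incumbent, eps) / iteration_cost(*challenger, eps)


def select_algorithm(incumbent, challenger, eps, g, g_hat):
    """Return which algorithm needs less overall effort to reach error eps.

    g and g_hat are the per iteration efforts of incumbent and challenger.
    """
    gamma = effort_threshold(incumbent, challenger, eps)
    return "challenger" if g_hat / g < gamma else "incumbent"


# Two linearly convergent algorithms; the challenger rate is chosen so that
# the threshold is 1.5 regardless of the tolerance.
incumbent = (0.8, 1)
challenger = (0.8 ** 1.5, 1)
gamma = effort_threshold(incumbent, challenger, 1e-6)
cheap = select_algorithm(incumbent, challenger, 1e-6, g=1.0, g_hat=1.4)
costly = select_algorithm(incumbent, challenger, 1e-6, g=1.0, g_hat=1.6)
```

In this example the faster challenger is worth selecting only while each of its iterations costs less than 1.5 times an iteration of the incumbent.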
In the following sections, we present some experiments which illustrate the trade-offs in choosing between two iterative algorithms for solving a given problem. The experiments illustrate the thresholds for per iteration effort ratios and demonstrate the existence of situations for which the choice of an algorithm with slower convergence is more efficient in terms of computational effort.

A Perspective on the Proposed Approach
We argue that the proposed approach to selecting algorithms for problem solving is instance-based in the sense that, given a problem, an initial solution and a fixed desired tolerance, it guides the choice of which algorithm to apply. It enables us to select a priori which of the two alternatives converges employing less computational effort. So far as we know, no directly comparable formulation has been introduced in the literature.
A related approach that could be contrasted with the present formulation is complexity theory. However, complexity theory is typically centered on the algorithms, often providing worst-case bounds on the convergence time of algorithms for classes of problems. Such bounds can have very little to do with the particular instance of interest [4,8]. Moreover, the responses obtained would be of a different nature: complexity theory would identify, for a given problem, which of two algorithms has a better performance in a worst-case scenario. Hence, the response would be static and a single algorithm would be identified. The proposed approach, on the other hand, could identify different algorithms as the best alternative for different problem instances. In fact, an example is presented in the next section where two different problem instances yield two different responses. Such an answer would not be possible within a complexity theory framework.

Asymptotic Comparison of Algorithms
Suppose A and A are two linearly convergent algorithms.Note that the ratio     in Equation ( 13) does not depend on  .Thus, even if A has the theoreticcally faster convergence rate, if it also has a PIE g sufficiently larger than that applicable to A , it will be the inferior choice for all tolerance values  .In this section we consider this type of comparison for arbitrary order of convergence .d A common intuiton is that an algorithm with the better rate or order of convergence will eventually outperform an alternative, in the sense that the computation time will be smaller for all small enough  .Accordingly, we say that g overtakes A if for any two PIEs g , g there exists 0 are the respective iteration costs.If neither algorithm overtakes the other, we say they are computationally equivalent.
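Interestingly, this relation can be resolved by considering the order of convergence alone. To see this, we first note that the question reduces analytically to an evaluation of the limit $\lim_{\epsilon \to 0} k(\epsilon)/\hat{k}(\epsilon)$, where $k(\epsilon)$ and $\hat{k}(\epsilon)$ are the iteration costs of $A$ and $\hat{A}$: $\hat{A}$ overtakes $A$ if the limit is infinite, $A$ overtakes $\hat{A}$ if it is zero, and if it is a positive finite constant $A$ and $\hat{A}$ are computationally equivalent. We next state the main result.
Theorem 2 Algorithm $\hat{A}$ overtakes $A$ if and only if $d = 1$ and $\hat{d} > 1$. If $d = \hat{d} = 1$, or if $d > 1$ and $\hat{d} > 1$, the two algorithms are computationally equivalent.
Proof. As remarked above, we proceed by evaluating $\lim_{\epsilon \to 0} k(\epsilon)/\hat{k}(\epsilon)$. The case of $d = \hat{d} = 1$ follows directly from Equation (13). Then suppose $d > 1$ and $\hat{d} > 1$. From (9), both iteration costs grow as $\log \log (1/\epsilon)$, so that

$$\lim_{\epsilon \to 0} \frac{k(\epsilon)}{\hat{k}(\epsilon)} = \frac{\log \hat{d}}{\log d},$$

which is a positive finite constant, thus the theorem holds for the case $d, \hat{d} > 1$. Finally, suppose $d = 1$ and $\hat{d} > 1$. Then we similarly argue that $k(\epsilon)$ grows as $\log(1/\epsilon)$ while $\hat{k}(\epsilon)$ grows as $\log \log (1/\epsilon)$, hence the limit is infinite and $\hat{A}$ overtakes $A$.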
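The limit for two orders above one can be worked out explicitly. The derivation below is a sketch under the assumption that the iteration cost has the form $\log(1 + (d-1)\log\epsilon/\log\bar{M})/\log d$ for $d > 1$; the shorthands $L$, $c$ and $\hat{c}$ are ours and do not appear in the original text.

```latex
% Shorthand (ours): L = \log(1/\epsilon), \quad
% c = (d-1)/\log(1/\bar{M}), \quad \hat{c} = (\hat{d}-1)/\log(1/\bar{\hat{M}}).
\begin{align*}
k(\epsilon) &= \frac{\log(1 + cL)}{\log d}, &
\hat{k}(\epsilon) &= \frac{\log(1 + \hat{c}L)}{\log \hat{d}}, \\
\frac{k(\epsilon)}{\hat{k}(\epsilon)}
  &= \frac{\log \hat{d}}{\log d} \cdot
     \frac{\log L + \log(c + 1/L)}{\log L + \log(\hat{c} + 1/L)}
  & &\xrightarrow[\;L \to \infty\;]{} \frac{\log \hat{d}}{\log d}.
\end{align*}
% For d = 1 the numerator becomes L / \log(1/\bar{M}), which grows faster than
% \log L, so the ratio diverges and the higher order algorithm overtakes.
```

Because the ratio stays bounded when both orders exceed one, a sufficiently large per iteration effort ratio keeps the lower order algorithm cheaper at every tolerance, which is the sense in which the two algorithms are computationally equivalent.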

Numerical Examples
In order to grasp the meaning of the proposed analysis, a series of numerical experiments are presented in this section, which make comparisons between an incumbent algorithm $A = (M, d)$, with PIE $g$, and a challenging algorithm $\hat{A} = (\hat{M}, \hat{d})$, with PIE $\hat{g}$. The experiments depict a curve of threshold $\gamma(\epsilon)$ values, as defined in Equations (12)-(14), for a prescribed range of convergence rates $\hat{M}$, for fixed values of $\hat{d}$ and $M$, $d$. Such a curve is here called the effort ratio frontier. We recall that the threshold $\gamma(\epsilon)$ represents the per iteration effort ratio for which both $A$ and $\hat{A}$ are equivalent in terms of the overall computational effort. For our experiments, we apply an initial point with $e_0 = 1$ in (6), which implies $\bar{M} = M$ in (7), for each considered algorithm.
Figure 1 comprises the effort ratio frontier for linearly convergent algorithms $A = (0.8, 1)$ and $\hat{A} = (\hat{M}, 1)$. It is worth mentioning that, since both algorithms are linear, Equation (13) implies that the threshold does not depend on the prescribed error $\epsilon$. As an illustrative example, at a point of the frontier where the threshold equals 1.5, $\hat{A}$ is a better choice whenever $\hat{g}/g < 1.5$ and $A$ is a better choice whenever $\hat{g}/g > 1.5$.
In the second experiment, we wish to evaluate the effect of the convergence rate on the behavior of the effort ratio frontier. To this end, we compare two linearly convergent algorithms $A = (M, 1)$ and $\hat{A} = (\hat{M}, 1)$, for varying $\hat{M}$, while presenting a series of frontiers, for selected values of rate $M$. The results are depicted in Figure 2. One can notice that, as the convergence rate $M$ increases, i.e. Algorithm $A$ becomes slower, the value of the threshold for a given $\hat{M}$ increases, reaching 4.9 if $M = 0.9$.
The third experiment is aimed at providing some insight on the influence of the order of convergence on the effort ratio frontier. Figure 3 conveys the frontier for an incumbent algorithm $A = (0.9, 3)$ and a challenging second order algorithm $\hat{A} = (\hat{M}, 2)$, for a fixed error $\epsilon$. Note that the threshold equals one at a particular rate $\hat{M}$ on the frontier. That means that Algorithm $A = (0.9, 3)$ is equivalent in terms of overall computation effort to the corresponding second order algorithm, with the same effort per iteration. Moreover, any second order algorithm with a smaller rate, and the same effort per iteration, is more efficient than $A$ for the selected error $\epsilon$. This illustrates that the intuition that a higher order algorithm is always better than its lower order counterpart can be misleading. Furthermore, by Theorem 2, any two algorithms of order 2 and 3 are computationally equivalent, and it is therefore possible for the order 2 algorithm to have strictly smaller computation time for all $\epsilon$.
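The last remark can be checked numerically. The sketch below is an illustration under assumptions of our own: both algorithms share the hypothetical renormalized rate 0.9, the iteration cost for orders above one is taken as $\log(1 + (d-1)\log\epsilon/\log\bar{M})/\log d$, and the tolerances scanned range from $10^{-1}$ to $10^{-50}$. It computes the ratio of iteration costs of the order 2 algorithm to the order 3 algorithm and compares it with $\log 3/\log 2$.

```python
import math


def iteration_cost(m_bar, d, eps):
    """Iteration cost for an algorithm of order d > 1 (assumed closed form)."""
    return math.log(1.0 + (d - 1) * math.log(eps) / math.log(m_bar)) / math.log(d)


m_bar = 0.9  # hypothetical common renormalized rate
tolerances = [10.0 ** (-p) for p in range(1, 51)]

# Ratio k_2(eps) / k_3(eps): how many more iterations the order 2 algorithm needs.
ratios = [iteration_cost(m_bar, 2, eps) / iteration_cost(m_bar, 3, eps)
          for eps in tolerances]

# Limiting value of the ratio as eps tends to zero.
bound = math.log(3.0) / math.log(2.0)

# If one iteration of the order 3 algorithm costs at least `bound` times an
# iteration of the order 2 algorithm, the order 2 algorithm is cheaper at
# every scanned tolerance.
order2_always_cheaper = all(r < bound for r in ratios)
```

The ratio increases slowly toward the bound of roughly 1.585 without reaching it, so in this setting a per iteration effort ratio above that value favors the order 2 algorithm at every tolerance considered.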
Our fourth experiment generalizes the previous one and derives the effort ratio frontier for a challenging second order algorithm against incumbents of several orders. Figure 4 demonstrates the thresholds $\gamma(\epsilon)$ below which the choice of $\hat{A}$ is more advantageous. The collection of curves comprising Figure 4 illustrates that a second order algorithm can outperform many higher order algorithms for appropriate values of per iteration effort. As an example, consider a second order algorithm $\hat{A}$ whose per iteration effort ratio lies below all the depicted frontiers. Observe that an $\hat{A}$ thus defined outperforms every other depicted algorithm, even the fifth-order algorithm $A = (0.9, 5)$.
The fifth experiment, depicted in Figure 5, shows that the region in which the lower order algorithm is preferable persists for any precision $\epsilon$ up to the order of $10^{-50}$.
In the last experiment, we randomly generated a matrix $A$ and a vector $b$ to comprise a linear system of the form $Ax = b$, with 2395 equations and unknowns. Two well known algorithms were employed to solve this system: Gauss-Seidel and Conjugate Gradient. For both algorithms, the convergence rate was estimated from the observed errors and the number of iterations up to convergence. The computational effort per iteration of each algorithm is the total number of sums and multiplications.

Concluding Remarks
This paper introduces a novel approach for comparing algorithms in terms of their overall computational effort, rather than in terms of the convergence-order, convergence-rate pair. A threshold is derived for the ratio between the per-iteration efforts of two competing algorithms, which indicates which of the two makes the more efficient use of the computational resources available. In addition, an algorithm is proposed for choosing between two competing algorithms under the proposed setting, which makes use of this threshold. The derived results are applied in a few examples that provide insight into the compromises involved in the proposed approach. The experiments illustrate that a lower order algorithm can be more advantageous in terms of the overall computational effort to reach a prescribed error than a higher order counterpart. In particular, even as we let the prescribed error approach the order of $10^{-50}$, using a lower order algorithm can be more advantageous under suitable conditions. This demonstrates that an analysis of algorithms based only on their order and rate of convergence can be very misleading. By applying an analysis based on the computational effort, on the other hand, one can identify the algorithm that makes the best use of the (limited) computational resources made available.

Acknowledgments


Since the threshold for two linearly convergent algorithms does not depend on the prescribed error $\epsilon$, the frontier in Figure 1 is valid for all possible values of $\epsilon$. As one can infer from Algorithm 1, the shaded area below the frontier indicates the values of the per iteration effort ratio for which Algorithm $\hat{A}$ is more efficient. For ratios outside of this area, $A$ is a better choice.

Figure 5 replicates the previous experiment for varying values of the error $\epsilon$. The thicker red line is the frontier for a precision of the order of $10^{-50}$. The results show that lower order algorithms tend to be more attractive for higher values of $\epsilon$. However, even for very low values of $\epsilon$, lower order algorithms can remain appealing. Note in Figure 5 that the region below the threshold $\gamma(\epsilon)$ does not vanish as $\epsilon$ decreases.

Figure 6 illustrates the results: the red solid line and the blue dashed line are the effort ratio frontiers for two distinct errors.

Figure 5 . 6 MFigure 6 .
Figure 5. Influence of tolerance error on the boundary iteration effort,  0.95 .
A are computationally equivalent.We next state the main result.
A .