Comparison of Alternative Strategies for Multilevel Optimization of Hierarchical Systems

The augmented Lagrangian penalty formulation and four different coordination strategies are used to examine the numerical behavior of Analytical Target Cascading (ATC) for multilevel optimization of hierarchical systems. The coordination strategies considered include augmented Lagrangian using the method of multipliers and alternating direction method of multipliers, diagonal quadratic approximation, and truncated diagonal quadratic approximation. Properties examined include computational cost and solution accuracy based on the selected values for the different parameters that appear in each formulation. The different strategies are implemented using two- and three-level decomposed example problems. The results show the interaction between the selected ATC formulation and the values of the associated parameters, and they clearly highlight the impact these choices can have on both solution accuracy and computational cost.


Introduction
A complex optimization problem may be decomposed into two or more subsystems with partitioned design variables and separate objective functions and design constraints. The layered architecture of hierarchically decomposed multilevel systems is illustrated by the example in Figure 1; the hierarchy can be expanded to include several levels, with each containing multiple elements. Because the number of design variables in each element represents a fraction of the total set, the dimensionality of each element optimization problem is reduced. Decomposed optimization problems require a coordination strategy that, in the ensuing iterative solution process, ensures satisfaction of system-level design criteria and proper convergence to an optimum design.
Analytical target cascading (ATC) [1,2] was developed for systems such as that shown in [...] were defined for the responses and targets as well as the linking (or shared design) variables. The multilevel optimization problem was solved while minimizing the deviation tolerances and satisfying the design constraints. The ATC solution has been shown to converge to a point that satisfies the necessary optimality conditions of the original design optimization problem [6]. Using a formulation of ATC with similarities to that in [3], the inequality constraints on deviation tolerances were brought into the objective function to form an augmented objective function; this formulation included the addition of weight factors to the deviation tolerances.
The scaled tolerance formulation [3] was used in [7] to investigate the numerical behavior of the ATC methodology and the local convergence properties of different coordination strategies. The authors examined the effects of linking variables, subproblem solution accuracy, and the number of significant digits on numerical stability.
The commonly used ATC formulations are based on quadratic penalty (QP) functions [7-10]. Numerical experiments with these formulations show that significant computational effort is needed to obtain accurate solutions. The QP functions penalize the consistency constraints (equality or inequality) to force targets and responses to match. Ideally, these consistency constraints have to be relaxed, allowing inconsistencies between targets and responses that are gradually eliminated in the iterative solution process. For the QP function, in general, large weight factors are required to find accurate solutions [11]. Because there is no mathematical relationship between weight factors and solution accuracy, the weight factors are given arbitrarily large values that may cause computational difficulties [8,10].
An iterative method was presented in [8] for finding the minimal penalty weight factors that provide converged solutions within user-specified inconsistency tolerances, and its effectiveness was demonstrated with several examples. This method contains an inner and an outer loop. The inner loop solves the decomposed ATC problem with a coordination scheme. The outer loop updates the penalty weight factors based on information obtained from the inner loop. The iterative method calculates the Lagrange multipliers and derivatives of the response function to update the weight factors.
In the separable ordinary Lagrangian (OL) approach, a large-scale convex nonlinear programming problem is formulated and decomposed using ATC [12]. By combining the classical Lagrangian duality and the augmented Lagrangian duality, a simple method was proposed in [13] for decomposition without imposing restrictive conditions, which alleviates the difficulty of the convexity requirement. The modified Lagrangian dual formulation and coordination enhance the ATC performance [14] relative to those proposed earlier in the literature.
ATC problem relaxation with an augmented Lagrangian penalty (ALP) function using the method of multipliers (AL) and the alternating direction method of multipliers (AL-AD) was proposed and investigated in [15]. By means of the ALP relaxation, ill-conditioning is reduced for the inner loop because accurate solutions can be obtained for smaller weight factors. This formulation was later adopted in [16], which used Diagonal Quadratic Approximation (DQA) and Truncated DQA (TDQA) for parallelization of ATC. Similarly, the ALP formulation was also applied in [17], but three different updating methods were used in the outer loop.
In this paper, the ALP function with four different coordination strategies (i.e., AL, AL-AD, DQA, and TDQA) is used to study the numerical behavior of ATC. Moreover, the role of two penalty parameters that can have a large influence on solution accuracy and computational cost is investigated. The effects of the penalty parameter updating coefficient in the outer loop and the initial guessed values for the decision variables used to start the multilevel optimization process are examined by solving three example problems.

Overview of ATC
For a decomposed system with N levels and M elements, as shown in Figure 2, the subscripts ij denote the jth element in the ith level [15]. The vector of local variables is denoted by x_ij; t_ij is the vector of target variables shared by element ij and its parent at level i - 1; E_i is the set of elements at level i (e.g., E_3 = {4, 5, 6} in Figure 2); D_ij is the set of children of element ij (e.g., D_22 = {4, 5}); f_ij is the local objective; g_ij is the vector of local inequality constraints; and h_ij is the vector of local equality constraints. Hence, the all-in-one (AIO) problem of such a system is defined in terms of the local objectives f_ij and the local constraints g_ij and h_ij of all elements.
In the ATC formulation adopted from [15], response copies r_ij are introduced to make the objective function and constraints separable, which leads to the addition of consistency constraints expressed as c_ij = t_ij - r_ij = 0, where c_ij is a measure of inconsistency between the targets and the corresponding responses in element ij. Moreover, the objective function is augmented by the addition of a penalty term π, which leads to the relaxed form of the AIO problem in Equation (2), where the vector c collects the inconsistencies c_ij, up to c_NM, of the elements in the hierarchy. The relaxed AIO problem in Equation (2) can then be decomposed into separate subproblems (e.g., P_ij for element ij), each involving only a subset of the decision variables, x_ij. The penalty parameter updating coefficient β is required to be ≥ 1 for convex objective functions [15].
The double-loop approach in AL avoids setting arbitrarily large weight factors that can often cause ill-conditioning in the solution. The weight factors are updated using the information obtained from the inner loop. Whereas the inner loop is computationally expensive, the outer loop is inexpensive. It has been shown in the literature that the AL method can significantly reduce the computational cost of solving a problem with ATC without loss of accuracy.
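For reference, the three problems described in this section can be sketched in LaTeX as follows. This is a reconstruction under assumptions rather than the exact displayed equations: it follows the general form used in [15], and the overbar vector \bar{x}_{ij} is assumed notation for the local variables x_ij together with the response and target quantities handled by element ij. The subproblem form assumes that π is a sum of per-constraint terms, and the c_ij term is absent for the top-level element, which has no parent.

```latex
% AIO problem (reconstructed sketch; \bar{x}_{ij} is assumed notation)
\min_{\bar{x}_{11},\dots,\bar{x}_{NM}} \; \sum_{i=1}^{N} \sum_{j \in E_i} f_{ij}(\bar{x}_{ij})
\quad \text{subject to} \quad
g_{ij}(\bar{x}_{ij}) \le 0, \;\; h_{ij}(\bar{x}_{ij}) = 0,
\qquad \forall j \in E_i,\; i = 1,\dots,N .

% Relaxed AIO problem: consistency constraints c_{ij} = t_{ij} - r_{ij} = 0
% are moved into the objective through the penalty term \pi
\min_{\bar{x}_{11},\dots,\bar{x}_{NM}} \; \sum_{i=1}^{N} \sum_{j \in E_i} f_{ij}(\bar{x}_{ij}) + \pi(c)
\quad \text{subject to} \quad
g_{ij}(\bar{x}_{ij}) \le 0, \;\; h_{ij}(\bar{x}_{ij}) = 0 .

% Subproblem P_{ij}: only the penalty terms that involve element ij remain
\min_{\bar{x}_{ij}} \; f_{ij}(\bar{x}_{ij}) + \pi(c_{ij}) + \sum_{k \in D_{ij}} \pi\bigl(c_{(i+1)k}\bigr)
\quad \text{subject to} \quad
g_{ij}(\bar{x}_{ij}) \le 0, \;\; h_{ij}(\bar{x}_{ij}) = 0 .
```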
In QP, OL, and ALP, the penalty term takes the form
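As a hedged sketch, assuming the standard definitions used in the ATC literature (e.g., [15]), the three penalty terms can be written as shown here. The symbol \circ for the term-by-term product of two vectors is a notational assumption.

```latex
\begin{aligned}
\pi_{\mathrm{QP}}(c)  &= \lVert w \circ c \rVert_2^2, \\
\pi_{\mathrm{OL}}(c)  &= \lambda^{T} c, \\
\pi_{\mathrm{ALP}}(c) &= \lambda^{T} c + \lVert w \circ c \rVert_2^2 .
\end{aligned}
```

In this form, the ALP term reduces to the QP term when λ = 0 and to the OL term when w = 0.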

Alternative Coordination Strategies
The ALP method contains two loops. In the inner loop, the decomposed ATC problem is solved for fixed penalty parameters (λ and w), whereas in the outer loop, an algorithm is applied to update both λ and w.
For the ALP formulation, the four alternative coordination strategies are described by the algorithms outlined in Figures 3 and 4. For AL and AL-AD in Figure 3, the outer-loop convergence criterion is satisfied when the reduction of inconsistencies in two successive solutions is sufficiently small, as measured against a user-defined termination tolerance τ, with k denoting the outer loop counter. The inner loop convergence criterion is reached when the difference in the objective function values in two consecutive inner loop iterations is less than a prescribed inner loop tolerance. In the DQA and TDQA algorithms in Figure 4, the convergence criteria are defined in terms of σ_in and σ_out, the inner and outer loop termination tolerances, with σ_out = τ.
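The double-loop scheme and the outer-loop criterion described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy problem (minimize (t - 1)^2 + (r - 3)^2 subject to t = r, split into a top element that owns t and a bottom element that owns r), the function names, the inner tolerance of τ/100, and the use of the change in the variables rather than in the objective as the inner-loop test. The outer update λ ← λ + 2 w^2 c, w ← β w is the method-of-multipliers update commonly associated with [15] and is likewise assumed here. With alternating=True, the inner loop is cut to a single sweep, in the spirit of AL-AD.

```python
def solve_top(r, lam, w):
    """Closed-form minimizer over t of (t - 1)^2 + lam*(t - r) + (w*(t - r))^2."""
    return (2.0 - lam + 2.0 * w * w * r) / (2.0 + 2.0 * w * w)


def solve_bottom(t, lam, w):
    """Closed-form minimizer over r of (r - 3)^2 + lam*(t - r) + (w*(t - r))^2."""
    return (6.0 + lam + 2.0 * w * w * t) / (2.0 + 2.0 * w * w)


def atc_toy(beta, tau, alternating, inner_tol=None, max_outer=200, max_inner=100000):
    """Coordinate the two toy subproblems; return (t, r, lam, subproblem solves)."""
    # Assumed inner tolerance; the paper ties it to tau, the exact ratio is a guess.
    inner_tol = tau / 100.0 if inner_tol is None else inner_tol
    t, r = 0.0, 0.0          # initial guessed values
    lam, w = 0.0, 1.0        # lambda^(0) = 0 and w^(0) = 1, as in the examples
    c_prev = None
    solves = 0
    for _ in range(max_outer):
        # Inner loop: solve the elements in hierarchical order for fixed lam and w.
        for _ in range(max_inner):
            t_new = solve_top(r, lam, w)
            r_new = solve_bottom(t_new, lam, w)
            solves += 2
            change = max(abs(t_new - t), abs(r_new - r))
            t, r = t_new, r_new
            if alternating or change < inner_tol:
                break
        # Outer loop: stop when the inconsistency barely changes between iterations.
        c = t - r
        if c_prev is not None and abs(c - c_prev) < tau:
            break
        lam += 2.0 * w * w * c   # assumed Lagrange multiplier update
        w *= beta                # weight update with coefficient beta >= 1
        c_prev = c
    return t, r, lam, solves


if __name__ == "__main__":
    print("AL   :", atc_toy(beta=2.0, tau=1e-6, alternating=False))
    print("AL-AD:", atc_toy(beta=1.0, tau=1e-6, alternating=True))
```

For this toy problem the consistent optimum is t = r = 2 with multiplier λ = -2, so the returned values can be checked directly. The returned count of subproblem solves plays the role of the computational cost discussed in the text; the values of β used in the usage lines are arbitrary choices for this toy case.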

Illustrative Example Problems
The effect of β on the accuracy and computational cost has not been addressed in the literature. Although it has been mentioned that β can take a wide range of values, it is unclear what value must be chosen with respect to the desired levels of accuracy and computational cost as well as the selected ATC solution methodology and coordination strategy. Furthermore, since in ATC the initial values for response/target and linking variables are selected at random, it is unclear what effects these values would have on the ATC results.
To examine these effects, three different example problems are solved using the four different methods of ATC described in the previous section. For each method, the solution starts from different initial guessed values (IGV) that correspond to different randomly selected design points relative to the optimum point. The solution is repeated for 20 different values of β and every IGV.
Two performance metrics are considered: the computational cost, which is captured by the number of function evaluations, and the error, which is defined in Equation (9) in terms of x*, the exact optimum design point, and x_ATC, the solution found by ATC. All of the ATC formulations cited were developed into separate MATLAB codes and used to solve the following example problems.
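The two performance metrics can be sketched in a few lines of Python. The names CountedFunction and solution_error are hypothetical, and the infinity norm of x* - x_ATC is an assumed choice of error measure, since the exact expression of Equation (9) is not shown in this text. The authors' own implementation was in MATLAB; this sketch is not a port of it.

```python
class CountedFunction:
    """Wrap a callable and count how many times it is evaluated."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.func(x)


def solution_error(x_star, x_atc):
    """Largest absolute component of x_star - x_atc (assumed error measure)."""
    if len(x_star) != len(x_atc):
        raise ValueError("x_star and x_atc must have the same length")
    return max(abs(a - b) for a, b in zip(x_star, x_atc))


if __name__ == "__main__":
    objective = CountedFunction(lambda x: sum(v * v for v in x))
    objective([1.0, 2.0])
    objective([0.5, 0.5])
    print("function evaluations:", objective.calls)
    print("error:", solution_error([1.0, 2.0], [1.5, 2.0]))
```

Wrapping every objective and constraint function in such a counter before handing it to the subproblem solver gives a cost figure that is comparable across coordination strategies.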
Problem 1: This is a 7-variable geometric programming problem stated in AIO form.
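This problem is decomposed into a two-level hierarchy [10] with a single element at the top level and another element at the bottom level. Local variables in the top element are x_1, x_3, and x_4, along with the objective function subject to the inequality constraint g_1 and equality constraint h_1. Variables x_2, x_6, and x_7 are the local design variables for the bottom element, with the objective function and constraints g_2 and h_2. The response/target variable for the two elements is x_5.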
The initial values for the penalty parameters are defined as λ^(0) = 0 and w^(0) = 1. The starting design point is x^(0) = [3, 3, 3, 3, 3, 3, 3]. [...] or 2.3 for most cases. For different IGV, the relationship between cost and β is similar, but it is not necessarily monotonic. Due to this similarity, only the upper and lower bounds are shown for each case using the corresponding IGV numbers. The value of β also appears to influence the error, especially for larger tolerances, as shown in Figure 7. The solution error trends for different IGV are identical; hence, the plot of error from Equation (9) versus β is shown for only one case.
Problem 2: This is a 14-variable geometric programming problem with the AIO formulation expressed as in [5]. [...] respectively, whereas x_5 and x_11 are the linking variables between elements 2-3 and 4-5, respectively, both of which are coordinated in element 1.
The initial values for the penalty parameters in all the formulations are taken as λ^(0) = 0 and w^(0) = 1. The initial design point is x^(0) = [5, 5, 2.76, 0.25, 1.26, 4.64, 1.39, 0.67, 0.76, 1.7, 2.26, 1.41, 2.71, 2.66] for all the formulations, which is the same as that used in the previous studies cited. The IGV for x_1, x_2, x_3, x_4, x_5, x_6, and x_11 are randomly selected in the design domain with a relative distance of {0, 2, 4, 6, 8, 10, 20, 40, 70, 100} from the optimum point, with the corresponding values shown in Table 1. These variables need to have predefined values to start the ATC solution sequence. For example, in AL it is necessary to guess values for the response/linking variables x_1, x_2, x_5, and x_11 from the lower level elements to solve element 1, a response value for x_3 from element 4 to solve element 2, and a response value for x_6 from element 5 to solve element 3.
The plots in Figure 11 show that the error in both AL and DQA depends on the β value, especially with τ = 10^-2, and this is very critical for the DQA method. The error in AL is nearly uniform for β > 1.5, while in DQA it follows an ascending trend.
Figure 12 indicates that the dependency of error on IGV for AL-AD and DQA is observable at τ = 10^-2, diminishes slightly for TDQA at τ = 10^-3, and vanishes at τ = 10^-4. It can be concluded that TDQA and, to some extent, AL-AD are much more dependent on the IGV than DQA and AL. The computational costs of AL and DQA drastically change with tighter termination tolerance, while solution errors in AL and DQA do not change very much. In contrast to AL and DQA, the error in AL-AD and TDQA changes with different τ values while the [...]
Problem 3: [...] The problem is decomposed into three elements in two levels: a top-level element with elements 2 and 3 at level 2. There is no local variable or constraint at the top level. Local variables of element 2 are x_4 and x_5 along with constraints g_1 and g_2. Local variables of element 3 are x_6 and x_7 with inequality constraints g_3 and g_4. There is no target variable in this decomposed structure. The linking variables x_1, x_2, and x_3 are shared between elements 2 and 3 and coordinated in element 1.
The starting design point is x^(0) = [0, 0, 0, 0, 0, 0, 0] for all the formulations. The IGV for x_1, x_2, and x_3 are randomly selected in the design domain at a distance nearly equal to {0, 2, 4, 6, 8, 10} from the optimum point, with the corresponding values shown in Table 2.
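One way to draw initial guessed values at a prescribed distance from a known optimum is sketched below. This is only an assumed procedure, since the text does not state how the random points were generated: a random direction is drawn, scaled to the requested Euclidean distance, and then clipped to the variable bounds, which is one reason the realized distance may be only nearly equal to the requested one. The function name, the bounds, and the optimum used in the usage lines are hypothetical.

```python
import math
import random


def sample_igv(x_opt, distance, lower, upper, rng=None):
    """Return a random point about `distance` away from x_opt, clipped to the bounds."""
    rng = rng or random.Random()
    # A normalized Gaussian vector is uniformly distributed over directions.
    norm = 0.0
    while norm == 0.0:
        direction = [rng.gauss(0.0, 1.0) for _ in x_opt]
        norm = math.sqrt(sum(d * d for d in direction))
    point = [xo + distance * d / norm for xo, d in zip(x_opt, direction)]
    # Clipping keeps the point inside the design domain and can shorten the distance.
    return [min(max(p, lo), hi) for p, lo, hi in zip(point, lower, upper)]


if __name__ == "__main__":
    rng = random.Random(0)
    x_opt = [1.0, 2.0, 3.0]            # hypothetical optimum of the linking variables
    lower, upper = [-50.0] * 3, [50.0] * 3
    for d in (0, 2, 4, 6, 8, 10):
        print(d, sample_igv(x_opt, d, lower, upper, rng))
```

Passing a seeded random.Random instance keeps the set of starting points reproducible, so that different coordination strategies can be compared from identical IGV.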
Figures 13 and 14 show that the computational cost changes greatly with variations in the β value and that the fluctuations are more pronounced for the smaller τ values. Figure 15 shows that the error in AL is slightly dependent on β only for τ = 10^-2, and this dependency nearly disappears for τ = 10^-3 and τ = 10^-4. The error in DQA is more dependent on β than that in AL.
Figure 16 indicates that the computational cost dependency on IGV is negligible; the changes in computational cost are lower than 5% for all the methods. The computational cost for AL and DQA, especially for DQA, changes significantly while the error is nearly identical for tighter tolerances. Also, the dependency of the error on IGV in AL-AD and TDQA is observable at τ = 10^-2 and vanishes for tighter tolerances.

Conclusions
The numerical behavior of the analytical target cascading (ATC) method was investigated for multilevel optimization of hierarchical systems based on different solution strategies. The strategies considered included Augmented Lagrangian with method of multipliers (AL), Augmented Lagrangian with Alternating Direction method of multipliers (AL-AD), Diagonal Quadratic Approximation (DQA), and Truncated Diagonal Quadratic Approximation (TDQA). Three example problems were used to examine the effects of the penalty parameter updating coefficient β and convergence tolerance τ on the computational cost and solution accuracy. In addition, the effect of initial guessed values (IGV) for the response/target and linking variables was investigated.
The results showed that although the computational cost in the AL and DQA methods is influenced by the value of β, it does not follow a specific ascending/descending pattern. The dependency of the computational cost on β generally increases as the convergence tolerance increases. Although previous studies recommend β > 1 and 2 < β < 3, the results found here indicate that 1 < β < 2 is also acceptable and that no single value of β can be suggested to reduce the computational cost in all the ATC-based optimization problems and solution strategies. The results also showed that the relationship between the computational cost and β is not dependent on the IGV, as best noted in the results of the DQA method.
In terms of solution accuracy, AL and DQA results depend on the β value irrespective of the selected IGV. With higher β values, better accuracy is obtained with AL, while the behavior is different for DQA. The dependency of solution accuracy on β is reduced with tighter tolerance values. Comparison of the DQA and AL results indicates that AL is more stable in terms of accuracy, whereas DQA needs a tighter tolerance to obtain reasonable accuracy, although a tighter tolerance causes significant changes in the computational cost. In the absence of an optimum β for computational cost and accuracy, the AL method appears to be more reliable than DQA.
By moving the IGV farther away from the corresponding values at the point of optimum, all methods required more function calls, as expected. While the solution accuracy in AL and DQA was not influenced by the choice of IGV, the trend was quite the opposite for AL-AD and TDQA, as they both had great dependency on IGV. The inner loop convergence requirement is more costly for AL and DQA than for TDQA and AL-AD. Furthermore, the increase in computational cost for AL-AD and TDQA is much greater than that for AL and DQA when IGV is farther away from the optimum, but TDQA and AL-AD still show better performance. AL-AD and TDQA need tighter termination tolerances to have better accuracy.
In summary, the τ and β values have a greater effect on AL and DQA solutions than on the other two coordination strategies, and the AL and DQA solutions are not influenced by IGV. Hence, in using AL and DQA, appropriate values for these two parameters can improve both solution accuracy and computational cost. In contrast, the computational cost and accuracy of AL-AD and TDQA are greatly dependent on the IGV.
As part of the future work, the computational characteristics of a newly developed approach based on the exponential method of multipliers within the framework of ATC will be investigated.

Figure 1. An illustrative model of a hierarchically decomposed multilevel system.

Figure 2. Variable allocation in a hierarchical system.

Figures 3 and 4 (flowchart text). Common steps: define the decomposed problem and the initial design x^(0); set the outer loop iteration number k = 0; define the penalty parameters λ^(0) and w^(0) for the first iteration; after the ATC solution step, set k = k + 1 and update the Lagrange multipliers (penalty parameter update), which in some flowcharts involves a step size Γ.
ATC solution step, by flowchart: (i) solve the decomposed problem in two levels, with parallel solution of odd/even elements, for fixed λ^(k) and w^(k) to obtain an updated solution; (ii) in the inner loop, solve the decomposed problem in hierarchical order for fixed λ^(k) and w^(k) to obtain an updated solution; (iii) in the inner loop, given x^(k), the solution of the previous outer loop iteration, set s = 0, where s is the inner loop iteration number, and for all elements solve for x_ij in parallel to obtain the solution of the s-th inner loop iteration; (iv) for all elements, solve for x_ij in parallel and obtain x^(k+1).

Figure 8 is used to further highlight the effect of IGV on both function evaluations and error under different solution strategies and convergence tolerances. The plots are shown only for the β = 2 case with three convergence tolerance values. The dependency of error on IGV for AL-AD and TDQA is very high at τ = 10^-2, but minimal or nonexistent at τ = 10^-3 and τ = 10^-4. Thus, TDQA and AL-AD are much more dependent on IGV than DQA and AL. The efficiency of the AL and DQA methods changes drastically with tighter termination tolerance, while the solution error for AL and DQA does not change very much. Hence, for larger τ, AL and DQA are less costly, whereas for smaller τ, AL-AD and TDQA are more efficient.

Figure 9. Cost trends for AL-based solution of Problem 2 using (a) τ = 10^-2, (b) τ = 10^-3, and (c) τ = 10^-4.