Simultaneous Optimization of Incomplete Multi-Response Experiments

This article develops a simultaneous optimization procedure for several response variables from incomplete multi-response experiments. In incomplete multi-response experiments, not all of the responses (p) are recorded on all of the experimental units (n). Two situations of multi-response experiments are considered: (i) on n1 units all the responses are recorded, while on n2 = n − n1 units a subset of p2 < p responses is recorded, and (ii) on n2 units all the responses (p) are recorded, on n1 units a subset of p1 < p responses is recorded and on n3 = n − n1 − n2 units the remaining subset of p2 = p − p1 responses is recorded. A procedure for estimating the parameters of linear multi-response models from incomplete multi-response experiments is developed for both situations. It is shown that the parameter estimates are consistent and asymptotically unbiased. Using these parameter estimates, simultaneous optimization of incomplete multi-response experiments is carried out following the generalized distance criterion [1]. For the implementation of these procedures, SAS codes have been developed for both complete (k ≤ 5, p = 5) and incomplete (k ≤ 5, p1 = 2, 3 and p2 = 2, 3, where k is the number of factors) multi-response experiments. The procedure developed is illustrated with the help of a real data set.


Introduction
The experimental situations where more than one response is observed for a set of input combinations are known as multi-response experiments. In these experiments, the experimenter is interested in determining the optimum combination of input factors. There are several experimental situations in which simultaneous optimization of several response variables is required. For example, in food processing experiments, a food technologist may be interested in determining the optimum combination of various ingredients of a product, or the operating conditions for obtaining the product, on the basis of acceptability, colour, flavour, nutritional value, and economic and biochemical parameters. Such experimental situations exist in many other disciplines of science; for more such situations, a reference may be made to [2]. The experiments for simultaneous optimization of several responses are generally conducted using response surface designs. The determination of the conditions on the set of input variables that simultaneously optimize a multi-response function is of particular interest.
Multi-response optimization is the process of determining a combination of levels of input factors that yields optimal values of several response variables taken simultaneously. In such experimental situations, one should not obtain the optima for each response variable separately. When more than one response variable is under investigation simultaneously, the meaning of an optimum combination becomes unclear, since there is no unique way to order the multivariate values of a multi-response function. Further, the conditions that are optimal for one response variable may be far from optimal, or may even be physically impractical, for the other response variables. Therefore, a procedure for simultaneous optimization of several response variables is required. In the literature, different procedures for simultaneous optimization of several responses have been suggested (see, e.g., [1] [3]-[7]).
[4] and [1] considered an optimization problem associated with a dual response system consisting of two responses. [3] and [6] adopted the desirability function approach for optimization of complete multi-response experiments. [1] gave an optimization procedure based on minimization of a generalized distance. The above procedures are suitable only for situations where the process is optimized either for maximization of all the response variables or for minimization of all the response variables.
There do, however, occur experimental situations, particularly in osmotic dehydration studies, where one wants to optimize the process keeping in view maximum moisture loss and minimum solids gain. Such a situation has also been encountered by [8]. Hence, there is a need for an optimization procedure where maximization of some of the response variables and minimization of the remaining response variables is desired. It is, therefore, suggested that we take the negative values of the response variables to be minimized and then determine the optimum combination using the procedure for maximization of all the response variables.
The above discussion relates to experimental situations where the data on all the response variables are collected at each design point. In some experimental situations, it may not be possible to observe data on all the response variables from all the experimental units. The data on only a subset of responses are collected from some experimental units and on another subset of responses from other experimental units. Some of the responses may be common to two or more experimental units. For example, consider an experiment on modified atmospheric packaging (MAP) conducted to determine the optimum combinations of temperature, packaging material and time on the basis of different physical parameters such as physiological loss of weight (PLW), moisture content (MC) and texture; biochemical parameters such as total sugars, flavonoids and total soluble solids (TSS); microbiological parameters such as total count and coliform count; and sensory parameters such as colour, flavour and overall acceptability. In such situations it may happen that, after some days, the data on a subset of parameters such as the sensory ones are not available due to spoilage of the packaged food material for some of the treatment combinations. In this particular situation, the data on sensory characters may not be available at 25˚C after 11 days for the packaging material HDPE (high density polyethylene). Further, due to variability in the raw material and differences in initial microbial count, it may be possible to record the data on all the response variables only in some of the replications of the treatment combinations.
In response surface designs, some of the design points are generally replicated for testing lack of fit. Due to constraints on time or equipment, it may not be possible to collect data on all the response variables from each of the replicated design points. Therefore, the experimenter may divide the response variables into as many groups as there are replications of a design point, with some of the response variables common to each of the sets. This is also an incomplete multi-response situation in which simultaneous optimization of several response variables is required.
Cold storage studies involving different temperatures may also encounter such situations, as may field experiments where certain plots or treatment combinations are affected by pests or insects. For example, if a field crop is attacked by insects/pests that affect only the leaf and damage it, we may be able to record data on the other response variables but not on those related to the leaf; or, if the grain/pod is affected and damaged, we may not be able to record data on yield but can still record physical and other parameters such as plant height.
From the above discussion we can see that there are two types of multi-response experiments, viz. complete multi-response and incomplete multi-response experiments, in which interest lies in simultaneous optimization of several response variables. As mentioned above, some procedures for simultaneous optimization of several response variables are available for complete multi-response situations. These procedures are algebraically involved; therefore, for the benefit of experimenters, it is desirable to develop a computer-based procedure for simultaneous optimization of several response variables.
For incomplete multi-response situations, it seems that no analytical procedure for simultaneous optimization of several response variables is available in the literature. The basic difference between simultaneous optimization from complete and from incomplete multi-response experiments lies in the estimation of the parameters of the multi-response equations. As shown in [2], in complete multi-response situations the parameter estimates are the same as the estimates obtained from fitting response surfaces individually for each response variable. This, however, may not be the case in incomplete multi-response situations. Therefore, in the present investigation an attempt has been made to obtain the parameter estimates for data obtained from incomplete multi-response experiments. Once the parameter estimates are obtained, the simultaneous optimization procedure given by [1] is used.
In Section 2, the procedure for estimation of parameters from incomplete multi-response experiments is developed. The two incomplete multi-response situations considered are (i) on n1 units all the p responses are recorded while on n2 units a subset of p2 < p responses is recorded, and (ii) on n2 units all the responses (p) are recorded, on n1 units a subset of p1 < p responses is recorded and on n3 units the remaining subset of p2 = p − p1 responses is recorded. In Section 3, a stepwise procedure for simultaneous optimization of complete/incomplete multi-response experiments is described in brief. The procedures of parameter estimation and of obtaining the simultaneous optima are illustrated with the help of examples in Section 4. SAS codes have been developed for the whole procedure for both complete (k ≤ 5, p = 5) and incomplete (k ≤ 5, p1 = 2, 3 and p2 = 2, 3, where k is the number of factors) multi-response experiments. The codes are available with the first author and can be obtained by sending an e-mail to nandi_stat@yahoo.co.in.

Parameter Estimation from Incomplete Multi-Response Experiments
In this section, we develop a procedure for estimating the parameters of linear multi-response models in incomplete multi-response situations. We consider two cases of incomplete multi-response data, as described in the sequel.

Case I
Let there be p (= p1 + p2) response variables which are measured at each design point of k input variables (factors) x1, x2, …, xk. Let there be n = n1 + n2 experimental units. From n1 experimental units all the response variables are observed, and from the remaining n2 experimental units only p2 < p responses are observed. It is assumed that n1 > m, where m is the number of parameters to be estimated from the model. The pictorial representation in Figure 1 may be helpful for a better understanding of the situation.

For the first p1 response variables, the model can be written as

y_i = X_i β_i + ε_i,  i = 1, 2, …, p1,  (1)

where y_i is the n1 × 1 vector of observations on the i-th response variable, X_i is the n1 × m matrix of rank m of known functions of the settings of the coded variables, β_i is the m × 1 vector of unknown constant parameters, and ε_i is the n1 × 1 vector of random errors associated with the i-th response variable.

The model for each of the remaining p2 response variables is

y_j = X_j β_j + ε_j,  j = p1 + 1, p1 + 2, …, p,  (2)

where y_j is the (n1 + n2) × 1 vector of observations on the j-th response variable, X_j is the (n1 + n2) × m matrix of rank m of known functions of the settings of the coded variables, β_j is the m × 1 vector of unknown constant parameters, and ε_j is the (n1 + n2) × 1 vector of random errors associated with the j-th response variable.
Now, combining and rolling down the models (1) and (2), we have Y = Zβ + ε, where Y1 is the (n1p1 × 1) vector of observations on the p1 response variables, Y2* is the (n1p2 × 1) vector of observations on the p2 response variables from the first n1 units and Y2** is the (n2p2 × 1) vector of observations on the p2 response variables from the remaining n2 units, so that Y = (Y1′, Y2*′, Y2**′)′ is the (n1p1 + n1p2 + n2p2) × 1 observation vector on all (p1 + p2) response variables. Here, X1 is the design matrix for the first n1 design points and X2** is the design matrix for the remaining n2 design points, with X2* = X1 since all the response variables are observed on the first n1 experimental units; Z is the block-diagonal matrix whose first p1 diagonal blocks are X1 and whose remaining p2 diagonal blocks are X2 = (X1′, X2**′)′. The vector of parameters is β = (β1′, β2′)′, where β1 (mp1 × 1) is the vector of parameters for the first p1 response variables and β2 (mp2 × 1) is the vector of parameters for the remaining p2 response variables, so β is of order (mp1 + mp2) × 1.
Similarly, one can also write the residual vector as ε = (ε1′, ε2*′, ε2**′)′, where ε1 is the (n1p1 × 1) vector of residuals for the first p1 response variables, ε2* is the (n1p2 × 1) vector of residuals for the p2 response variables on the first n1 units and ε2** is the (n2p2 × 1) vector of residuals for the p2 response variables on the remaining n2 units.
From the structure of the residual vector, the dispersion matrix of the residual vector is

Ω = D(ε) = | Σ11 ⊗ I_n1          Σ12 ⊗ [I_n1  0] |
           | (Σ12 ⊗ [I_n1  0])′  Σ22 ⊗ I_(n1+n2) |,  (8)

where ⊗ denotes the Kronecker product, I_n is the identity matrix of order n and 0 is an n1 × n2 matrix of zeros.
As is clear from (8), the covariance between the p1 and p2 sets of responses can be obtained only on the n1 common observations; for the remaining n2 observations this covariance cannot be obtained (it is zero). In (8), Σ11 represents the variances/covariances within the p1 response variables on the n1 observations; Σ12 represents the covariances between the p1 and p2 response variables on the first n1 observations; and Σ22 represents the variances/covariances within the p2 response variables.
It may be noted here that the form of Z, with unequal numbers of observations for the two sets of responses (n1 for the p1 responses and n1 + n2 for the p2 responses), requires additional effort to estimate β for the incomplete set of response variables. We now regard the above model as a single-equation regression model and make use of Aitken's Generalized Least Squares (GLS) technique to estimate the parameter vector. An asymptotically unbiased estimator of β can be obtained from the normal equations

Z′Ω⁻¹Zβ = Z′Ω⁻¹Y.  (9)
The asymptotically unbiased estimator of β is

β̂ = (Z′Ω⁻¹Z)⁻¹ Z′Ω⁻¹Y,  (10)

and the dispersion matrix of β̂ is

D(β̂) = (Z′Ω⁻¹Z)⁻¹.  (11)

Equations (10) and (11) require knowledge of Ω. If Ω is unknown, as is usually the case, then an estimate of β can be obtained by replacing Ω in (10) by an estimate Ω̂. This requires considerable effort, as described in the sequel.
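The estimator in (10) and its dispersion matrix in (11) can be sketched in a few lines of numpy. This is only an illustrative sketch with hypothetical toy data, not the SAS implementation described later; it assumes the stacked response vector Y, the block design matrix Z and the dispersion matrix Ω have already been assembled.

```python
import numpy as np

def gls(Z, Y, Omega):
    """Aitken's GLS: beta_hat = (Z' Omega^-1 Z)^-1 Z' Omega^-1 Y,
    with dispersion matrix (Z' Omega^-1 Z)^-1."""
    Oi = np.linalg.inv(Omega)
    ZtOiZ = Z.T @ Oi @ Z
    beta_hat = np.linalg.solve(ZtOiZ, Z.T @ Oi @ Y)
    disp = np.linalg.inv(ZtOiZ)
    return beta_hat, disp

# toy check on noiseless data: with Omega = identity, GLS reduces to OLS
rng = np.random.default_rng(0)
Z = rng.standard_normal((12, 3))
beta = np.array([1.0, -2.0, 0.5])
Y = Z @ beta
b, D = gls(Z, Y, np.eye(12))
print(np.allclose(b, beta))  # True
```

With Ω = I the GLS and ordinary least-squares solutions coincide, which makes the noiseless recovery above an easy sanity check.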

Estimation of β When Ω Is Unknown
It is easy to prove that if we replace Ω by any consistent estimator in (10), the resulting estimator of β is consistent and has the same asymptotic distribution as the estimator of β that uses Ω itself. It therefore makes no difference, asymptotically, which consistent estimator of Ω is used. A number of consistent estimators are available in [9] for two response variables with unequal numbers of observations. These estimators are based on least-squares residuals and are given in the sequel.
Let E1 (n1 × p1) and E2 ((n1 + n2) × p2) be the matrices of least-squares residuals for the p1 and p2 response variables, respectively. E2 can be partitioned analogously to X2 and Y2 as E2 = (E2*′, E2**′)′, where E2* contains the first n1 rows and E2** the remaining n2 rows. Now we define the quantities

Σ̂11 = E1′E1/n1,  Σ̂12 = E1′E2*/n1,  Σ̂22 = E2′E2/(n1 + n2).

A consistent estimator Ω̂ of Ω is obtained by replacing Σ11, Σ12 and Σ22 in (8) by Σ̂11, Σ̂12 and Σ̂22. Once a consistent estimator of Ω is obtained, using GLS with Ω̂ in place of Ω in (10) gives the parameter estimates

β̂ = (Z′Ω̂⁻¹Z)⁻¹ Z′Ω̂⁻¹Y,  (15)

with estimated dispersion matrix (Z′Ω̂⁻¹Z)⁻¹. Following the proof of [10], one can easily see that the estimators of β in (15) have distributions that are symmetric around the true value of β and that their means exist. They are, therefore, unbiased.
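The whole Case I pipeline — OLS residuals, residual-based estimates of Σ11, Σ12 and Σ22, assembly of Ω̂ and the feasible GLS solve of (15) — can be sketched for the simplest setting of one response in each set (p1 = p2 = 1). The divisors, the unit ordering (the first n1 rows of the second response's design correspond to the shared units) and the synthetic data below are assumptions made purely for illustration.

```python
import numpy as np

def fgls_case1(X1, y1, X2, y2, n1):
    """Feasible GLS for Case I with one response in each set (p1 = p2 = 1).
    Response 1 is observed on the first n1 units (design X1, n1 x m);
    response 2 on all n1 + n2 units (design X2, (n1 + n2) x m)."""
    # Step 1: OLS fits give the least-squares residuals
    b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
    b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
    e1 = y1 - X1 @ b1                      # n1 residuals
    e2 = y2 - X2 @ b2                      # n1 + n2 residuals
    # Step 2: residual-based variance/covariance estimates
    # (the divisors are one plausible choice, not the only one)
    s11 = e1 @ e1 / n1
    s12 = e1 @ e2[:n1] / n1                # covariance only on shared units
    s22 = e2 @ e2 / len(e2)
    # Step 3: assemble Omega_hat; covariance blocks are zero for the
    # n2 units on which response 1 was not observed
    N = n1 + len(y2)
    Omega = np.zeros((N, N))
    Omega[:n1, :n1] = s11 * np.eye(n1)
    Omega[:n1, n1:2 * n1] = s12 * np.eye(n1)
    Omega[n1:2 * n1, :n1] = s12 * np.eye(n1)
    Omega[n1:, n1:] = s22 * np.eye(len(y2))
    # Step 4: stacked model Y = Z beta + eps and the GLS solve
    Z = np.block([[X1, np.zeros((n1, X2.shape[1]))],
                  [np.zeros((len(y2), X1.shape[1])), X2]])
    Y = np.concatenate([y1, y2])
    Oi = np.linalg.inv(Omega)
    return np.linalg.solve(Z.T @ Oi @ Z, Z.T @ Oi @ Y)

# synthetic demonstration (hypothetical data)
rng = np.random.default_rng(42)
n1, n2, m = 200, 100, 2
X1 = rng.standard_normal((n1, m))
X2 = np.vstack([X1, rng.standard_normal((n2, m))])   # X2* = X1 in Case I
beta_true = np.array([1.0, -1.0, 2.0, 0.5])
y1 = X1 @ beta_true[:2] + 0.1 * rng.standard_normal(n1)
y2 = X2 @ beta_true[2:] + 0.1 * rng.standard_normal(n1 + n2)
est = fgls_case1(X1, y1, X2, y2, n1)
```

With moderate sample sizes the feasible GLS estimate should be close to the true coefficients used to generate the data.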

Case II
Let there be p (= p1 + p2) response variables which are measured at each design point of k input variables (factors) x1, x2, …, xk, and let the p response variables be observed from n1 + n2 + n3 experimental units. All the p response variables are observed on the n2 experimental units; only the p1 response variables are observed on the first n1 experimental units, and from the remaining n3 experimental units only the p2 response variables are observed. It is assumed that n1 + n2 > m and n2 + n3 > m, where m is the number of parameters to be estimated from the model. The pictorial representation in Figure 2 may be helpful for a better understanding of the situation.

For the first p1 response variables, the model can be written as

y_i = X_i β_i + ε_i,  i = 1, 2, …, p1,  (19)

where y_i is the (n1 + n2) × 1 vector of observations on the i-th response variable, X_i is the (n1 + n2) × m matrix of rank m of known functions of the settings of the coded variables, β_i is the m × 1 vector of unknown constant parameters, and ε_i is the (n1 + n2) × 1 vector of random errors associated with the i-th response variable.

The model for each of the remaining p2 response variables is

y_j = X_j β_j + ε_j,  j = p1 + 1, p1 + 2, …, p,  (20)

where y_j is the (n2 + n3) × 1 vector of observations on the j-th response variable, X_j is the (n2 + n3) × m matrix of rank m of known functions of the settings of the coded variables, β_j is the m × 1 vector of unknown constant parameters, and ε_j is the (n2 + n3) × 1 vector of random errors associated with the j-th response variable. The two sets of models share the n2 experimental units on which all the response variables are observed. Combining the models as in Case I gives Y = Zβ + ε, where Y, β and ε are column vectors with (n1 + n2)p1 + (n2 + n3)p2, m(p1 + p2) and (n1 + n2)p1 + (n2 + n3)p2 components, respectively, and the order of the design matrix Z is [(n1 + n2)p1 + (n2 + n3)p2] × m(p1 + p2).
We now regard the above model as a single-equation regression model and, making use of Aitken's GLS technique, an asymptotically unbiased estimator of β can be obtained as in (10) of Case I, with dispersion matrix as in (11). These equations require knowledge of Ω. If Ω is unknown, as is usually the case, then an estimate of β can be obtained by replacing Ω by its estimator Ω̂.

Estimation of β When Ω Is Unknown
In most practical situations Ω is seldom known, so an estimate of Ω has to be used in its place.
It is easy to prove that if we replace Ω by any consistent estimator, the resulting estimator of β is consistent and has the same asymptotic distribution as the estimator of β that uses Ω itself. It therefore makes no difference, asymptotically, which consistent estimator of Ω is used. A number of consistent estimators are available in [9] for two response variables with unequal numbers of observations. These are described in the sequel.
Let E1 ((n1 + n2) × p1) and E2 ((n2 + n3) × p2) be the matrices of least-squares residuals for the p1 and p2 response variables, respectively. The Ei's can be partitioned analogously to the Yi and Xi in (19) and (20): the last n2 rows of E1 and the first n2 rows of E2 correspond to the common n2 experimental units. Denoting these sub-matrices by E1** and E2*, respectively, we define the quantities [11]

Σ̂11 = E1′E1/(n1 + n2),  Σ̂12 = E1**′E2*/n2,  Σ̂22 = E2′E2/(n2 + n3).

A consistent estimator Ω̂ of Ω is obtained by replacing Σ11, Σ12 and Σ22 by these quantities. Once a consistent estimator of Ω is obtained, using GLS with Ω̂ in place of Ω gives the parameter estimates

β̂ = (Z′Ω̂⁻¹Z)⁻¹ Z′Ω̂⁻¹Y.  (27)

As in Case I, following the proof of [10], one can easily see that the estimators of β based on (27) have distributions that are symmetric around the true value of β and that their means exist. They are, therefore, unbiased.
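A minimal sketch of the Case II residual cross-products, where the covariance between the two sets can only be estimated from the n2 shared units, might look as follows. The divisors and the row ordering are assumptions made for illustration, and the numerical example is hypothetical.

```python
import numpy as np

def case2_sigma_hats(E1, E2, n1, n2):
    """Residual-based variance/covariance estimates for Case II.
    E1: (n1 + n2) x p1 least-squares residuals for the first set of
        responses, rows ordered so the LAST n2 rows are the shared units;
    E2: (n2 + n3) x p2 residuals for the second set, rows ordered so the
        FIRST n2 rows are the same shared units.
    The divisors are one plausible choice, not the only one."""
    S11 = E1.T @ E1 / E1.shape[0]          # within first set, n1 + n2 units
    S22 = E2.T @ E2 / E2.shape[0]          # within second set, n2 + n3 units
    S12 = E1[n1:, :].T @ E2[:n2, :] / n2   # across sets, only the n2 shared units
    return S11, S12, S22

# tiny worked example: n1 = 2, n2 = 2, n3 = 1, one response per set
E1 = np.array([[1.0], [2.0], [3.0], [4.0]])
E2 = np.array([[5.0], [6.0], [7.0]])
S11, S12, S22 = case2_sigma_hats(E1, E2, n1=2, n2=2)
```

Here S12 uses only the overlapping rows (3, 4) of E1 and (5, 6) of E2, mirroring the fact that in (8) the cross-covariance blocks are zero outside the shared units.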

Simultaneous Optimization Procedures
In the present study, simultaneous optimization of several responses based on minimization of the generalized distance given by [1] has been considered. SAS codes have been developed for the whole procedure for both complete (k ≤ 5, p = 5) and incomplete (k ≤ 5, p1 = 2, 3 and p2 = 2, 3, where p1 + p2 = p, the total number of response variables) multi-response experiments, and the procedure is illustrated with the help of an example.
For the sake of completeness, we describe the procedure in brief. Using the parameter estimates obtained in Section 2, the prediction equation for the i-th response at a design point x is

Ŷi(x) = gi′(x)β̂i,

where x = (x1, x2, …, xk)′ is the vector of coded variables, gi′(x) is a vector of the same form as a row of the design matrix Xi evaluated at the design point x, and β̂i is the GLS estimator of βi. It follows that Var[Ŷi(x)] = gi′(x) D(β̂i) gi(x), i = 1, 2, …, p.
Combining all the response variables together, we get the prediction at the design point x as Ŷ(x) = (Ŷ1(x), Ŷ2(x), …, Ŷp(x))′, the vector of predicted responses at the design point x.
Let φ̂i be the optimum value of Ŷi(x) optimized individually over the experimental region W, and let φ = (φ̂1, φ̂2, …, φ̂p)′. If these individual optima are all attained at the same set x of operating conditions, then an "ideal" optimum is said to be achieved; the problem of multi-response optimization is then obviously solved and no further work is needed. However, such an ideal optimum rarely exists. In more general situations, one might consider finding compromise conditions on the input factors that are somewhat favourable to all the responses. The deviation of the compromise conditions from the ideal optimum is formulated by means of a distance function, which measures the distance of Ŷ(x) from φ, the vector of individual optima. The distance function of [1] is defined as

ρ[Ŷ(x), φ] = {[Ŷ(x) − φ]′ [Var(Ŷ(x))]⁻¹ [Ŷ(x) − φ]}^(1/2),  (33)

where Ŷ(x) = (Ŷ1(x), Ŷ2(x), …, Ŷp(x))′ is the vector of predicted responses at the design point x, φ is the vector of individual optima and Var(Ŷ(x)) is the dispersion matrix of the predicted response vector at the design point x.
The above distance function is termed the generalized distance by [1]. Multi-response optimization then involves finding the point x that minimizes this generalized distance function (33) over the experimental region W. If x0 is the point in the experimental region at which ρ[Ŷ(x), φ] attains its minimum, and m0 is the value of this minimum, then one may describe the experimental conditions at x0 as being near-optimal for each of the p response functions. The smaller the value of m0, the closer these conditions are to representing an "ideal" optimum. To summarize, the steps for obtaining the simultaneous optimum of a multi-response function consisting of p responses are: 1) Obtain the prediction equations using the complete/incomplete multi-response data and the method of generalized least squares.
3) Optimize the predicted responses individually over the experimental region W to obtain the vector φ of estimated individual optima. The method of ridge analysis given in [2] may be used to determine the elements of φ.
4) The distance measure, being a function of x alone, is minimized over W.
Here, the controlled random search procedure given in [12] can be employed effectively.
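The distance-minimization step can be sketched as follows, with a plain random search standing in for the controlled random search of [12], a shared model row g(x) assumed for all responses purely for brevity, and cross-covariances between predicted responses ignored so that Var(Ŷ(x)) is taken as diagonal — all of these are simplifying assumptions relative to (33).

```python
import numpy as np

def generalized_distance(x, g, B, Dbeta, phi):
    """Generalized-distance-style measure at point x.
    g(x): model row vector (assumed shared by all p responses here);
    B: m x p matrix of coefficient estimates (one column per response);
    Dbeta: mp x mp dispersion matrix of the stacked coefficients,
           treated as block-diagonal across responses in this sketch;
    phi: vector of individual optima."""
    gx = g(x)
    yhat = B.T @ gx                        # predicted responses at x
    m, p = B.shape
    # Var[Yhat_i(x)] = g'(x) D(beta_i) g(x); cross-covariances ignored
    V = np.diag([gx @ Dbeta[i * m:(i + 1) * m, i * m:(i + 1) * m] @ gx
                 for i in range(p)])
    d = yhat - phi
    return float(np.sqrt(d @ np.linalg.solve(V, d)))

def random_search_min(f, lo, hi, iters=5000, seed=1):
    """Crude uniform random search over the cuboid [lo, hi]; a simple
    stand-in for the controlled random search procedure of [12]."""
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(iters):
        x = rng.uniform(lo, hi)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# demonstration with hypothetical numbers: a first-order model row
def g(x):
    return np.array([1.0, x[0], x[1]])

B = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # m = 3, p = 2
Dbeta = np.eye(6)
x0 = np.array([0.2, 0.4])
phi = B.T @ g(x0)          # pretend the individual optima coincide at x0
d0 = generalized_distance(x0, g, B, Dbeta, phi)      # "ideal" optimum: 0
xbest, fbest = random_search_min(
    lambda x: generalized_distance(x, g, B, Dbeta, phi),
    lo=np.array([-1.0, -1.0]), hi=np.array([1.0, 1.0]))
```

When the individual optima are attainable at a single point, the minimized distance is zero and the search should return a point close to it, matching the "ideal optimum" discussion above.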
The results of ridge analysis for optimization of the individual responses can be obtained using PROC RSREG in SAS [13] by simply adding the RIDGE MAX statement to the RSREG SAS code. For the incomplete multi-response situation, however, this procedure cannot be used. Therefore, to obtain the individual optima in the incomplete multi-response situation, a SAS code has been developed that determines the individual optima by the method of ridge analysis given in [2]. The SAS code can be used for a specified number of input factors (k ≤ 5) and response variables (p1 = 2, 3 and p2 = 2, 3), and is available with the first author on request by e-mail to nandi_stat@yahoo.co.in.

Illustrations
In this section, we illustrate the procedures developed in Sections 2 and 3 for simultaneous optimization of several response variables, using data from a quantitative factorial experiment.
Example 4.1: An experiment was conducted on osmotic dehydration of banana to determine the optimum combination of power level, temperature and air velocity at the Division of Agricultural Engineering, ICAR-Indian Agricultural Research Institute, New Delhi. The levels of the 3 factors tried are shown in Table 1.
The data were collected on five response variables, among them energy use efficiency (%), rehydration ratio and TSS. The experimenter is interested in obtaining the combination of the controllable factors that maximizes all the response variables simultaneously. The levels of power (X1), temperature (X2) and air velocity (X3) are coded using the following expression given in [2]:

xi = [Xi − (XiH + XiL)/2] / [(XiH − XiL)/2],

where xi is the coded variable and XiL and XiH are the low and high levels of Xi, respectively. Using the above expression, the coded and measured levels of the factors are shown in Table 2.
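The coding expression above can be sketched directly; the low and high levels used in the example call below are hypothetical, not the levels of Table 1.

```python
def code_level(X, X_low, X_high):
    """Code an actual factor level X to the coded scale:
    x = (X - (X_high + X_low)/2) / ((X_high - X_low)/2),
    so that X_low maps to -1 and X_high to +1."""
    mid = (X_high + X_low) / 2.0
    half = (X_high - X_low) / 2.0
    return (X - mid) / half

# hypothetical factor with low level 40 and high level 60
lo_coded = code_level(40, 40, 60)    # -1.0
hi_coded = code_level(60, 40, 60)    #  1.0
mid_coded = code_level(50, 40, 60)   #  0.0
```

The coded scale puts all factors on a common [−1, 1] range, which is what makes the response surface coefficients comparable across factors.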
The coded levels of the factors and the data on the 5 response variables are given in Table 3.
As mentioned in Section 1, in the case of complete multi-response situations the parameter estimates obtained by taking all the response variables together are the same as those obtained individually for each response variable. A second-order response surface model was fitted to each of the 5 response variables separately to obtain the parameter estimates. The regression coefficient estimates, their standard errors and the coefficients of determination R² are given in Table 4. Since the stationary points were outside the experimental region for some of the response variables, ridge analysis was performed to obtain the individual maxima. The individual maxima for the 5 response variables obtained by the method of ridge analysis are given in Table 5. Using the distance measure given in (33), the simultaneous maxima for the 5 response variables are obtained by minimizing ρ[Ŷ(x), φ] over the experimental region. The location at which the complete multi-response function attains its simultaneous optimum is [0.995 (278.95 W), 0.59 (46.55˚C), −0.52 (0.98 m/s)], and the corresponding maxima for the response variables are (41.02, 2.55, 71.22, 54.51, 643.16), as given in Table 6. The minimum value of the distance function obtained is 3.81.

Now, to illustrate the procedure for incomplete multi-response situations, we have deleted the data on some of the response variables at some of the design points in Example 4.1. Example 4.2 relates to Case I of the incomplete multi-response situation as discussed in Section 2.1. For the purpose of illustration, consider the experiment in Example 4.1, and let the data on energy use efficiency (%), rehydration ratio and TSS for the design points (1, 1, 0) and (1, 1, 1), at serial numbers 35 and 36 in Table 3, be deleted.

In the case of incomplete multi-response experiments, the parameter estimates obtained by taking each response variable individually are not the same as those obtained by taking all the responses together. Here the parameters of the second-order response surface model are estimated as per the procedure of Case I in Section 2.1. After obtaining individual estimates based on the available data, the residuals are used to obtain the variance-covariance matrix, and using it we have obtained the GLS estimates of the parameters of the second-order response surface model, given in Table 7. Comparing Table 4 and Table 7, it can be observed that the parameter estimates of the response variables with incomplete data on some design points are different.

Since the stationary points were again outside the experimental region for some of the response variables, ridge analysis was performed to obtain the individual maxima; Table 8 includes the individual maxima for the five response variables obtained by the method of ridge analysis. From these results it can be observed that the individual maxima obtained from the incomplete data on three response variables at some design points are different from those obtained with the complete multi-response data. Using the distance measure given in (33), the location at which the incomplete multi-response function attains its simultaneous optimum is [1.000 (280 W), 0.610 (57.2˚C), −0.470 (1.03 m/s)], and the corresponding maxima for the response variables are (40.70, 2.57, 71.42, 54.55, 643.76), as given in Table 9. It can also be observed that the minimum value of the distance function obtained is 3.36, which is lower than that obtained from the complete multi-response situation (3.81).

On similar lines, one can obtain the simultaneous optima for Case II of incomplete multi-response situations.


Conclusion
In the present investigation, a methodology was developed for parameter estimation and simultaneous optimization from multi-response experiments where some responses could not be observed at some design points. A procedure for obtaining a consistent estimator of the unknown dispersion matrix in the case of incomplete multi-response data is also suggested. The simultaneous optimization procedure is implemented through SAS codes for a specified number of input factors (k ≤ 5) and response variables (p1 = 2, 3 and p2 = 2, 3). The procedures developed have been illustrated through an example. Further efforts are required to obtain a general procedure for any number of factors and any number of response variables in incomplete multi-response situations.


Figure 1. Lengths of bars represent the number of observations recorded for each response.

Figure 2. Lengths of bars represent the number of observations recorded for each response.

Table 1. Factors and levels.

Table 2. Coded factors and levels.

Table 3. Design points in coded levels of input factors and response values.

Table 4. Least squares fit and regression estimates. *The number in parentheses is the standard error.

Table 5. The individual maxima.

Table 6. Simultaneous optimum combination of input factors. *The actual value of the optimum point is given in parentheses.

Table 7. The generalized least squares fit and regression estimates.

Table 8. The individual maxima.

Table 9. Simultaneous optimum combination of input factors. *The actual value of the optimum point is given in parentheses.