A Practical Solution to the Small Sample Size Bias and Uncertainty Problems of Model Selection Criteria in Two-Input Process Multiple Response Surface Methodology Datasets

Multiple response surface methodology (MRSM) most often involves the analysis of small sample size datasets which have associated inherent statistical modeling problems. Firstly, classical model selection criteria in use are very inefficient with small sample size datasets. Secondly, classical model selection criteria have an acknowledged selection uncertainty problem. Finally, there is a credibility problem associated with modeling small sample sizes of the order of most MRSM datasets. This work focuses on determination of a solution to these identified problems. The small sample model selection uncertainty problem is analysed using sixteen model selection criteria and a typical two-input MRSM dataset. Selection of candidate models, for the responses in consideration, is done based on response surface conformity to expectation to deliberately avoid selection of models using the problematic classical model selection criteria. A set of permutations of combinations of response models with conforming response surfaces is determined. Each combination is optimised and results are obtained using overlaying of data matrices. The permutation of results is then averaged to obtain credible results. Thus, a transparent multiple model approach is used to obtain the solution which gives some credibility to the small sample size results of the typical MRSM dataset. The conclusion is that, for a two-input process MRSM problem, conformity of response surfaces can be effectively used to select candidate models and thus the use of the problematic model selection criteria is avoidable.

How to cite this paper: Pavolo, D. and Chikobvu, D. (2019) A Practical Solution to the Small Sample Size Bias and Uncertainty Problems of Model Selection Criteria in Two-Input Process Multiple Response Surface Methodology Datasets. Open Journal of Statistics, 9, 109-142. https://doi.org/10.4236/ojs.2019.91010

Received: December 7, 2018; Accepted: February 22, 2019; Published: February 25, 2019

Copyright © 2019 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/


Introduction and Literature Review
Multiple response surface methodology (MRSM) is the investigation and study of an industrial process with more than one response variable through the analysis of reliably generated response models and corresponding response surfaces over some region of operability. Processes with two inputs provide a special case of MRSM problems whose response surfaces can be constructed in three dimensions and can therefore be analysed for conformity.
Hill & Hunter [1] are credited with originally identifying the existence of MRSM in their review paper on response surface methodology. In their review of developments in response surface methodology research from 1966 to 1988, Myers et al. [2] conclude that most problems encountered in literature and practice are in fact MRSM problems as opposed to single response (univariate) problems. Khuri [3] single-handedly wrote a full review of MRSM. Mukhopadhyay & Khuri [2] and Khuri [4] devoted sections to MRSM in their latest response surface methodology reviews.
Myers et al. [5] emphasise that MRSM should always include canonical and response surface analysis before optimisation. Khuri [3] [4] clearly argues that traditional response surface techniques that apply to single response models are, in general, not adequate for analysing multiple response problems. Khuri [3] and Mukhopadhyay & Khuri [2] emphasise the use of multivariate statistical methods in every stage of MRSM processes so that the responses are considered simultaneously in every aspect, especially where correlation exists between the responses. Figure 1 shows the MRSM contextual framework as developed from literature.
The first of three problems with industrial MRSM work is the uncertainty inherent in the process of model selection (Wit et al. [7], Steel [8], Moral-Benito [9]), and this worsens when sample sizes are small. Hjort and Claeskens [10] state that the uncertainty associated with the model selection process can make inference based on the final model seriously misleading. Danilov and Magnus [11] add that standard errors estimated under such circumstances are known to underreport variability. This problem can be solved by avoiding the use of model selection criteria in the selection of best models from candidate models.
Figure 1. Multiple response surface methodology contextual framework.
The second problem concerns the small sample size bias of classical model selection criteria. MRSM is largely a regression modeling and model selection problem, and model selection is done through model selection criteria. Seghouane and Bekara [12] state that the derivations of most model selection criteria, like Mallows' Cp [13], Akaike's AIC [14] and Schwarz's SBC [15], rely on asymptotic approximations which are not valid for small sample sizes. Most MRSM experimental datasets fall within the small sample size context, and the use of such criteria on small sample size problems results in increased bias.
Attempts have been made to correct some of the classical model selection criteria for the small sample size context. Sugiura [16] corrected Akaike's information criterion (AIC) for small samples by assuming a finite true model and avoiding the asymptotic argument in his derivation of the corrected AIC (AICc). Hurvich and Tsai [17], after some simulation tests, confirmed that AICc achieves dramatic reductions in bias in small sample sizes compared to AIC and even Schwarz's SBC, and they recommend its use in place of AIC for small sample size problems. McQuarrie and Tsai [18] corrected Hannan and Quinn's [19] consistent criterion for small sample sizes to come up with the corrected HQ (HQc). Sawa [20] also made a small sample correction to Schwarz's SBC to come up with BIC. Seghouane and Bekara [12] made a small sample correction to their Kullback-Leibler symmetric divergence information criterion (KIC) to come up with the corrected KIC (KICc). Literature is quite silent on the application of such research findings in MRSM; both Myers et al. [5] and Khuri [4] complain about the lack of uptake of research findings by MRSM practitioners. The extent to which the small sample size bias problem has been solved by these correction efforts can be analysed using an MRSM dataset. In 1999, Pan [21] proposed using bootstrapping for model selection with small samples. There are also proposals to look into model averaging (Yuan and Yang [22], Xie [23]).
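The scale of the small sample correction can be made concrete. The following sketch (not the authors' computation) evaluates the Gaussian log-likelihood form of AIC and Sugiura's AICc for a regression with n observations, residual sum of squares RSS and k estimated parameters; the extra penalty term vanishes as n grows:

```python
import math

def aic(n, rss, k):
    # Gaussian log-likelihood form of Akaike's information criterion
    return n * math.log(rss / n) + 2 * k

def aicc(n, rss, k):
    # Sugiura's corrected AIC: the extra penalty 2k(k+1)/(n-k-1)
    # dominates when k is close to n, i.e. in small samples
    return aic(n, rss, k) + (2 * k * (k + 1)) / (n - k - 1)

# For an MRSM-sized dataset (n = 13, full quadratic model with k = 6)
# the correction adds 2*6*7/6 = 14 to AIC -- far from negligible
extra_small = aicc(13, 1.0, 6) - aic(13, 1.0, 6)
# For n = 100 the same model adds only 84/93, about 0.9
extra_large = aicc(100, 1.0, 6) - aic(100, 1.0, 6)
```

This illustrates why the asymptotic criteria mislead mainly in the small sample context described above.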
In accordance with the information-theoretic approach, a "best model" for analysis of data depends on sample size, as smaller effects can often only be revealed as sample sizes increase. The amount of information in large datasets greatly exceeds the information available in small datasets.
The small sample size bias problem and the model selection uncertainty problem are related in that both stem from the use of model selection criteria in choosing the best model. If the use of model selection criteria is avoided in favour of an approach with more certainty, both problems are solved.
The third problem with industrial MRSM work is that most studies fall within small data analytics, since they are based on analysing response models generated from datasets emanating from designed experiments. Such experiments are designed to be cost effective and at the same time are expected to provide optimum information. Most MRSM studies in industrial work use experimental designs within the small sample size context, sometimes below the order of (10 + k), to minimise cost. For example, most of the MRSM industrial examples used in Myers et al. [5] and [24] do not have experimental runs (n) > k * 40. This critical problem has not been dealt with in MRSM research studies so far; Khuri [4] does not even mention it in his proposals of areas of future research in MRSM. The problem is dealt with here by using a solution methodology with more rigour and transparency.

The remainder of this paper is organized as follows. Section 2 introduces the typical small sample MRSM dataset which is analysed in this study. Section 3 analyses this dataset with sixteen model selection criteria taken from literature, exposing the selection uncertainty problem in MRSM small sample sizes. In Section 4, the MRSM small sample size problem is resolved by a multi-model approach and the results are analysed. Section 5 looks at the validation of the results in line with the original problem of obtaining cure times of rubber covered mining conveyor belts for a Southern African manufacturer. Section 6 discusses findings and Section 7 concludes and proposes directions for future research.

The Dataset
The MRSM experimental design and results used for the current study are shown in Table 1. The experimental design is a two-factor central composite design [25] [26]. The experimental runs were done to determine the best cure times for different industrial and mining conveyor belt thicknesses for a Southern African rubber covered mining conveyor belts manufacturing company. The two input parameters to the belt curing process are cure time (T) and rubber thickness (RT). The measured quality responses are adhesion of belt components (in N/mm) and hardness of the cured rubber cover (in Shore A).
This dataset is chosen for this investigation because, in addition to being a typical small sample MRSM dataset, two-factor experiments produce response models whose response surfaces can be analysed. Where there are more than two factors, it is difficult to construct response surfaces of models in three dimensions. Two-factor processes are therefore a special case of MRSM where it is possible to check response model fitness to data and prediction performance, and to analyse response surface conformity to expectations, before selecting a "best" model for consideration in multi-objective optimisation. More complex problems require more complex ways to solve them.

All Possible Regressions Modeling
All possible regressions modeling is applied to the dataset in Table 1 to obtain thirty-one (2^p − 1 = 31, where p = 5 is the total number of regressors) models for each of the two responses, that is, adhesion and hardness. ANNEXURE A shows the thirty-one models of each response in detail.
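The count of thirty-one follows from enumerating every non-empty subset of the five candidate regressors of the full second-degree model. A minimal sketch of the enumeration (the regressor labels are illustrative):

```python
from itertools import combinations

# Candidate regressors of the full second-degree model in T and RT
regressors = ["T", "RT", "T*RT", "T^2", "RT^2"]

# Every non-empty subset of the p regressors is one candidate model,
# giving 2^p - 1 models per response
models = [c
          for r in range(1, len(regressors) + 1)
          for c in combinations(regressors, r)]
n_models = len(models)  # 2^5 - 1 = 31
```

Each subset would then be fitted to the same response data, exactly as the all possible regressions procedure requires.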

The Model Selection Uncertainty
This section analyses the problem of uncertainty that characterises the model selection criteria of response models. A multi-selection criteria approach is used.

Hardness
The model selection uncertainty problem is again evident with hardness as shown in Table 3. Table 4 summarises the results of the analysis of response surfaces.
From Table 4, the adhesion models with the conforming response surfaces are not necessarily the ones selected for best fit to data or for best prediction. There is no single model that has best fit, best prediction as well as a conforming response surface at the same time.

Table 3. Showing selection criteria and the selected hardness best model.
Table 4. Summarising model selection results split between fit to data, prediction and conforming Response Surface (RS).
Best fit and/or best prediction does not ensure a conforming response surface. This is the second uncertainty problem.

Model Selection Uncertainty 3: Good fitness to data does not necessarily imply good prediction performance
The other uncertainty problem is that a model with good fitness to data is not necessarily good at prediction performance.This is evident from the table.This is the third model selection uncertainty problem.
Model Selection Uncertainty 4: The uncertainty of positioning/ranking by the individual criteria

Figure 4 shows how each of the four adhesion models with conforming response surfaces is ranked by each selection criterion. The only hardness model with a conforming response surface that is selected as best fitting is the full model, which is selected by R^2 only.
In addition to the uncertainty of selection as "best" model, as previously noted, there is also the uncertainty of positioning/ranking by the individual criteria.One cannot predict the positioning/ranking of models by the individual criteria.
The best selection position of the only hardness model with the conforming response surface is number three, as seen in Figure 5.

Effect of Correction of Model Selection Criteria
Twenty-eight model selection criteria are used to select the best model, and the frequency of selection per number of regressor variables (p) is determined. This is done for each of the response variables, adhesion and hardness.

Adhesion
Table 5 is for the adhesion response. Row 2 of Table 5, titled "ALL 28", summarises the findings. Row 3, titled "Corrected", has the frequency details of the ten small sample size corrected criteria.
A visual picture of Table 6 is shown by Figure 6.
The mere fact that there is a selection for each p-value when all twenty-eight model selection criteria are used indicates model selection uncertainty.
Table 6 shows that when only the ten small sample size corrected criteria are considered, two facts emerge: 1) there are zero selections for the three regressor

Hardness
Table 7 is for the hardness response. Row 2 of Table 7, titled "ALL 28", summarises the findings. Row 3, titled "Corrected", has the frequency details of the ten small sample size corrected criteria.
Table 7 gives the following line graph shown in Figure 7.
Uncertainty in selection is obvious in both the adhesion and hardness cases, both when all twenty-eight model selection criteria are used and when only the small sample size corrected criteria are used. This calls into question the achievement of parsimony in balancing bias (lack of fit) against the penalty term, especially for the small sample size corrected criteria: over-correction may lead a criterion to select models with fewer regressors, that is, to underfit, in trying to correct for small sample size inefficiency.

Optimisation
MRSM is based on the use of response surfaces of a selected combination of response models to simultaneously determine the region with the desired results.
Since there is no reliable method of using a classical model selection criterion to predict models with conforming response surfaces, the simplest way to deal with the problems of model selection uncertainty and small sample size bias is to avoid choosing candidate "best" models with model selection criteria at all. In two-input MRSM processes this is possible, as response surfaces can be used to select candidate models. Therefore, in this section the four possible results from the permutations of the four adhesion response surfaces and the one hardness response surface are analysed. The four results are obtained by simultaneously solving the four pairs of models in Table 9. The mean squared errors (MSE) of the conforming models are also determined.

The Permutations of Models with Conforming Response Surfaces
The permutations of adhesion-hardness response model pairs with conforming response surfaces are summarised in Table 9. The four pairs form the set of candidate pairs for optimisation, that is, for determination of desired results.
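With four conforming adhesion models and one conforming hardness model, the candidate set is simply their Cartesian product. A sketch (model labels follow the bracket notation used in the text):

```python
from itertools import product

# Models with conforming response surfaces, as identified in the study
adhesion_models = ["T,RT,T*RT", "T,RT,T*RT,T^2",
                   "T,RT,T*RT,RT^2", "T,RT,T*RT,T^2,RT^2"]
hardness_models = ["T,RT,T*RT,T^2"]

# Each adhesion-hardness pairing is one candidate for optimisation
candidate_pairs = list(product(adhesion_models, hardness_models))
n_pairs = len(candidate_pairs)  # 4 x 1 = 4 pairs
```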

The Determination of Desired Results
The desired results are obtained by constructing data matrices for each response in a candidate pair and overlaying them to obtain the desired region, from which results of cure time per rubber thickness are read. The response surfaces and corresponding data matrices of Pair 2 are presented as an example.

Adhesion [T, RT, T*RT, T^2] Data Matrix

The data matrix for the adhesion model [T, RT, T*RT, T^2], if overlaid with a hardness model data matrix from a conforming response surface, gives the desired region with the optimum cure time per rubber thickness.

Hardness [T, RT, T*RT, T^2] Data Matrix
The hardness data matrix of Table 11 for model [T, RT, T*RT, T^2] is overlaid with the data matrix for the adhesion model [T, RT, T*RT, T^2] to give the desired region with cure time per rubber thickness meeting customer expectations. Table 12 shows, in red, the region in which both adhesion and hardness are within the customer specified region of adhesion ≥ 12 N/mm and hardness ≥ 60 Shore A.
Cure times per belt rubber thickness are selected to ensure customer expectations are met and right levels of productivity maintained.The region in red has both adhesion and hardness results in levels acceptable to the customer expectations.The boxed figures in the desired region indicate the cure time per belt rubber thickness combinations considered for work instructions.
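The overlay itself is a cell-by-cell intersection of the two data matrices against the customer specifications. The sketch below uses purely illustrative grid values, not the study's fitted data matrices:

```python
# Toy adhesion and hardness data matrix rows for one rubber thickness;
# the numbers are illustrative only
cure_times = [18, 20, 22, 24]            # minutes
adhesion   = [10.8, 11.6, 12.4, 13.1]    # N/mm
hardness   = [57.0, 59.5, 61.0, 62.5]    # Shore A

# A setting lies in the desired (red) region when both customer
# specifications hold: adhesion >= 12 N/mm and hardness >= 60 Shore A
desired = [a >= 12.0 and h >= 60.0 for a, h in zip(adhesion, hardness)]

# The work-instruction cure time is the shortest acceptable one,
# which also keeps productivity at the right level
best_time = min(t for t, ok in zip(cure_times, desired) if ok)
```

In the full method this boolean intersection is computed over the whole cure time by rubber thickness grid, one row per thickness.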

MSE's of Conforming Response Models
Mean squared errors (MSE's) for the models with conforming response surfaces are determined using Equation (9) to compare the accuracy of the response models. For a sample size n, the formula for MSE is

MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)^2, (9)

where Y_i is the i-th measured response and Ŷ_i is the i-th predicted response.
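Equation (9) translates directly into code; a minimal sketch with toy values (not the study's data):

```python
def mse(measured, predicted):
    # Mean squared error over the n paired observations, Equation (9)
    n = len(measured)
    return sum((y - yhat) ** 2
               for y, yhat in zip(measured, predicted)) / n

# Toy adhesion values with prediction errors of 0.5, -0.5 and 0
example = mse([12.0, 13.0, 11.5], [12.5, 13.5, 11.5])  # = 0.5/3
```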

Results
This section shows the results obtained by optimising each of the four pairs with the methodology outlined in Section 5. The computed MSE results of the individual models with conforming response surfaces are shown.

Tables of Rubber Thickness vs. Cure Time
Using overlaying of data matrices, tables of rubber thickness-cure time combinations are determined for the four pairs of models.The tables are shown as Tables 13-16.
If all the result tables are averaged, a result equivalent to Table 14 and Table 15 is obtained.This result is both the median and mode of the tables and is therefore the best to adopt for use.
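The claim that the averaged table coincides with the median (and modal) table can be checked mechanically. A sketch with illustrative cure-time tables, not the study's Tables 13-16:

```python
from statistics import median

# Four illustrative cure-time result tables (minutes, one per candidate
# pair), indexed by rubber thickness; values are not the study's
tables = [
    [20, 24, 28],
    [22, 26, 30],   # this table occurs twice, so it is the mode
    [22, 26, 30],
    [24, 28, 32],
]

# Element-wise mean and median across the four result tables
means   = [sum(col) / len(col) for col in zip(*tables)]
medians = [float(median(col)) for col in zip(*tables)]
# Both agree with the repeated (modal) table, so averaging the
# permutation of results returns the robust consensus table
```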

MSE's of Models with Conforming Response Surfaces
Response model accuracy is checked by the size of the MSE.

Validation
The results from the first seven conveyor belts run from the process, shown in Table 18, are compared with the model forecasted results by a paired T-Test, and the mean squared forecast errors (MSFE) of the response models are computed.

Validation by T-Tests
The first validation test shows that there is no statistically significant difference between the forecasted results and the results obtained from belts run through the normal belt production process. The results from the first seven conveyor belts run from the process are compared with the model forecasted results using a paired T-Test, as shown in Figure 8.
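The paired T statistic itself is straightforward to compute. A sketch with illustrative forecast/observation values (not the Table 18 data):

```python
import math

def paired_t(x, y):
    # Paired T statistic: t = mean(d) / (sd(d) / sqrt(n)), d_i = x_i - y_i
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((di - mean_d) ** 2 for di in d) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)

# Illustrative adhesion values for seven validation belts
forecast = [12.1, 12.4, 12.0, 12.6, 12.3, 12.5, 12.2]
observed = [12.0, 12.5, 12.1, 12.4, 12.3, 12.6, 12.1]
t_stat = paired_t(forecast, observed)
# |t| is well below the two-sided 5% critical value t(6) = 2.447,
# so the forecast-observation difference is not significant here
```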

Validation by MSFE
In this section, validation of results is done by checking whether the same or better rubber thickness-cure time combinations are obtained as in the first MRSM experiment.
MSFE = (1/n) Σ_{i=1}^{n} (Y_i − Y_Fi)^2,

where Y_i is the i-th measured response and Y_Fi is the i-th forecasted response.
The MSFE values for adhesion models shown in Table 19 reveal that the full model, [T, RT, T*RT, T^2, RT^2], has the best prediction accuracy compared to the other conforming models.

Averaging all the results tables gives a table similar to Table 14 and Table 15. The table represented by Table 14 and Table 15 is therefore both the mean and mode of the permutation of results, which shows that it is a robust result. This result would be the best to use in Work Instructions in this case.

Discussion
This section discusses the problems that arise when classical model selection criteria are used to select "best" models from small sample size MRSM datasets for determining desired solutions, and how these problems are solved.

The Small Sample Size Bias Problem
The small sample size model selection criteria bias problem is solved by avoiding the use of model selection criteria for selecting "best" models in this study. In fact, where response surfaces can be used to select candidate models, classical model selection criteria become irrelevant, together with all their problems.

The Model Selection Uncertainty Problem
The use of response surfaces to select conforming models is very possible with two-factor multiple regression models, although it is not as easy with processes of more than two factors. However, while dealing with two-factor processes, selecting models based on conformity of response surfaces avoids the use of classical model selection criteria in choosing the best models, and hence the problems of model selection criteria uncertainty and small sample size dataset bias. In MRSM that is very important. Using response models with conforming response surfaces within the region of interest is the fundamental focus of MRSM. The selection of candidate models with conforming response surfaces within the region of interest also opens the door for a multiple model solution methodology, which introduces rigour, transparency and therefore credibility into the MRSM results. The focus of MRSM, therefore, shifts from obtaining the "best" model to obtaining the best results within the region of interest.

Conclusion and Future Research
Selecting candidate response models based on conformity of response surfaces avoids the uncertainty and small sample size bias problems that are related to using classical model selection criteria in selecting best models.So in two-input process problems, where response surfaces can easily be constructed and analysed, it is better for practitioners to use response surfaces to select candidate models for determining the permutations of response model sets for onward simultaneous optimisation.
The multiple model MRSM approach ensures credibility as rigour is maintained up to the final result. The problem of model selection uncertainty is prevented from affecting the final result in a very transparent way.
However, for best results, the proposed multiple model MRSM approach based on using candidate models with conforming response surfaces requires prior knowledge of the expected ideal response surface.
Future Research

1) More datasets from two-factor processes need to be studied to ensure generalizability of findings to all other two-factor processes.

2) A usability study of the multivariate approach (Fujikoshi and Satoh [8]) will be done to investigate applicability, simplification, effectiveness and accuracy.

4) Applicability of envelope models and methods to MRSM.

5) Research on generalising beyond the two-factor process to three and higher factor processes is necessary.

Annexure A
In this section, the MRSM dataset is modeled to produce the adhesion and hardness response models studied in this univariate investigation. Response modeling is done using Minitab 17. The model selection uncertainty of this MRSM small sample dataset is analysed. The general multivariate multiple regression model is

y_ui = f_i(X_u, β_i) + ε_ui, u = 1, 2, ..., n, (1)

where X_u is the vector of settings of the k design variables at the u-th experimental run, β_i is a vector of unknown parameters, f_i is a function of known form for the i-th response, and ε_ui is a random error associated with the i-th response for experimental run u. It is assumed that the ε_ui's are normally distributed with Var(ε_ui) = σ_i^2.
There are two important mathematical models that are commonly used in multi-response surface methodology which are special cases of model (1).
The first-degree model (d = 1) is

y = β_0 + Σ_{i=1}^{k} β_i x_i + ε, (2)

and the second-degree model (d = 2) is

y = β_0 + Σ_{i=1}^{k} β_i x_i + Σ_{i=1}^{k} β_ii x_i^2 + ΣΣ_{i<j} β_ij x_i x_j + ε. (3)

In this study, y is either adhesion or hardness, and x_i is cure time (T) or total rubber thickness (RT). The parameters (the β's) are estimated using statistical software.
All possible regressions methodology is employed to produce the thirty-one response models for each of the two responses, that is, adhesion and hardness, from the MRSM dataset.
The general second-degree model of Equation (3) is expanded into the following belt curing model:

y = β_0 + β_1 T + β_2 RT + β_12 (T × RT) + β_11 T^2 + β_22 RT^2,

where T is cure time in minutes, RT is rubber thickness in millimetres, β_0 is the intercept, and β_1, β_2, β_12, β_11 and β_22 are estimates of the parameters.
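Once the β's are estimated, the belt curing model is evaluated pointwise to build the data matrices used later. A sketch with purely hypothetical coefficients (the study estimates its coefficients in Minitab 17):

```python
def second_degree(T, RT, b0, b1, b2, b12, b11, b22):
    # Belt curing model: y = b0 + b1*T + b2*RT + b12*T*RT + b11*T^2 + b22*RT^2
    return (b0 + b1 * T + b2 * RT
            + b12 * T * RT + b11 * T ** 2 + b22 * RT ** 2)

# Hypothetical coefficients, chosen only to illustrate the evaluation
y_hat = second_degree(T=22, RT=20,
                      b0=5.0, b1=0.4, b2=0.1,
                      b12=0.01, b11=-0.005, b22=0.0)
```

Evaluating this function over a grid of (T, RT) settings produces one data matrix per response model.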

1) Adhesion Response Models
Table A1 shows the summary information of the thirty-one all possible regression models generated by Minitab 17 for the adhesion response. Each response model is shown with its regressors and parameter values. For example, the first and second models, represented by T and RT, contain only the single regressor T and RT respectively. The rest of the twenty-nine models are expanded in a similar manner. Under the classical model selection process, any one of the thirty-one models shown in Table A1 is a potential "best" model, as chosen by the selection criterion in use. In this study, this set of models is subjected to multiple criteria analysis to avoid the risks of the one-criterion model selection approach.

D1) The data matrix for the adhesion model [T, RT, T*RT], if overlaid with a hardness model data matrix from a conforming response surface, gives the desired region with optimum cure time per rubber thickness results.

Figure 2 .
Figure 2. Showing the small sample size MRSM problems.
One model, [T, RT, T*RT, T^2, RT^2], is selected by eleven different criteria whilst another two are selected the remaining five times. Even the ten small sample size corrected criteria do not agree, with five choosing the full model, three choosing [T, T^2] and two choosing [T, T^2, RT^2]. This uncertainty is characteristic of the model selection process which selects one model as best.

Model Selection Uncertainty 2: Good fitness to data and/or good prediction performance does not imply a conforming response surface

Analysis of Response Surfaces

Response surfaces for each of the thirty-one models are obtained and analysed as to whether they conform to expectations. Data matrices are constructed to confirm. Only four adhesion response models have conforming response surfaces: [T, RT, T*RT]; [T, RT, T*RT, T^2]; [T, RT, T*RT, RT^2]; and [T, RT, T*RT, T^2, RT^2]. Only one hardness model has a conforming response surface: [T, RT, T*RT, T^2]. The response surfaces are shown in ANNEXURE C.

Figure 4 .
Figure 4. Showing how the four adhesion models with conforming response surfaces are ranked by different criteria.

Figure 5 .
Figure 5. Showing how the hardness model with conforming response surface is positioned by each criterion.

Figure 6 .
Figure 6. Showing the model selection results per number of regressors (p) for the adhesion response.
position and the one position, and 2) there are seven selections to the left of three regressors (median term) and only three to the right.

Figure 7 .
Figure 7. Showing the model selection results per number of regressors (p) for the hardness response.
Data matrices are constructed for each response in a candidate pair using Microsoft Excel and overlaid to determine the region where the two response models simultaneously achieve the customer expected results of adhesion ≥ 12 N/mm and hardness ≥ 60 Shore A. The detailed process of determination of desired results for each combination is shown in ANNEXURE D. Pair 2 is used here as an example to demonstrate the process.

Pair 2: [T, RT, T*RT, T^2] vs. [T, RT, T*RT, T^2]

Figure 8 .
Figure 8. Showing the paired T-Test results.
The model selection uncertainty problem is related to the use of classical model selection criteria for selecting best models. Even the use of small sample size corrected criteria is shown to have the problem of model selection uncertainty. It is difficult to predict a model with a conforming response surface using classical best model selection criteria, whether small sample size corrected or not. MRSM depends heavily on the use of conforming response surfaces in simultaneously determining the desired results. Research on design of experiments for MRSM should focus more on determining conforming response surfaces than just models with good fit to available data or good prediction. The conformance of the response surface within the region of interest should be the focus of MRSM experimental design research.

Using Response Surfaces to Select Response Models

According to Moral-Benito [9], standard practice in empirical research is based on two steps: 1) researchers select a model from the space of all possible models and 2) proceed as if the selected model had generated the data. Uncertainty is, therefore, typically ignored in the model selection step.

Figure C3 .
Figure C3. Showing the response surface of the adhesion model [T, RT, T*RT, RT^2].

Figure C5.

Figure C5. Showing the hardness response surface from the single conforming response surface model.

Table 1 .
Experimental design and averaged results from the MRSM experiment.
Seven model selection criteria chose five different response models as "best". The mere dispersion of the "best" selections all over Table 2 shows that there is disagreement on what is best. This is further buttressed by the fact that five models are selected as "best". Model selection using a single criterion would not have highlighted the selection uncertainty problem in this way, as it is clearly evident in a multi-criteria selection scenario.

Table 5 .
Summarising model selection results per number of regressors (p) for the adhesion response.

Table 6 .
Showing model selection results for pooled regressors before and after the median for the adhesion response.

Table 8 again queries the principle of parsimony, as the small sample size corrected criteria select no model with three regressors (the median term) but select four models to the left hand and six models to the right hand of the median term. It appears the major achievement of small sample size correction is the zeroing of the middle term.

Table 7 .
Summarising model selection results per number of regressors (p) for the hardness response.

Table 8 .
Showing model selection results for pooled regressors before and after the median for the hardness response.

Table 9. Showing the four pairs of the four adhesion models and the one hardness model with conforming response surfaces.

Table 10 is obtained by setting a time of 22 minutes for a belt with rubber thickness 20 mm.

Table 10 .
Showing the data matrix of the adhesion model [T, RT, T*RT, T^2].

Table 11 .
Showing the data matrix of the hardness model [T, RT, T*RT, T^2].

Table 12 .
Showing the region of overlay with the wanted results.

Table 17 .
Showing MSE results for adhesion and hardness.

Table 19 .
Showing MSFE results for adhesion and hardness.

Table A1 .
Summary of adhesion models.

Table A2 .
Summary of hardness models.

Table D1 .
Showing the data matrix of the adhesion model [T, RT, T*RT].

Table D2 .
Showing the data matrix of the hardness model [T, RT, T*RT, T^2].

Table D4 .
Summarising model selection results split between fit to data, prediction and conforming response surface.

Hardness [T, RT, T*RT, T^2] (Table D5 and Table D6)

Table D5. Showing the data matrix of the hardness model [T, RT, T*RT, T^2].

Hardness [T, RT, T*RT, T^2]: the hardness response surface is shown already in Figure D2 above.

b) Data Matrices

i) Adhesion [T, RT, T*RT, RT^2] (Table D7)

Table D7. Summarising model selection results split between fit to data, prediction and conforming response surface.

Table D8 .
Showing the data matrix of the hardness model [T, RT, T*RT, T^2].

Table D10 .
Showing the data matrix of the adhesion model [T, RT, T*RT, T^2, RT^2].

Table D11 .
Showing the data matrix of the hardness model [T, RT, T*RT, T^2].