Bayesian Variable Selection for Mixture Process Variable Design Experiment
1. Introduction
Bayesian methods are important approaches due to their ability to quantify uncertainty. In such an approach, prior distributions that represent subjective beliefs about parameters are assigned to the regression coefficients. By applying Bayes’ rule, prior beliefs are updated by the data and transformed into posterior distributions, on which all inference is based.
An important task in regression model building is to determine which variables should be included in the model. The principle of Bayesian variable selection is to distinguish the large effects from the small ones: an effective prior concentrates mass around zero and distributes the remaining mass over the parameter space. Such a prior reflects the fact that there are small coefficients close to zero on one hand and larger coefficients on the other hand. These priors can be built as a combination of two distributions: a narrow normal distribution centred at zero with a small variance, called the “spike”, and a flat normal distribution with a large variance that spreads over a wider range of parameter values, called the “slab”. This type of prior is called a “spike-and-slab” prior (see Figure 1). Practically speaking, such priors are beneficial for variable selection because they permit the classification of the regression coefficients into two groups: one group with large, significant effects, and the other group with small, negligible effects.
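A minimal sketch, assuming illustrative values for the inclusion probability and for the spike and slab variances (none of which are taken from the experiment analysed later), of how such a two-component Gaussian mixture prior can be evaluated is:

```python
import numpy as np
from scipy.stats import norm

def spike_slab_density(beta, omega=0.25, spike_var=0.01, slab_var=9.0):
    """Density of a Gaussian spike-and-slab prior at the values in `beta`.

    omega     : prior probability that a coefficient is active (slab component).
    spike_var : small variance of the narrow component centred at zero.
    slab_var  : large variance of the flat component.
    """
    beta = np.asarray(beta, dtype=float)
    spike = norm.pdf(beta, loc=0.0, scale=np.sqrt(spike_var))
    slab = norm.pdf(beta, loc=0.0, scale=np.sqrt(slab_var))
    return (1.0 - omega) * spike + omega * slab

# Example: the prior concentrates mass near zero but keeps heavy shoulders.
print(spike_slab_density([0.0, 0.5, 3.0]))
```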
As reviewed by [1], [2] introduced Bayesian variable selection via “spike-and-slab” prior distributions. The spike prior that they used was a probability mass at zero to remove the non-significant variables. Their slab was a uniform distribution with a large symmetric range in order to keep the significant variables. Following their work, many priors were proposed to implement the spike-and-slab property. [3] proposed Stochastic Search Variable Selection (SSVS), in which the coefficients are sampled from a mixture of two normal distributions with different variances: the spike part is the distribution with a small variance, while the slab part is the distribution with a much larger variance. Also, [4] proposed a positive mass at zero for the spike part and a normal distribution for the slab part. In addition, [5] proposed a Bayesian approach for model selection in fractionated split-plot experiments with application to robust parameter design. In their work, they extend the SSVS algorithm of [6] to account for the split-plot error structure. They derive an expression for the posterior probability of a model that requires the computation of, at most, two one-dimensional integrals, and employ this quantity for model selection. They were able to integrate out the coefficients and the variance components from the joint posterior distribution of all parameters because they used the conjugate normal-inverse-gamma prior for these parameters. The integrals are computed with Gaussian quadrature, and global and local search algorithms are used to find models with high posterior probabilities.

Figure 1. Gaussian mixture prior for a regression coefficient $\beta_j$.
The above review summarised former research and analysed the problems in the Bayesian variable selection field. On the other hand, we note that a frequentist analysis of this kind of experimental design suffers from a high Type I error rate [7]. The difficulty of using frequentist analysis to estimate the variance components is explained in [8]. To overcome this issue, we apply Bayesian variable selection, as this method introduces a prior distribution for the variance components in the linear mixed model.
The novel contribution of this paper to Bayesian variable selection is motivated by a very specific experimental design: data from experiments subject to restricted randomisation. There are two different levels of experimental units, one for the whole plots and the other for the subplots in the split-plot design; see, for example, [9]. To address this issue, we adapt the SSVS algorithm so that the subplot coefficients are sampled from a mixture of normal posterior distributions with a slab variance different from the slab variance used in the mixture of normal posterior distributions for the whole-plot coefficients. This method reduces Type I and II error rates, as well as the prediction error, for split-plot designs compared with applying the SSVS algorithm in which all coefficients are sampled from a mixture of normal posterior distributions with a single slab variance. We call the proposed method the Stochastic Search Variable Selection for Split-Plot Design (SSVS-SPD). The frequentist analysis depends on the estimates of the variance components, yet these estimates cannot be computed precisely because of the deficiency of degrees of freedom for the random effects in the split-plot design. This issue was discussed by [8]. Introducing a prior distribution for the variance components in the linear mixed model provides additional information to overcome the problem of variance estimation.
This paper differs from [10] in that the dataset used here is the vinyl thickness experiment, which had not previously been used for variable selection. It also differs in the choice of the shape parameters of the prior for the correlation parameter used to fit these data. A preprint has previously been published [11].
2. Split-Plot Design and Sample Model
The model for blocked experiments includes two types of errors: block error and residual error. Hence, linear mixed models (LMMs) are used to analyse responses from blocked experiments.
Linear mixed-effects models (LMMs) introduce correlations between observations through random effects. This leads to the use of generalised least squares (GLS) estimation, combined with restricted maximum likelihood (REML) estimation of the variance components, as will be discussed. This type of analysis is used by most design-of-experiments textbooks that deal with blocked designs. In matrix notation, the model corresponding to a blocked design is written as
$$Y = X\beta + Z\gamma + \varepsilon, \qquad (1)$$
where $Y$ is the $n \times 1$ vector of observations on the response of interest, $X$ is the $n \times p$ model design matrix containing the polynomial expansions of the $m$ factor levels at the $n$ experimental runs, $\beta$ is the $p \times 1$ vector of unknown fixed parameters, $Z$ is an $n \times b$ random design matrix which represents the allocation of the runs to blocks, and whose $(i, j)$th element is one if the $i$th observation belongs to the $j$th block, and zero otherwise. If the runs of the experiment are grouped per block, then $Z$ is of the form
$$Z = \operatorname{diag}\!\left(1_{k_1}, 1_{k_2}, \ldots, 1_{k_b}\right), \qquad (2)$$
where $1_k$ is a $k \times 1$ vector of ones, and $k_1, \ldots, k_b$ are the block sizes. The random effects of the $b$ blocks are contained within the $b \times 1$ vector $\gamma$, and the random errors are contained within the $n \times 1$ vector $\varepsilon$. It is assumed that $\gamma$ and $\varepsilon$ are independent and normally distributed, i.e. $\operatorname{Cov}(\gamma, \varepsilon) = 0_{b \times n}$, where $0_{b \times n}$ is the $b \times n$ matrix of zeros. Hence, $\gamma \sim N\!\left(0_b, \sigma_\gamma^2 I_b\right)$ and $\varepsilon \sim N\!\left(0_n, \sigma_\varepsilon^2 I_n\right)$, where $0_b$ and $0_n$ are the $b$- and $n$-dimensional column vectors of zeros respectively, and $I_b$ and $I_n$ are the $b$-dimensional and $n$-dimensional identity matrices respectively.
Under these assumptions, $Y$ is a normally distributed random variable with mean $X\beta$, and the variance-covariance matrix of the response $Y$ can be written as
$$V = \operatorname{Var}(Y) \qquad (3)$$
$$= \operatorname{Var}(X\beta + Z\gamma + \varepsilon) \qquad (4)$$
$$= Z\operatorname{Var}(\gamma)Z' + \operatorname{Var}(\varepsilon) \qquad (5)$$
$$= \sigma_\gamma^2 ZZ' + \sigma_\varepsilon^2 I_n. \qquad (6)$$
$V$ can be written as a block diagonal matrix, $V = \operatorname{diag}(V_1, \ldots, V_b)$, where $V_i = \sigma_\varepsilon^2 I_{k_i} + \sigma_\gamma^2 1_{k_i} 1_{k_i}'$. As a result, the variance-covariance matrix $V_i$ of all observations within one block is compound symmetric: the main diagonal of the matrix contains the variances of the observations, while the off-diagonal elements are covariances. However, $V_i$ can be rewritten as
$$V_i = \sigma_\varepsilon^2 \left(I_{k_i} + \eta\, 1_{k_i} 1_{k_i}'\right), \qquad (7)$$
where
$$\eta = \frac{\sigma_\gamma^2}{\sigma_\varepsilon^2} \qquad (8)$$
is a measure of the extent to which observations within the same block are correlated. The larger this variance ratio, the more strongly observations within the same block are correlated.
When the random error terms as well as the group effects are normally distributed, the maximum likelihood estimate of the unknown model parameter $\beta$ in Equation (1) is the generalised least squares (GLS) estimate. Finding the estimator $\hat{\beta}$ of $\beta$ requires minimising
$$(Y - X\beta)' V^{-1} (Y - X\beta) \qquad (9)$$
with respect to $\beta$, which is tantamount to finding $\hat{\beta}$ so that
$$\frac{\partial}{\partial \beta}\,(Y - X\beta)' V^{-1} (Y - X\beta)\bigg|_{\beta = \hat{\beta}} = 0. \qquad (10)$$
Therefore, the generalised least squares (GLS) estimator of $\beta$ is
$$\hat{\beta}_{\mathrm{GLS}} = \left(X' V^{-1} X\right)^{-1} X' V^{-1} Y, \qquad (11)$$
and the variance-covariance matrix of the estimators is given by
$$\operatorname{Var}\!\left(\hat{\beta}_{\mathrm{GLS}}\right) = \left(X' V^{-1} X\right)^{-1}. \qquad (12)$$
Often, the variances $\sigma_\gamma^2$ and $\sigma_\varepsilon^2$ are not known and therefore Equation (11) and Equation (12) cannot be used directly. Instead, the estimates of the variance components, $\hat{\sigma}_\gamma^2$ and $\hat{\sigma}_\varepsilon^2$, are substituted in the GLS estimator as in Equation (11), yielding
$$\hat{\beta}_{\mathrm{GLS}} = \left(X' \hat{V}^{-1} X\right)^{-1} X' \hat{V}^{-1} Y, \qquad (13)$$
where
$$\hat{V} = \hat{\sigma}_\gamma^2 ZZ' + \hat{\sigma}_\varepsilon^2 I_n. \qquad (14)$$
In that case, the variance-covariance matrix in Equation (12) can be approximated by
$$\operatorname{Var}\!\left(\hat{\beta}_{\mathrm{GLS}}\right) \approx \left(X' \hat{V}^{-1} X\right)^{-1}. \qquad (15)$$
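For concreteness, a small numpy sketch of the feasible GLS computations in Equations (11)-(15) is given below; the function name and arguments are illustrative, and the variance-component estimates are assumed to have been obtained beforehand (e.g. by REML).

```python
import numpy as np

def gls_estimate(X, y, Z, sigma2_gamma, sigma2_eps):
    """Feasible GLS estimate of beta and its approximate covariance matrix.

    V-hat = sigma2_gamma * Z Z' + sigma2_eps * I_n  (Equation (14)).
    """
    n = len(y)
    V = sigma2_gamma * Z @ Z.T + sigma2_eps * np.eye(n)
    Vinv_X = np.linalg.solve(V, X)      # V^{-1} X
    Vinv_y = np.linalg.solve(V, y)      # V^{-1} y
    XtVinvX = X.T @ Vinv_X
    beta_hat = np.linalg.solve(XtVinvX, X.T @ Vinv_y)   # Equation (13)
    cov_beta = np.linalg.inv(XtVinvX)                   # Equation (15)
    return beta_hat, cov_beta
```

Because $\hat{V}$ is block diagonal, in practice it can also be inverted block by block rather than as a full $n \times n$ matrix.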
The generalised least squares (GLS) estimator is unbiased, meaning that $E\!\left(\hat{\beta}_{\mathrm{GLS}}\right) = \beta$, and is equal to the maximum likelihood estimator (MLE). The likelihood function is defined as the joint probability density function of the observed data, examined as a function of the parameters. Hence, the likelihood function for $Y$ in Equation (1) is
$$L(\beta, V \mid Y) = c\,|V|^{-1/2} \exp\!\left\{-\tfrac{1}{2}(Y - X\beta)' V^{-1} (Y - X\beta)\right\}, \qquad (16)$$
where $c$ is a constant which does not depend on $\beta$. The maximum likelihood estimator (MLE) is the estimator that maximises the likelihood function, which is tantamount to finding $\hat{\beta}_{\mathrm{MLE}}$ as
$$\hat{\beta}_{\mathrm{MLE}} = \arg\max_{\beta}\, L(\beta, V \mid Y), \qquad (17)$$
which is equal to
$$\hat{\beta}_{\mathrm{MLE}} = \arg\max_{\beta}\, \ell(\beta, V \mid Y), \qquad (18)$$
where $\ell(\beta, V \mid Y) = \log L(\beta, V \mid Y)$ is the log-likelihood function. As minimising Equation (9) is equivalent to maximising the log of Equation (16) for fixed $V$, the GLS estimator in Equation (11) is the result of Equation (17) and Equation (18).
The restricted maximum likelihood (REML) criterion used to estimate $\sigma_\gamma^2$ and $\sigma_\varepsilon^2$ is
$$\ell_R\!\left(\sigma_\gamma^2, \sigma_\varepsilon^2\right) = -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\log\left|X' V^{-1} X\right| - \tfrac{1}{2}\left(Y - X\hat{\beta}_{\mathrm{GLS}}\right)' V^{-1} \left(Y - X\hat{\beta}_{\mathrm{GLS}}\right), \qquad (19)$$
where $\hat{\beta}_{\mathrm{GLS}}$ is defined in Equation (13). The restricted log-likelihood $\ell_R$ is maximised with respect to the variance components $\sigma_\gamma^2$ and $\sigma_\varepsilon^2$ to obtain unbiased estimates of the variance components.
3. The Bayesian Methodology
3.1. Bayesian Framework
The essential philosophy behind Bayesian inference is to update a prior distribution for an unknown parameter to a posterior distribution by Bayes’ theorem. Bayes’ theorem can be used to estimate the conditional distributions. While the frequentist approach treats the parameters as unknown and fixed, the Bayesian approach regards them as random variables. We can define the prior distribution $p(\theta)$ as the probability density (or mass) function which reflects our beliefs about $\theta$ in the parameter space $\Theta$. For given data $y$, the likelihood function $p(y \mid \theta)$ can then be defined given the parameter $\theta$ for the data $y$. Also, we can define the posterior density (or mass) function $p(\theta \mid y)$, which represents our updated belief about $\theta$ given the observed data $y$.
Using Bayes’ theorem, the posterior density of $\theta$ given $y$ is:
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}. \qquad (20)$$
Bayesian inference continues from this distribution. The denominator of Equation (20) is the marginal likelihood of $y$, and it often does not need to be calculated because it is independent of $\theta$. Bayes’ rule can then be written as:
$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta). \qquad (21)$$
Equation (21) defines the unnormalised posterior density; the posterior is proportional to the likelihood times the prior. For more details on Bayesian inference, see [12] and [13].
A prior distribution can be selected based on past information or experimental practice. It can be informative or uninformative. An informative prior provides numerical information that is used in estimating the parameter of concern. An uninformative prior reflects balance among the possible outcomes when only weak information about the parameter is available. There are two types of uninformative priors: proper and improper. The density of a proper prior distribution integrates to 1, whereas the integral of the density of an improper distribution is not finite. If the prior integrates to any positive finite value, it is called an unnormalised density and can be renormalised (multiplied by a constant) to integrate to 1 [12] [13].
3.2. Markov Chain Monte Carlo (MCMC) Methods
Markov chain Monte Carlo simulation is a general method based on drawing values of $\theta$ from approximate distributions, and then correcting those draws to better approximate the target posterior distribution $p(\theta \mid y)$ [12] [13] [14]. A Markov chain is a sequence of random variables $\theta^{(1)}, \theta^{(2)}, \ldots$ for which, for any iteration $t$, the distribution of $\theta^{(t)}$ depends only on the most recent value $\theta^{(t-1)}$ [12] [13] [14]. A Markov chain is generated by sampling $\theta^{(t)} \sim p\!\left(\theta \mid \theta^{(t-1)}, y\right)$. This $p$ is called the transition kernel of the Markov chain. Therefore, $\theta^{(t)}$ depends only on $\theta^{(t-1)}$, not on $\theta^{(1)}, \ldots, \theta^{(t-2)}$.
As $t \to \infty$, the sampling from the Markov chain converges to the posterior for the right choice of transition kernel $p$. Thus, we should run the simulation long enough so that the distribution of the current draws is close enough to $p(\theta \mid y)$.
3.2.1. Metropolis-Hastings Sampling
Metropolis-Hastings sampling was proposed by [15] and [16]. The theory of this sampling is based on rejection sampling: the acceptance-rejection method is a technique for obtaining samples from a distribution of unknown form. The Metropolis-Hastings algorithm is a common expression for a family of Markov chain simulation methods. It is worth describing the Metropolis algorithm first, then broadening it to the Metropolis-Hastings algorithm. Let $p(\theta \mid y)$ be the conditional posterior distribution from which we want to sample. Let $\theta^{(t-1)}$ be the current parameter value, and let $J\!\left(\theta^{*} \mid \theta^{(t-1)}\right)$ be the proposal density. The proposal density is much like a conventional transition operator for a Markov chain; the proposal distribution depends only on the previous state in the chain. However, the transition operator for the Metropolis algorithm has an additional step that assesses whether or not the target distribution has a sufficiently large density near the proposed state to warrant accepting it as the next state of the chain.
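A minimal sketch of a random-walk Metropolis sampler for a generic one-dimensional target, specified only through its log-density (not the specific posterior used later in this paper), is:

```python
import numpy as np

def random_walk_metropolis(log_target, theta0, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis for a 1-D target given by its log-density."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_iter)
    theta = theta0
    log_p = log_target(theta)
    for t in range(n_iter):
        proposal = theta + rng.normal(0.0, step)   # symmetric proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal)/p(current)).
        if np.log(rng.uniform()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
        draws[t] = theta
    return draws

# Example: sample from a standard normal target.
samples = random_walk_metropolis(lambda th: -0.5 * th**2, theta0=0.0)
print(samples.mean(), samples.std())
```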
3.2.2. Gibbs Sampling
A Gibbs sampler is the simplest of the Markov chain simulation algorithms, and it is used to sample from the conditional conjugate models, where we can directly sample from each conditional posterior [12] [13]. It is rare to find all the conditional posteriors in a model in known forms. One may find some conditional posterior distributions that are possible to directly sample from. Furthermore, one may find some of the conditional posteriors that cannot be straightforwardly sampled from. Therefore, the procedure for this issue is to update the parameters one at a time with the Gibbs sampler used where possible, and one-dimensional Metropolis updating where necessary. This process is called the Metropolis-Hastings within Gibbs sampling and will be used in this work.
4. A Hierarchical Mixture Model for Variable Selection
The linear mixed model fitted to data from a split-plot experiment with $n$ responses is
$$y = \beta_0 1_n + X\beta + Z\gamma + \varepsilon, \qquad (22)$$
where $y$ is the $n \times 1$ vector of random responses, $\beta_0$ is the intercept, $1_n$ is an $n \times 1$ vector of ones, $X$ is the $n \times p$ model matrix without the column of the intercept, $\beta$ is the $p \times 1$ vector of fixed effect parameters, and $V = \operatorname{Var}(y) = \sigma_\gamma^2 ZZ' + \sigma_\varepsilon^2 I_n$, where $Z$ is the random effect design matrix. As $\sigma_\gamma^2 = \rho\sigma^2$ and $\sigma_\varepsilon^2 = (1 - \rho)\sigma^2$, where $\sigma^2$ is the total variance and $\rho$ is the correlation between observations within the same whole plot, $V$ can be written as
$$V = \sigma^2\left[\rho\, ZZ' + (1 - \rho) I_n\right] = \sigma^2 \Sigma, \qquad \text{with } \Sigma = \rho\, ZZ' + (1 - \rho) I_n. \qquad (23)$$
We need to find the highest posterior probability of an indicator vector $\delta = (\delta_1, \ldots, \delta_p)'$ such that $\delta_j \in \{0, 1\}$ for $j = 1, \ldots, p$. When $\delta_j = 1$ the $j$th term is assumed to be active and will be included in the model, and when $\delta_j = 0$ the term is assumed to be non-active and will not be included in the model.
Following [6] and [5], we assume that $\beta \mid \delta, c, \sigma^2 \sim N\!\left(0, \sigma^2 D_\delta\right)$, where $\delta$ is the indicator vector, $c$ is the prior variance of the slab distribution, and $D_\delta$ is a diagonal matrix whose $j$th diagonal element equals $c$ when $\delta_j = 1$ and $d$ when $\delta_j = 0$. The parameters $\delta$, $\omega$ and $c$ will be given prior distributions, and the parameter $d$ is assumed to be a small fixed non-negative number because we want the spike distribution to have a smaller variance than the slab distribution. Formally, the prior construction of $\delta$ is the following: for every coefficient $\beta_j$, a Bernoulli variable $\delta_j$ is defined taking values 1 and 0 with probability of inclusion $\omega$, as $P(\delta_j = 1) = \omega$ and $P(\delta_j = 0) = 1 - \omega$. Often, the $\delta_j$’s are taken as independent Bernoulli($\omega$) random variables, where $0 < \omega < 1$. It is common to fix $\omega$ in the normal mixture; however, we shall treat $\omega$ as a parameter in order to investigate different values of $\omega$, and sample it from the Beta distribution, as will be explained in Section 6.3.
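A minimal sketch of the diagonal prior covariance construction described above, with hypothetical values for the slab variance $c$ and the fixed spike variance $d$, is:

```python
import numpy as np

def prior_cov_matrix(delta, c=4.0, d=1e-4):
    """Diagonal matrix D_delta for the coefficient prior.

    delta : 0/1 indicator vector; the j-th diagonal element is the slab
            variance c when delta_j = 1 and the small spike variance d
            when delta_j = 0.
    """
    delta = np.asarray(delta)
    return np.diag(np.where(delta == 1, c, d))

print(prior_cov_matrix([1, 0, 1]))
```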
5. Prior Distributions
1) Following the prior distributions used by [5], we assume that the prior distribution for the fixed effects is $\beta \mid \delta, c, \sigma^2 \sim N\!\left(0, \sigma^2 D_\delta\right)$, as given in Section 4.
2) The prior distribution for the total variance $\sigma^2$ is an inverse gamma distribution. For this work, we chose its hyperparameters following [5], as this yields the common non-informative prior for $\sigma^2$, $p(\sigma^2) \propto 1/\sigma^2$. This prior is improper; however, we will sample from the full conditional posterior distribution, which is a proper inverse gamma distribution.
3) The prior distribution for the correlation parameter $\rho$ is a Beta distribution with shape parameters $a$ and $b$, chosen following [8]. According to [8], a Beta prior distribution for a correlation parameter can be interpreted as indicating a prior point estimate of the correlation, with this prior information being worth a given number of observations. Our prior was selected to be centred at $\rho = 0.5$ and to be worth four observations in each block; for an experiment with that many observations, the posterior distribution would give equal weight to the prior and the likelihood [8]. A similar choice of prior is made in [6], where the shape parameters are set so that the Beta prior is symmetric with a mode of 0.5. This has the effect of pulling the posterior mode of $\rho$ towards 0.5 when the data are scarce. The prior density for $\rho$ is
$$p(\rho) \propto \rho^{a-1}(1 - \rho)^{b-1}, \qquad 0 < \rho < 1.$$
4) The prior distribution for the elements of the indicator vector is $\delta_j \mid \omega \sim \text{Bernoulli}(\omega)$, $j = 1, \ldots, p$, where $\omega$ is the prior probability that $\beta_j$ is active, following [5].
5) The prior distribution for the probability of inclusion is $\omega \sim \text{Beta}(a_\omega, b_\omega)$. [5] made a specific choice for this prior; however, we select $a_\omega$ and $b_\omega$ such that the prior of $\omega$ has a mode of 0.25. This choice of $a_\omega$ and $b_\omega$ results in a prior with a mode of 0.25 and an upper 5% tail percentile of 0.66, meaning there is a 5% prior probability that $\omega$ is greater than or equal to 0.66.
6) The prior distribution for the slab variance $c$ is a discrete uniform prior distribution with support points $T = \{1/4, 9/16, 1, 4, 9, 16, 25\}$, as given by [5]. They found that large values of $c$ tend to favour sparse models with large effects, in which case small effects will be missed. On the other hand, small values of $c$ tend to favour less sparse models, and very small values of $c$ tend to favour sparse models again. They selected the support points in $T$ such that they cover both small and large values of $c$. The prior distribution for $c$ is
$$P(c = t) = \frac{1}{7}, \qquad t \in T.$$
6. Full Conditional Distributions
We use the prior distributions presented in Section 5 to derive the full conditional distributions. The likelihood of the data depends on $\beta$, $\sigma^2$ and $\rho$, so we can derive the conditional distribution for $\beta$ using the prior distribution $\beta \mid \delta, c, \sigma^2 \sim N\!\left(0, \sigma^2 D_\delta\right)$ and the likelihood for the model (22),
$$p\!\left(y \mid \beta, \sigma^2, \rho\right) \propto \left|\sigma^2 \Sigma\right|^{-1/2} \exp\!\left\{-\frac{1}{2\sigma^2}(y - X\beta)' \Sigma^{-1} (y - X\beta)\right\},$$
with $\Sigma = \rho\, ZZ' + (1 - \rho) I_n$ as in Equation (23). Note that we standardise both $X$ and $y$, so the fixed effect vector $\beta$ does not include the intercept. The conditional distribution for $\beta$ can be expressed as:
$$p\!\left(\beta \mid y, \delta, \omega, \sigma^2, \rho, c\right) \propto p\!\left(y \mid \beta, \sigma^2, \rho\right)\, p\!\left(\beta \mid \delta, c, \sigma^2\right).$$
The key to deriving the joint posterior distribution is to rewrite the expression in the exponential part in a more convenient form. This can be done by using the multivariate completion of squares:
$$u'Au - 2\alpha'u = \left(u - A^{-1}\alpha\right)'A\left(u - A^{-1}\alpha\right) - \alpha'A^{-1}\alpha,$$
where $A$ is a symmetric positive definite (hence invertible) matrix. We take $u = \beta$, $A = X'\Sigma^{-1}X + D_\delta^{-1}$, and $\alpha = X'\Sigma^{-1}y$.
The conditional distribution for $\beta$ can then be written as:
$$p\!\left(\beta \mid y, \delta, \omega, \sigma^2, \rho, c\right) \propto \exp\!\left\{-\frac{1}{2\sigma^2}\left(\beta - A^{-1}\alpha\right)'A\left(\beta - A^{-1}\alpha\right)\right\}.$$
Thus, we can sample $\beta$ from the conditional posterior $N\!\left(\mu_\beta, \Phi_\beta\right)$, where
$$\mu_\beta = \left(X'\Sigma^{-1}X + D_\delta^{-1}\right)^{-1} X'\Sigma^{-1}y, \qquad \Phi_\beta = \sigma^2\left(X'\Sigma^{-1}X + D_\delta^{-1}\right)^{-1}. \qquad (24)$$
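A minimal sketch of drawing $\beta$ from a multivariate normal full conditional of the form (24), using the Cholesky factor of the covariance matrix, is given below; the function name and arguments mirror the reconstruction above and are illustrative.

```python
import numpy as np

def sample_beta(X, y, Sigma, D_delta, sigma2, rng):
    """Draw beta from N(mu, Phi) with
    mu  = (X' Sigma^{-1} X + D_delta^{-1})^{-1} X' Sigma^{-1} y and
    Phi = sigma2 * (X' Sigma^{-1} X + D_delta^{-1})^{-1}."""
    Sinv_X = np.linalg.solve(Sigma, X)
    Sinv_y = np.linalg.solve(Sigma, y)
    A = X.T @ Sinv_X + np.linalg.inv(D_delta)
    A_inv = np.linalg.inv(A)
    mu = A_inv @ (X.T @ Sinv_y)
    Phi = sigma2 * A_inv
    L = np.linalg.cholesky(Phi)            # Phi = L L'
    return mu + L @ rng.standard_normal(len(mu))
```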
6.1. The Conditional Distribution for ρ
The likelihood of the data depends on $\rho$, so the conditional distribution for $\rho$ can be derived from
$$p\!\left(\rho \mid y, \beta, \sigma^2\right) \propto p\!\left(y \mid \beta, \sigma^2, \rho\right)\, p(\rho).$$
We note here that the likelihood depends on $\rho$ through $\Sigma$ as in Equation (23), so we can express the conditional distribution as a function of $\rho$,
$$p\!\left(\rho \mid y, \beta, \sigma^2\right) \propto \left|\Sigma\right|^{-1/2} \exp\!\left\{-\frac{1}{2\sigma^2}(y - X\beta)' \Sigma^{-1} (y - X\beta)\right\} \rho^{a-1}(1 - \rho)^{b-1}.$$
The conditional distribution for $\rho$ is a non-standard distribution that cannot be sampled directly. Therefore, we use Metropolis-Hastings (M-H) rejection sampling. Our correlation parameter is $\rho \in (0, 1)$ and has a Beta$(a, b)$ prior. We apply the random-walk Metropolis-Hastings algorithm and select a log-normal proposal distribution for the variance ratio $\eta = \rho/(1 - \rho)$, centred at the current value of $\eta$ at iteration $t$ and with log-scale variance $s^2$. The choice of $s^2$ affects the jumping rule in the random walk proposal distribution. As we have one parameter to be updated in the random walk algorithm, namely $\eta$, we follow [12] and [17]: the most efficient jump has a scale of approximately $2.4/\sqrt{h}$, where $h$ is the number of parameters to be updated. In this work, $h = 1$, and we set $s$ following [17], as this yields an appropriate acceptance rate and near-independent draws as judged from the ACF plot.
We can use $\eta = \rho/(1 - \rho)$ as a transformation between $\rho$ and $\eta$, so that $\rho = \eta/(1 + \eta)$, and the Jacobian of the transformation is $\mathrm{d}\rho/\mathrm{d}\eta = 1/(1 + \eta)^2$.
We draw a proposal value $\eta^{*}$ from a log-normal$\!\left(\log \eta^{(t-1)}, s^2\right)$ distribution, and the probability of accepting $\eta^{*}$ is the minimum of 1 and the ratio $r$, where $r$ is the ratio of the target density at the proposed value to the target density at the current value, multiplied by the proposal ratio. Our proposal ratio is
$$\frac{J\!\left(\eta^{(t-1)} \mid \eta^{*}\right)}{J\!\left(\eta^{*} \mid \eta^{(t-1)}\right)} = \frac{\eta^{*}}{\eta^{(t-1)}}.$$
The ratio $r$ can therefore be expressed as
$$r = \frac{p\!\left(y \mid \beta, \sigma^2, \rho^{*}\right) p\!\left(\rho^{*}\right)}{p\!\left(y \mid \beta, \sigma^2, \rho^{(t-1)}\right) p\!\left(\rho^{(t-1)}\right)} \times \frac{\left(1 + \eta^{(t-1)}\right)^2}{\left(1 + \eta^{*}\right)^2} \times \frac{\eta^{*}}{\eta^{(t-1)}}, \qquad (25)$$
where $\rho^{*} = \eta^{*}/\left(1 + \eta^{*}\right)$, and $\rho^{(t-1)} = \eta^{(t-1)}/\left(1 + \eta^{(t-1)}\right)$.
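A minimal sketch of this accept/reject step, assuming a log-normal random-walk proposal for $\eta$ and a Beta$(a, b)$ prior on $\rho$ as above (the log-likelihood is supplied as a function of $\rho$, and the default values of $a$, $b$ and the step size $s$ are illustrative), is:

```python
import numpy as np
from scipy.stats import beta as beta_dist

def update_eta(eta_curr, log_lik_rho, a=2.0, b=2.0, s=2.4, rng=None):
    """One random-walk M-H update of eta = rho / (1 - rho).

    log_lik_rho : function returning the log-likelihood of the data at rho.
    a, b        : shape parameters of the Beta prior on rho (assumed values).
    s           : log-scale standard deviation of the log-normal random walk.
    """
    rng = np.random.default_rng() if rng is None else rng
    eta_prop = np.exp(np.log(eta_curr) + rng.normal(0.0, s))  # log-normal RW
    rho_curr = eta_curr / (1.0 + eta_curr)
    rho_prop = eta_prop / (1.0 + eta_prop)

    def log_target(rho, eta):
        # log posterior of rho plus log Jacobian |d rho / d eta| = 1/(1+eta)^2
        return (log_lik_rho(rho) + beta_dist.logpdf(rho, a, b)
                - 2.0 * np.log1p(eta))

    # proposal correction for the log-normal random walk: q-ratio = eta*/eta
    log_r = (log_target(rho_prop, eta_prop) - log_target(rho_curr, eta_curr)
             + np.log(eta_prop) - np.log(eta_curr))
    if np.log(rng.uniform()) < log_r:
        return eta_prop
    return eta_curr
```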
6.2. The Conditional Distribution for σ²
The likelihood of the data depends on $\sigma^2$, so we can express the conditional distribution of $\sigma^2$ as
$$p\!\left(\sigma^2 \mid y, \beta, \delta, \rho, c\right) \propto p\!\left(y \mid \beta, \sigma^2, \rho\right)\, p\!\left(\beta \mid \delta, c, \sigma^2\right)\, p\!\left(\sigma^2\right).$$
We know that $p(\sigma^2) \propto 1/\sigma^2$, so the conditional posterior for $\sigma^2$ can be written as
$$p\!\left(\sigma^2 \mid y, \beta, \delta, \rho, c\right) \propto \left(\sigma^2\right)^{-(n + p)/2 - 1} \exp\!\left\{-\frac{1}{2\sigma^2}\left[(y - X\beta)'\Sigma^{-1}(y - X\beta) + \beta' D_\delta^{-1}\beta\right]\right\}.$$
This is the inverse gamma distribution with shape parameter $a^{*}$ and scale parameter $b^{*}$ such that
$$a^{*} = \frac{n + p}{2}, \qquad b^{*} = \frac{1}{2}\left[(y - X\beta)'\Sigma^{-1}(y - X\beta) + \beta' D_\delta^{-1}\beta\right]. \qquad (26)$$
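A minimal sketch of the $\sigma^2$ update, drawing from the inverse gamma distribution in (26) by taking the reciprocal of a gamma draw (the two quadratic forms are assumed to be computed beforehand), is:

```python
import numpy as np

def sample_sigma2(resid_quad_form, beta_quad_form, n, p, rng):
    """Draw sigma^2 from InvGamma(shape, scale) with
    shape = (n + p) / 2 and
    scale = (resid_quad_form + beta_quad_form) / 2,
    where resid_quad_form = (y - X beta)' Sigma^{-1} (y - X beta) and
    beta_quad_form  = beta' D_delta^{-1} beta."""
    shape = 0.5 * (n + p)
    scale = 0.5 * (resid_quad_form + beta_quad_form)
    return scale / rng.gamma(shape)   # scale / Gamma(shape, 1) ~ InvGamma
```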
The indicator vector can be drawn conditionally on the regression coefficients, and computation of the marginal likelihood is not required. The prior probabilities for $\delta_j$ are
$$P(\delta_j = 1 \mid \omega) = \omega, \qquad P(\delta_j = 0 \mid \omega) = 1 - \omega,$$
where $\omega$ is the prior probability that $\beta_j$ is active. The joint conditional posterior distribution for $\delta$ has mass function
$$p\!\left(\delta \mid \beta, \omega, \sigma^2, c\right) \propto p\!\left(\beta \mid \delta, c, \sigma^2\right)\, p(\delta \mid \omega).$$
The conditional density for $\beta_j$ given $\delta_j$ is
$$\beta_j \mid \delta_j = 1 \sim N\!\left(0, c\sigma^2\right), \qquad \beta_j \mid \delta_j = 0 \sim N\!\left(0, d\sigma^2\right).$$
The conditional distribution for the $j$th component given $\beta_j$, $\omega$, $\sigma^2$ and $c$ is
$$p\!\left(\delta_j \mid \beta_j, \omega, \sigma^2, c\right) \propto p\!\left(\beta_j \mid \delta_j, \sigma^2, c\right)\, p(\delta_j \mid \omega).$$
The conditional posterior probabilities for $\delta_j$ are therefore
$$P\!\left(\delta_j = 1 \mid \beta_j, \omega, \sigma^2, c\right) = \frac{\omega\, \phi\!\left(\beta_j; 0, c\sigma^2\right)}{\omega\, \phi\!\left(\beta_j; 0, c\sigma^2\right) + (1 - \omega)\, \phi\!\left(\beta_j; 0, d\sigma^2\right)}, \qquad (27)$$
and
$$P\!\left(\delta_j = 0 \mid \beta_j, \omega, \sigma^2, c\right) = 1 - P\!\left(\delta_j = 1 \mid \beta_j, \omega, \sigma^2, c\right),$$
where $\phi(\cdot\,; 0, v)$ denotes the normal density with mean zero and variance $v$.
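A minimal sketch of the elementwise indicator update implied by (27), assuming normal spike and slab densities with variances $d\sigma^2$ and $c\sigma^2$ as reconstructed above, is:

```python
import numpy as np
from scipy.stats import norm

def sample_indicators(beta, omega, sigma2, c, d, rng):
    """Draw each delta_j as Bernoulli with success probability given by (27)."""
    beta = np.asarray(beta, dtype=float)
    slab = omega * norm.pdf(beta, scale=np.sqrt(c * sigma2))
    spike = (1.0 - omega) * norm.pdf(beta, scale=np.sqrt(d * sigma2))
    prob_active = slab / (slab + spike)
    return rng.binomial(1, prob_active)
```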
6.3. The Conditional Distribution for ω
The probability of inclusion $\omega$ can be drawn conditionally on the indicator vector, and computation of the marginal likelihood is not required. Hence the conditional distribution for $\omega$ is
$$p(\omega \mid \delta) \propto p(\delta \mid \omega)\, p(\omega) \propto \omega^{\sum_{j}\delta_j + a_\omega - 1}\,(1 - \omega)^{p - \sum_{j}\delta_j + b_\omega - 1}.$$
Hence,
$$\omega \mid \delta \sim \text{Beta}\!\left(a_\omega + \sum_{j=1}^{p}\delta_j,\; b_\omega + p - \sum_{j=1}^{p}\delta_j\right). \qquad (28)$$
6.4. The Conditional Distribution for c
The prior distribution for $c$ is a discrete uniform distribution with support points $T = \{1/4, 9/16, 1, 4, 9, 16, 25\}$, and $c$ can be drawn conditionally on the regression coefficients; computation of the marginal likelihood is not required. Hence, the conditional distribution for $c$ is
$$p\!\left(c \mid \beta, \delta, \sigma^2\right) \propto p\!\left(\beta \mid \delta, c, \sigma^2\right)\, p(c).$$
Therefore, for each support point $t \in T$,
$$p\!\left(c = t \mid \beta, \delta, \sigma^2\right) \propto \prod_{j : \delta_j = 1} \phi\!\left(\beta_j; 0, t\sigma^2\right),$$
since only the active coefficients depend on the slab variance. The conditional posterior probabilities are obtained by normalising over the support points, so that
$$P\!\left(c = t \mid \beta, \delta, \sigma^2\right) = \frac{\prod_{j : \delta_j = 1} \phi\!\left(\beta_j; 0, t\sigma^2\right)}{\sum_{t' \in T} \prod_{j : \delta_j = 1} \phi\!\left(\beta_j; 0, t'\sigma^2\right)}, \qquad t \in T. \qquad (29)$$
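A minimal sketch of the discrete update for the slab variance $c$ over the support $T$, following the normalisation in (29), is:

```python
import numpy as np
from scipy.stats import norm

T = np.array([1/4, 9/16, 1, 4, 9, 16, 25])

def sample_slab_variance(beta, delta, sigma2, rng):
    """Draw c from its discrete full conditional over the support T."""
    active = np.asarray(beta)[np.asarray(delta) == 1]
    # log-weight of each support point, up to an additive constant
    log_w = np.array([norm.logpdf(active, scale=np.sqrt(t * sigma2)).sum()
                      for t in T])
    w = np.exp(log_w - log_w.max())       # stabilise before normalising
    return rng.choice(T, p=w / w.sum())
```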
7. Bayesian Variable Selection Algorithms
In this section, we introduce the computational algorithms which we use in the Bayesian analysis for variable selection using the SSVS and the SSVS-SPD. In this work, we choose an asymmetric proposal distribution, the log-normal density. We apply the Metropolis-Hastings algorithm to sample the variance ratio $\eta = \sigma_\gamma^2/\sigma_\varepsilon^2 = \rho/(1 - \rho)$, where $\rho$ is the correlation parameter. This is because, in our experiments, observations from different subplots within the same whole plot are positively correlated, as $\rho > 0$, while observations from different whole plots are independent [5].
7.1. The Stochastic Search Variable Selection (SSVS) Algorithm
We process the MCMC estimation of the parameters $\beta$, $\delta$, $\omega$, $\sigma^2$, $\rho$ and $c$. We use the priors of all these parameters as in Section 5. The following Metropolis-Hastings within Gibbs sampling algorithm can be implemented. Let $y$ be the $n \times 1$ vector of random responses, $X$ the $n \times p$ model matrix without the column of the intercept, and $\beta$ the $p \times 1$ vector of fixed effect parameters, where $p$ is the number of fixed effect parameters that need to be estimated. We set initial values $\beta^{(0)}$, $\delta^{(0)}$, $\omega^{(0)}$, $\sigma^{2(0)}$, $\rho^{(0)}$, $\eta^{(0)}$ and $c^{(0)}$ for the parameters. Starting at the $t$th iteration, $t = 1, \ldots, M$, where $M$ is the total number of MCMC iterations, and conditioning on the draws from iteration $t - 1$, the sampling algorithm is:
1) For $j = 1, \ldots, p$, sample $\delta_j^{(t)}$ of the indicator vector $\delta$ using (27) for $\beta_j^{(t-1)}$, $\omega^{(t-1)}$, $\sigma^{2(t-1)}$, and $c^{(t-1)}$.
2) Sample the mixture weight $\omega^{(t)}$ using (28) for $\delta^{(t)}$.
3) Sample the regressor coefficients $\beta^{(t)}$ using (24) for $X$, $y$, $\delta^{(t)}$, $\sigma^{2(t-1)}$, $\rho^{(t-1)}$, $c^{(t-1)}$, and $d$.
4) Sample the total variance $\sigma^{2(t)}$ using (26) for $X$, $y$, $Z$, $\beta^{(t)}$, and $\rho^{(t-1)}$.
5) a) Sample $\eta^{*}$ from log-normal$\!\left(\log \eta^{(t-1)}, s^2\right)$;
b) Calculate $r$ using (25) for $X$, $y$, $\beta^{(t)}$, $\sigma^{2(t)}$, and $\eta^{*}$;
c) Sample $u$ from Uniform$(0, 1)$;
d) If $u \leq \min(1, r)$, then set $\eta^{(t)} = \eta^{*}$, otherwise set $\eta^{(t)} = \eta^{(t-1)}$; in either case $\rho^{(t)} = \eta^{(t)}/\left(1 + \eta^{(t)}\right)$.
6) Sample $c^{(t)}$ from the set $T$ with probability mass function given in (29) for $\beta^{(t)}$ and $\delta^{(t)}$.
7.2. The Stochastic Search Variable Selection for Split-Plot Design (SSVS-SPD) Algorithm
We adapt the SSVS for the analysis of data from split-plot designs by taking into account the two types of factors, i.e. the whole-plot factors and the subplot factors, which are expected to have different effect sizes in the two strata of the split-plot design [9]. We refer to this approach as the Stochastic Search Variable Selection for Split-Plot Design (SSVS-SPD). While the SSVS samples all parameters from a posterior distribution with a single slab variance, the SSVS-SPD samples the whole-plot parameters and the subplot parameters from two posterior distributions with different slab variances, given that the whole-plot and the subplot effects might have different sizes. We use the same priors as in the SSVS for all the parameters of interest, as in Section 5. Basically, the SSVS-SPD can be seen as running the SSVS twice in one process, once for the subplot factors and once for the whole-plot factors. The algorithm can be explained as follows:
We process the MCMC estimation of the parameters $\beta$, $\delta_S$, $\delta_W$, $\omega_S$, $\omega_W$, $\sigma^2$, $\rho$, $c_S$ and $c_W$. The following Metropolis-Hastings within Gibbs sampling algorithm can be implemented. Let $y$ be the $n \times 1$ vector of random responses and $X$ the $n \times p$ model matrix without the column of the intercept. Let X.S be the $n \times p_S$ model matrix for the subplot factors, where $p_S$ is the number of subplot fixed effect parameters, and let X.W be the $n \times p_W$ model matrix for the whole-plot factors, where $p_W$ is the number of whole-plot fixed effect parameters. The $p \times 1$ vector of fixed effect parameters is $\beta$, where $p$ is the number of fixed effect parameters that need to be estimated; $\beta_S$ is the $p_S \times 1$ vector of subplot effect parameters, and $\beta_W$ is the $p_W \times 1$ vector of whole-plot effect parameters.
We set initial values $\beta_S^{(0)}$ and $\beta_W^{(0)}$ for the coefficients, and initial values $\delta_S^{(0)}$ and $\delta_W^{(0)}$ for the indicator vectors of the subplot and whole-plot factors. Also, we set initial values $\sigma^{2(0)}$, $\rho^{(0)}$ and $\eta^{(0)}$. Initial values $c_S^{(0)}$ and $c_W^{(0)}$ are set for the slab variances of the subplot and whole-plot factors, and initial values $\omega_S^{(0)}$ and $\omega_W^{(0)}$ for the weights of the subplot and whole-plot factors.
Starting at the $t$th iteration, $t = 1, \ldots, M$, where $M$ is the total number of MCMC iterations, and conditioning on the draws from iteration $t - 1$, the sampling algorithm is:
1) For $j = 1, \ldots, p_S$ and $k = 1, \ldots, p_W$, sample $\delta_{S,j}^{(t)}$ and $\delta_{W,k}^{(t)}$ of the indicator vectors $\delta_S$ and $\delta_W$ using (27) for $\beta_S^{(t-1)}$, $\beta_W^{(t-1)}$, $\omega_S^{(t-1)}$, $\omega_W^{(t-1)}$, $\sigma^{2(t-1)}$, $c_S^{(t-1)}$, and $c_W^{(t-1)}$.
2) Allocate $\delta^{(t)} = \left(\delta_S^{(t)\prime}, \delta_W^{(t)\prime}\right)'$.
3) Sample the mixture weights $\omega_S^{(t)}$ and $\omega_W^{(t)}$ using (28) for $\delta_S^{(t)}$ and $\delta_W^{(t)}$.
4) Allocate $\omega^{(t)} = \left(\omega_S^{(t)}, \omega_W^{(t)}\right)$.
5) Sample the regressor coefficients $\beta_S^{(t)}$ and $\beta_W^{(t)}$ using (24) for $X$, $y$, $\delta_S^{(t)}$, $\delta_W^{(t)}$, $\sigma^{2(t-1)}$, $\rho^{(t-1)}$, $c_S^{(t-1)}$, $c_W^{(t-1)}$, and $d$, where $D_{\delta_S}$ is a diagonal matrix whose $j$th diagonal element equals $c_S$ when $\delta_{S,j} = 1$ and $d$ when $\delta_{S,j} = 0$, and $D_{\delta_W}$ is a diagonal matrix whose $k$th diagonal element equals $c_W$ when $\delta_{W,k} = 1$ and $d$ when $\delta_{W,k} = 0$.
6) Allocate $\beta^{(t)} = \left(\beta_S^{(t)\prime}, \beta_W^{(t)\prime}\right)'$ and $D_\delta^{(t)} = \operatorname{diag}\!\left(D_{\delta_S}, D_{\delta_W}\right)$.
7) Sample the total variance $\sigma^{2(t)}$ using (26) for $X$, $y$, $Z$, $\beta^{(t)}$, and $\rho^{(t-1)}$.
8) a) Sample $\eta^{*}$ from log-normal$\!\left(\log \eta^{(t-1)}, s^2\right)$;
b) Calculate $r$ using (25) for $X$, $y$, $\beta^{(t)}$, $\sigma^{2(t)}$, and $\eta^{*}$;
c) Sample $u$ from Uniform$(0, 1)$;
d) If $u \leq \min(1, r)$, then set $\eta^{(t)} = \eta^{*}$, otherwise set $\eta^{(t)} = \eta^{(t-1)}$; in either case $\rho^{(t)} = \eta^{(t)}/\left(1 + \eta^{(t)}\right)$.
9) Sample $c_S^{(t)}$ and $c_W^{(t)}$ from the set $T$ with probability mass function given in (29) for $\beta_S^{(t)}$, $\beta_W^{(t)}$, $\delta_S^{(t)}$, and $\delta_W^{(t)}$.
10) Allocate $c^{(t)} = \left(c_S^{(t)}, c_W^{(t)}\right)$.
8. Real Data Application
[18] and [19] described an experiment on the production of vinyl for automobile seat covers. The experiment has 28 runs and is a modified version of an example in [19]. In the experiment, the effects of five factors on the thickness of the vinyl are investigated. Three of the factors are mixture components and two of them are so-called process variables. As in ordinary mixture experiments, the component proportions sum to one. In this example, the response of interest does not depend only on these proportions, but also on the effects of the process variables. The mixture components in the experiment are three plasticizers whose proportions are represented by $x_1$, $x_2$ and $x_3$. The two process variables studied are the rate of extrusion ($z_1$) and the temperature of drying ($z_2$). The experiment was conducted in a split-plot format. The process variables are the whole-plot factors of the experiment, whereas the mixture components are the subplot factors. The data are shown in Table 1.
Table 1. Data for the vinyl thickness three-component mixture experiment with two process variables [20].
A main effects plus two-factor interaction model was assumed for the process variables $z_1$ and $z_2$. For the mixture components, the quadratic mixture model was used. The main effects of the process variables were crossed with the linear blending terms only, so the model estimated in [18] is given by
$$E(y) = \sum_{i=1}^{3}\beta_i x_i + \sum_{i<j}\beta_{ij} x_i x_j + \sum_{i=1}^{3}\sum_{l=1}^{2}\alpha_{il}\, x_i z_l + \lambda\, z_1 z_2.$$
They computed the variance component estimates $\hat{\sigma}_\gamma^2$ and $\hat{\sigma}_\varepsilon^2$ in R. The factor effect estimates will be displayed in the next section for comparison with the methods proposed in this paper.
9. Analysis of the Vinyl Thickness Experiment
The real dataset from the vinyl thickness experiment has been used to apply the Bayesian variable selection approach. The model involves the 5 main variables ($x_1$, $x_2$, $x_3$, $z_1$, $z_2$) and the two-factor interaction variables. We used the prior distributions above. Following [21], in a Bayesian framework the final model could be the median probability model, consisting of those variables whose posterior inclusion probability is at least 0.5. The posterior probability of parameter $\beta_j$ being active is approximated by
$$P\!\left(\delta_j = 1 \mid y\right) \approx \frac{1}{M}\sum_{t=1}^{M}\delta_j^{(t)}, \qquad (30)$$
where $\delta_j^{(t)}$ is $\delta_j$ sampled at iteration $t$ of the Metropolis-Hastings within Gibbs sampling algorithm.
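A minimal sketch of approximating the posterior inclusion probabilities in (30) from stored indicator draws, and of applying the median probability model rule of [21], is:

```python
import numpy as np

def inclusion_probabilities(delta_draws, burn_in=0):
    """Approximate P(delta_j = 1 | y) by averaging the stored indicator draws.

    delta_draws : (n_iterations, p) array of 0/1 indicator draws.
    """
    kept = np.asarray(delta_draws)[burn_in:]
    return kept.mean(axis=0)

def median_probability_model(delta_draws, threshold=0.5, burn_in=0):
    """Indices of terms whose posterior inclusion probability is >= threshold."""
    return np.flatnonzero(inclusion_probabilities(delta_draws, burn_in) >= threshold)
```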
We summarise the results of applying Bayesian variable selection to the data from the vinyl thickness experiment. The model which will be used is the 13-parameter model given in Section 8. The estimates of the 13 parameters of the model have been obtained using the SSVS and the SSVS-SPD for the response $y$ displayed in Table 1, as well as by the generalised least squares (GLS) estimator for comparison purposes.
Figure 2 shows a comparison between SSVS and SSVS-SPD with respect to the resulting approximate posterior probabilities for the vinyl thickness experiment. Six of the parameters have the highest posterior probability of being active under both SSVS and SSVS-SPD, which indicates that the six variables associated with these terms play a significant role in this experiment. Following these terms, we find a group of parameters with approximate posterior probabilities of about 0.5 and 0.6 under both SSVS and SSVS-SPD. We note that SSVS and SSVS-SPD tend to consider these terms significant, at approximate posterior probabilities of 0.5 and 0.6, while they are not significant by the GLS method. All methods consider the remaining terms to be non-significant, with low approximate posterior probabilities in this experiment. The Bayesian analysis of the real data from the vinyl thickness experiment yielded 10 significant variables. In contrast, the p-values for the GLS estimates in Table 2 show that there are 7 significant variables, as the GLS analysis excludes the three variables associated with the additional coefficients. This means that the SSVS and SSVS-SPD yielded 3 extra significant variables in the model. Table 2 presents the posterior means of the coefficients and standard deviations from both the SSVS and the SSVS-SPD methods, together with the GLS estimates and their p-values for the vinyl thickness experiment. Table 3 shows the posterior means of the correlation $\rho$ and the total variance $\sigma^2$ for both the SSVS and SSVS-SPD methods. Figure 3 shows the ACF plots for the Markov chains produced by the Metropolis-Hastings within Gibbs sampling algorithm used by SSVS and SSVS-SPD to sample the correlation $\rho$ and the variance $\sigma^2$.
Figure 2. Approximate posterior probabilities for the vinyl thickness experiment.
Table 2. The estimated coefficients and standard deviations (in parentheses) for the vinyl thickness experiment by SSVS and SSVS-SPD. The first row gives the estimated coefficients and p-values (in parentheses) for the GLS.
Table 3. Posterior means of the correlation $\rho$ and the total variance $\sigma^2$ by the SSVS and SSVS-SPD for the vinyl thickness experiment.
Figure 3. ACF plots for the Markov chains formed by sampling the total variance $\sigma^2$ and the correlation $\rho$ by SSVS and SSVS-SPD for the vinyl thickness experiment.
10. Simulation Study Using the Design of the Vinyl Thickness Experiment
We performed a simulation study by generating 1000 datasets, where each dataset was run for 10,000 MCMC iterations. The SSVS and the SSVS-SPD were applied at two settings of the variance components. We assume that the whole-plot and residual variance components are equal, so the true value of the total variance $\sigma^2$ follows from the assumed variance components and the true value of $\rho$ is 0.5. We calculate Type I and Type II error rates using the indicator vector $\delta$ and the approximate posterior probability in (30). If a true variable is active but the algorithm yields a corresponding approximate posterior probability of less than 0.5, this variable contributes to the Type II error rate. Likewise, if a true variable is non-active but the algorithm yields a corresponding approximate posterior probability larger than or equal to 0.5, this variable contributes to the Type I error rate. We also assess the precision of the point estimates by computing the median relative model error (MRME) of the SSVS and SSVS-SPD estimates. We focus on the properties of the estimated models by investigating the following properties:
1) consistency in variable selection (frequency in selecting the active/non-active variable), and
2) prediction performance.
For point 1, at the 5% significance level, we report the Type I error rate (an effect that is truly not significant but which the corresponding procedure estimates to be significant). We also report the Type II error rate (an effect that is truly present but which the corresponding procedure estimates to be not significant).
For point 2, following [22] and [23], prediction accuracy is measured by computing the model error for each estimate $\hat{\beta}$ as
$$\mathrm{ME}(\hat{\beta}) = (\hat{\beta} - \beta)'\, X'X\, (\hat{\beta} - \beta).$$
The relative model error (RME) is the ratio of the model error of the estimates to the model error of the GLS estimates of the fixed effects,
$$\mathrm{RME} = \frac{\mathrm{ME}(\hat{\beta})}{\mathrm{ME}(\hat{\beta}_{\mathrm{GLS}})},$$
where $\hat{\beta}_{\mathrm{GLS}}$ in Equation (13) is the generalised least squares estimator of $\beta$. The median of the relative model errors (MRME) over the 1000 simulated datasets is reported. MRME values greater than one indicate that the method’s estimates perform worse than the GLS estimates, values near one indicate that the method’s estimates perform similarly to the GLS estimates, and values less than one indicate that the method’s estimates perform better than the GLS estimates.
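A minimal sketch of this model-error comparison, assuming the model error is the quadratic form in $(\hat{\beta} - \beta)$ weighted by $X'X$ as reconstructed above (the precise weighting used in [22] and [23] may differ), is:

```python
import numpy as np

def model_error(beta_hat, beta_true, X):
    """Quadratic-form model error (beta_hat - beta)' X'X (beta_hat - beta)."""
    diff = np.asarray(beta_hat) - np.asarray(beta_true)
    return diff @ (X.T @ X) @ diff

def median_relative_model_error(method_estimates, gls_estimates, beta_true, X):
    """Median over simulated datasets of ME(method) / ME(GLS)."""
    rme = [model_error(b, beta_true, X) / model_error(g, beta_true, X)
           for b, g in zip(method_estimates, gls_estimates)]
    return float(np.median(rme))
```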
We perform a simulation study to examine the performance of the SSVS and SSVS-SPD methods. Using the design of the vinyl thickness experiment, we generate the response from the true model
$$y = X\beta + Z\gamma + \varepsilon, \qquad (31)$$
where $\beta$ contains the true values of the coefficients, only a subset of which are set to be active.
Type I and Type II error rates are displayed in Table 4 and Table 5 for the two simulation settings. Also, Table 6 presents the estimated posterior means of $\rho$ and $\sigma^2$ by the SSVS and SSVS-SPD from the simulation using the design of the vinyl thickness experiment. Figure 4 shows the MRME values for both settings. We note that in one of the two settings the SSVS has an MRME greater than one, which indicates that the SSVS estimates are worse than the GLS estimates, while in the other setting the SSVS performs better than the GLS estimates. The SSVS-SPD performs similarly to the GLS estimates at both settings.
Type II error rates at both settings are generally low, indicating that the active effects are easy to detect by both SSVS and SSVS-SPD. Furthermore, the Type II error rates of the SSVS-SPD are lower than those of the SSVS. With regard to the Type I error rate, at both settings the SSVS-SPD has lower Type I error rates than the SSVS. Detecting active variables is easier for both SSVS and SSVS-SPD than detecting the non-active variables. In summary, we found that the SSVS-SPD detects the active and non-active effects with lower error rates than the SSVS. The analysis of the median relative model error (MRME) for both SSVS and SSVS-SPD using the vinyl thickness design shows behaviour similar to the GLS, except for the SSVS in one of the settings, where detecting the active effects was harder than with the SSVS-SPD.
Table 4. Type I error rates for the simulation using the design of the vinyl thickness experiment.
Table 5. Type II error rates for the simulation using the design of the vinyl thickness experiment.
Figure 4. Median relative model error (MRME) for the simulation using the design of the vinyl thickness experiment.
Table 6. Posterior means of the correlation $\rho$ and the total variance $\sigma^2$ by the SSVS and SSVS-SPD for the simulation using the design of the vinyl thickness experiment.
11. Conclusion
This paper provided an analysis of data from split-plot mixture process variable experiments using a motivating example from an industrial environment. Specifically, we recommend the use of the SSVS-SPD method for Bayesian variable selection. In our results, we observed that the SSVS-SPD can identify the active variables (linear and two-factor interaction terms) much better than the SSVS and the traditionally used GLS method. However, as expected, this comes at the expense of slightly higher Type I error rates. We also observed better prediction performance for the models chosen by the SSVS-SPD compared to the models chosen by the SSVS method.