Reliability estimators for the components of series and parallel systems: The Weibull model

This paper presents a hierarchical Bayesian approach to the estimation of components' reliability (survival) using a Weibull model for each of them. The proposed method can be used to estimation with general survival censored data, because the estimation of a component's reliability in a series (parallel) system is equivalent to the estimation of its survival function with right- (left-) censored data. Besides the Weibull parametric model for reliability data, independent gamma distributions are considered at the first hierarchical level for the Weibull parameters and independent uniform distributions over the real line as priors for the parameters of the gammas. In order to evaluate the model, an example and a simulation study are discussed.


Introduction
This paper presents a hierarchical Bayesian approach to the estimation of components' reliability using a Weibull model for each component in series and parallel systems. A series system is a frame of components that works if and only if all its components are functional, that is, whenever one fails the system fails. As a dual frame, the parallel system fails if and only if all components are malfunctioning.
The literature dealing with the problem of estimating the reliability of series systems, or competing risks, is abundant. The work of Kaplan and Meier [3] is arguably the most celebrated work, where it was developed a nonparametric estimator using a frequentist approach. For a Bayesian counterpart, we draw the attention to Salinas-Torres et al. [9] and Polpo and Sinha [6]. Rodrigues et al. [8] performed a simulation study of three different methods to estimate the reliability of a series system. They compared the Kaplan-Meier estimator [3], maximum likelihood estimator (MLE) and the Bayesian plug-in estimator (BPE) for Weibull reliability systems. Their results indicated that MLE and BPE are similar in quality and that both outperformed the Kaplan-Meier estimator. However, the construction of credible bounds was not addressed in their work.
For parallel systems, the literature is scarce. To the best of our knowledge, Polpo and Pereira [5] were the first to address the nonparametric reliability estimation in parallel systems and their components, using the Bayesian paradigm. Later on, Bhattacharya and Samaniego [1] proposed a frequentist nonparametric estimator for components' reliability under a restrictive condition that all components are independent and identically distributed.
In a related work, Polpo et al. [7] presented the reliability estimation with Weibull models and non-informative priors, also using the Bayesian paradigm. Their proposal had very demanding computational needs, and the described Markov Chain Monte Carlo (MCMC) algorithm suffered from convergence issues in many problem instances, making the estimation a difficult task. The authors realized later on that it is often impossible to evaluate the components' reliability with that method because of such issues, and unfortunately one cannot predict when the algorithm will succeed. The present work aims at achieving a robust estimation procedure for the estimation problem. We suggest to consider non-informative priors in an one-level hierarchical model as means to solve the estimation issues faced by the algorithm of Polpo et al. [7]. Moreover, an important goal of this work is to provide a simple way to build credible bounds for the reliability functions, giving a step-by-step algorithm that performs such analysis.
Similarly to Polpo et al. [7] and Rodrigues et al. [8], we consider Weibull statistical models for the system's reliability. Two independent gamma distributions are considered in the first hierarchical level. The gamma distributions are parametrized by their means and variances, instead of the more common parametrization with scale and shape. In the second level of the hierarchy, we choose two flat priors for the means, and fixed values for the variances of the gamma distributions in the first level. These hyper-parameters corresponding to variances in the first level can be seen as prior precision parameters. The means of the gamma distributions (first level of the hierarchy) can be viewed as the prior expectations of the parameters of the Weibull model. In this work, the posterior modes are taken as the Bayesian estimators of the gamma distributions.
The estimation of the reliability functions has three main steps: (i) we draw a sample from the posterior distribution of the Weibull parameters; (ii) using the appropriate transformation, we build a sample from the reliability posterior distribution; and (iii) locally, for each reliability time, we evaluate the posterior mean. The high posterior density (HPD) procedure was used to define the credible region for the reliability function. We emphasize that we are not using the plug-in estimator, but the posterior mean of the reliability function, which seems more suitable under the Bayesian paradigm. This paper is organized as follows. Section 2 describes all functions that are involved in the estimation procedure. Section 3 provides the estimation procedure itself. Section 4 presents a simulation study that highlights the quality of the model and the proposed estimators, and final remarks and additional comments are given in Section 5. We note that an extended abstract of this work has appeared in the Brazilian Conference on Bayesian Statistics [2].

The model
We use the same notation as in Polpo et al. [7]. Consider a system of k components and let X j , j ∈ {1, . . . , k}, be the sequence of failure times of all components. We assume that this sequence is composed of independent random variables with a (possibly) different Weibull distribution for each component. Recall that we only observe a random vector of two variables, namely, (T, δ) with T = min(X 1 , . . . , X k ) for the series system and T = max(X 1 , . . . , X k ) for the parallel system, with δ = j if T = X j , for j = 1, . . . , k. The δ quantity can be viewed as an indicator function of the component that caused the system failure.
Consider a sample of n independent and identically distributed systems (either all series or all parallel systems). The observations are represented by (T, δ) = {(T i , δ i ) : i = 1, . . . , n}. The reliability function of the jth component is given by R j (t) = P (X j > t), j = 1, . . . , k. Therefore, the reliability function is R(t) = k j=1 R j (t) for the series system and R(t) = 1 − k j=1 (1 − R j (t)) for the parallel system. We define random variables X j for the components' reliability with Weibull distributions parametrized by θ j = (β j , η j ), that is, for x > 0, β j > 0 (shape) and η j > 0 (scale). Then, the likelihood function for the series system is given by and for the parallel system, where f is the density function of a random variable with Weibull distribution, θ = (θ 1 , . . . , θ k ), and I A is the indicator function of the set A.
and v β j and v η j as known constants, j = 1, . . . , k. Then In this case, we have that the posterior distributions of series and parallel systems are, respectively, where t = (t 1 , . . . , t n ) are the observed failure time of the system, δ = (δ 1 , . . . , δ n ) are the indicators of which component failed, and the other quantities are as defined before.

Estimation
For the estimation, we use the EM algorithm [4] to obtain the posterior mode as estimates of m β and m η , and the MCMC procedure to generate a sample from the posterior distribution of β and η. The algorithm steps for the estimation are briefly given as follows: 1. Choose the prior precision values v β and v η . Note that one can set to the same value all the precision values, that is, v = v β = v η . We suggest the use of v = 4.
2. Choose the initial guess for the parameters to be estimated: m β , m η , β and η.
3. Using the initial guess, consider m β and m η as fixed values. Employing the MCMC tool, generate a sample (of size n p ) from the posterior distribution of β and η. We suggest the use of n p = 1000. It may also be necessary to use a "burn-in" and a "jump" to ensure convergence of the MCMC. Using this algorithm, we obtain the posterior mode (the Bayesian estimate) of m β and m η , and a sample (of size n p ) from the joint posterior distribution of β and η. If we estimate the reliability function of any component (let us arbitrarily choose component 1), then we can notice that the estimations of the other components' reliability functions are very similar and could be omitted here. Consider that the sample from the posterior of the parameters of component 1 can be expressed as (β 11 , β 12 , . . . , β 1np ) and (η 11 , η 12 , . . . , η 1np ). To obtain the reliability estimates and credible regions, consider the functions Y ℓ (t) = F (t | β 1ℓ , η 1ℓ ), ℓ = 1, . . . , n p , where F (t | β 1ℓ , η 1ℓ ) are Weibull distribution functions conditioned on β 1ℓ and η 1ℓ . Consequently, the posterior mean estimate of the component's reliability can be expressed as Hence, for each fixed t, Y (t) = (Y 1 (t), . . . , Y np (t)) is a sample from the posterior of the component 1's distribution function and, to obtain the credible region, we can either use the quantiles of Y (t) or evaluate the high posterior density credible interval. To estimate the mean reliability time, we have is the mean of a random variable with Weibull distribution, and Γ(·) is the gamma function. Note that similar procedures can be used to evaluate other quantities of interest; as an example, the posterior median could be evaluated.

Examples
Example 1 Consider three random variables X 1 , X 2 , X 3 such that X 1 has Weibull distribution with mean 2 and variance 4, X 2 has gamma distribution with mean 2 and variance 0.667, and X 3 has log-normal distribution with mean 2.014 and variance 6.968. We have generated a sample (with size n = 100) of series systems with these three components and another sample (again with size n = 100) of parallel systems with the exactly same three components. The components were chosen in order to have similar means but different variances and, consequently, different distributions. We have used the same theoretical components in both simulations (series and parallel systems) to verify, in each situation, the differences that are due to the distinct system models with the available data. The simulated data have the following characteristics: (i) for the series systems, we have obtained 64%, 80%, and 56% of censured data for components 1, 2, and 3, respectively; and (ii) for the parallel systems, we have observed 61%, 68% and 71%, respectively for the same three components. In this case, the main interest is in the estimation of the components' reliability functions. Note that, with our simulated example, we have a huge amount of censored data, making it a challenging example. As already said, the estimation procedures are performed using MCMC. We have discarded the first 10, 000 samples (as burn-in) from the posterior to achieve the stationary measure and then have generated a sample from the posterior. To perform the estimation of the reliability functions and the credible region, we have used a sample of size 1, 000 from the posterior, which was obtained by discarding 10 samples (the jump between each final sample point). We have used v = 4 in the prior specification for all parameters and systems. For the experiment with series systems, we have obtained m β = (1.31, 4.12, 1.44) and m η = (2.19, 2.03, 1.80). For the parallel systems, the estimates are m β = (1.28, 2.57, 0.92) and m η = (2.67, 2.28, 2.11). To evaluate the quality of these estimates, we have compared the "true" reliability of each component with the estimated reliability function. Table 1 presents the posterior mean and the posterior standard deviation of each parameter involved in the model (for both series and parallel systems). Note that the standard deviations are relatively small, and the estimation of the mean reliability time is very close to the original values, indicating a good performance of our method. It can be seen from the 95% credible bounds that the "true" reliability of each component was well estimated. We note however that the "true" reliability functions of the components are, for short (time) intervals, outside the 95% credible bounds. Considering that these are reasonably challenging examples, this situation is likely to happen in any estimation procedure (see Figures 1 and 2).

Example 2
This example has a simulation study to show the quality of the proposed hierarchical Weibull model over many different conditions. We have considered 108 different scenarios that were built using three different sample sizes (n = 30, 100, 1000), three different proportions of censored data (0%, 20% and 40%), two different means (2 and 7) of reliability time for the generating distribution, three different generating distributions (Weibull, gamma, log-normal), and two types of censoring (right and left). In all these cases, we have fixed the variance of the generating distribution to 5. We have used a non-random censored approach to guarantee the desired proportions of censure, that is, for each scenario, we have fixed a time for which all values that are larger than this time are assumed censored for the right-censored data (series systems), and all values smaller than this fixed time are assumed censored for the left-censored data (parallel systems). In order to improve the analysis, 100 copies of each scenario were considered.
To summarize and to compare the results of the simulation, we have evaluated the bias and the mean squared error (MSE) of the estimated mean reliability time, for each scenario. The results are presented in Tables 2-7. One can see from the results that all biases and MSEs are close to zero, indicating that, in all scenarios, the model has estimated well the true mean reliability time. Even in the most challenging scenarios, which are those with small sample size (n = 30) and large proportion of censoring (40%), the estimated mean reliability time has had a good performance. When the generating distribution is Weibull, the model should obviously estimate well, yet the results show that the performances were good even for the other two generating (non-Weibull) distributions.

Final Remarks
We have introduced a Bayesian reliability statistical analysis using hierarchical models for the problem of estimating the reliability functions and credible bounds of series and parallel systems. The MCMC has shown good performance in terms of convergence, making the inference process simple and efficient. It shall be noted that this performance is not dependent on our choice of a "non-informative" scheme to define the prior hyperparameters. This is important because other researchers may want to fairly compare our method with other frequentist estimators. However, informative priors may very well produce additional improvements in the estimates. The Example 1 has shown good robustness in the sense that the model has performed well for all components in both series and parallel systems. Another important aspect is that we can obtain credible bounds for the reliability function, task that is usually hard if one uses a plug-in estimator for the reliability function. The Example 2 provides an extensive simulation study with more than one hundred different scenarios. Overall, the model has performed very well for estimating the mean reliability time. Some open questions that should be addressed in future works are the development of hypothesis tests for the components, for instance, one can have interest in testing the hypothesis of equal means of all components (or a subset of components), and the extension of these ideas to more general coherent systems.