Statistical Analysis of a Competing Risks Model with Weibull Sub-Distributions

Statistical inference for a competing risks model using Weibull sub-distributions is discussed in this paper. Both maximum likelihood and the Bayesian procedures are applied to report the point and interval estimations of all model parameters and some of its reliability measures. Complete analysis of a real data set is performed to show the applicability of the studied model.


Introduction
In several applications in reliability analysis, medical research and biological situation there may be more than one cause of failure competing for the event.The event can be either death or recover from a certain disease (risk), see [1] [2] [3] [4].In the literatures, the model that can be applied to deal with such situations is called competing risks model.
As examples of situations that require a competing risks model: 1) a patient may die from one of several causes such as cancer, heart disease, complications of diabetes, etc., and 2) in AIDS study, if the endpoint is time to opportunistic infection (relapse), the death and infection may be viewed as competing risks during the treatment of death and loss to follow-up as censoring, see [2] [5].
The available data, in the competing risks models, consist of the times to event and an indicator of either the cause of failure or censored.One of the interesting points in competing risks models is to report on the distribution of the time to failure from one risk in the presence of all other risks, which is called as the relative risk.
Competing risks models are studied by several authors based on both parametric and non-parametric setup, [6] [7].It is assumed, in the parametric setup, that the risks follow a variety of lifetime distributions.Examples of such distributions include gamma, Weibull, exponential and generalized exponential, [4] [8] [9] [10] [11] [12].The non-parametric setup does not consider a specific life time distribution.The analysis of the non-parametric version of competing risks models is investigated by several authors, such as [13] [14] [15].
The Weibull distribution has been frequently used in reliability analysis, see [16] and [17].Weibull distribution is flexible (its hazard function can be constant, decreasing or increasing) and therefore can be used quite effectively to analyze a variety of real data sets.Various properties of Weibull distribution are discussed by [1] [12].[18] discussed statistical inferences of a competing risks model assuming exponential causes and Weibull causes of failure with the same shape parameter.They used the maximum likelihood method when the cause of failure is either known or unknown.
In this paper, we discuss statistical inferences of a competing risks model when risks follow Weibull distributions with both unknown scale and shape parameters when the data can be censored.We will use maximum likelihood and Bayes methods.In Bayesian approach, we will consider that the unknown parameters are independent and follow gamma prior distributions.The reset of the paper is organized as follows.Section 2 describes the competing risks model assumptions.In Section 3, we derive the likelihood function and obtain the maximum likelihood estimates and discuss the two-sided confidence intervals for the model parameters.The Bayesian procedure is discussed in Section 4. The risk dues a specific cause in the presence of all other causes, for the underlying model, is discussed in Section 5. A complete analysis of a real data set is provided in Section 6.Finally, Section 7 concludes the paper.

Model Assumptions
Assume that there is a set of N identical and independent objects (or systems) on life test.Each object is assigned to one of , 2 k k ≥ , independent causes of failure.Every object is tested until it fails or reaches a censored time.In the failure case, the object fails due to only one cause.The whole test will be terminated after either all objects fail or it reach the censored times or a combination of the two.
When an object fails, there will be two observable quantities, the object's lifetime, say T, and the cause of failure, say δ , where . While in the censored case, we observe only the censoring time.To simplify the notations, we set 0 δ = for the censoring case.Furthermore, we need the following assumptions throughout the paper: 1) i T is the lifetime of object which has a cumulative distribution function ( ) Obviously, { } 3) More specifically, we assume that ji T follow Weibull distributions with unknown parameters j α and j β , say ( ) the probability density function is and the survival rate function is and the hazard rate function is ( ) where j β is the shape parameter and j α is the scale parameter.

The Likelihood Function and Maximum Likelihood Estimates
In this section we will derive the likelihood function of the model parameters using the available data described above.The available data can be expressed as , , , , , ,

The Likelihood Function
Using data described above, the likelihood function is where θ is the vector of 2k unknown parameters, ( ) , , , , , , , Based on the model assumptions and using the well-known relations between the reliability measures "the survival, hazard rate and the probability density functions", we can write the likelihood function as by taking the natural logarithm for both sides of Equation ( 6), we get the log-likelihood function in the general case as where I(A) is an indicator function that is defined as I(A) = 1 if A is true and 0, otherwise.
Substituting from ( 3) and ( 4) into ( 6) and ( 7), we get the likelihood function and the log-likelihood function for the competing risks model, when the causes follow Weibull distribution with unknown parameters, as ( ) ( ) ( ) As well known, the maximum likelihood point estimate of θ is the set of values of its elements that maximize the likelihood (or log-likelihood) function.
The first partial derivatives of  with respect to l α and , 1, 2, , l l k ( ) where , , 1, 2, , lj l j k δ = is Kronecker delta.To get the maximum likelihood point estimates for the parameters, we set 0 then solve the system of 2 k equations obtained with respect to j α and j β , 1, 2, , j k = .The obtained system has no analytical solution, therefore we should use a numerical technique to get the maximum likelihood estimates of the parameters.
In order to obtain the information matrix, we need the second partial derivatives of  with respect to l α and l β , , which are given below ( ) where , 1, 2, , l m k = .

Confidence Intervals
The maximum likelihood estimators for the parameters cannot be obtained in analytic forms.Therefore, their actual distributions cannot be derived.However, we can use the asymptotic distribution of the maximum likelihood estimator to derive confidence intervals for , , , , , , , , , ,

Bayesian Procedure
In this section, we use Bayes approach to estimate the unknown model parameters j α and , 1, 2, , j j k β = . We assume that all parameters are independent and follow gamma distributions with different but known prior hyperparameters.
That is, we assume that j α follows Gamma prior distribution with shape pa- Therefore, the joint prior density of ( ) , , , , , Combine the likelihood function ( 8) and the joint prior density (17), and using the Bayes' theorem, we get the joint posterior density function of θ , up to a constant, as ( ) ( ) where ( ) Under quadratic loss function, the Bayesian estimate of any function of the vector of unknown parameters θ , say ( ), v θ is the posterior mean of that function.That is,

R E T R A C T E D (19)
The integral in Equation (19) as well as the normalized constant included in (18) have no analytical solutions.Therefore, numerical approaches should be used to do Bayesian analysis of the underlying model.Among several approaches, we will use Markov Chain Monte Carlo (MCMC) simulation technique to do the analysis.MCMC algorithm can be used to get random draws from the posterior distribution with density given in (18) without calculating the normalized constant.Then we can use the random draws to do any analysis we wish about the model parameters and model characteristics.

Markov Chain Monte Carlo Method
One of the most successful methods in modern Bayesian statistics is the Markov Chain Monte Carlo (MCMC) technique.MCMC method is an algorithm to summarize the posterior distribution without calculating the normalized constant.MCMC techniques have been extensively used to become among the main computational tools in the Bayesian statistical inference [19].The Metropolis-Hasting sampler is a modified version of the MCMC method.One of the main ideas in MCMC is to find a suitable distribution function, called as "proposal" that satisfies two conditions: 1) easy to simulate from, and 2) it mimics the posterior distribution function of interest.Once we determine such proposal, we get random draws from it, we apply the acceptance-rejection rule to get random draws from our target posterior distribution.
The following describes the steps of Metropolis-Hasting algorithm to simulate random draws from the posterior distribution ( ) g θ : 1) Set starting point of the chain, say ( ) 0 θ .2) Set a size of trails we get for the random draws, say M. g θ .

The Risks
One of the important characteristics in competing risks models is the relative risk of one of the competing risks in the presence of all other risks.In this sec-

R E T R A
C T E D tion, we discuss how to estimate the failure probability distribution for each cause of failure in the existence of all others and the risk due to a specific cause of failure.The probability of failure due to cause j at time t in the presence of all other causes is, see [17] [20], ( ) ( ) ( ) The risk due to cause 1, 2, , For the Weibull competing risks model discussed here, j π can be obtained by solving the following integral The above integral has no analytic solution.As special case, when all shape parameters for causes are equal, , .
Using the invariant property, the maximum likelihood estimate of j π can obtained by numerical integration when the integrand function in ( 22) evaluated at the maximum likelihood estimates of the parameters.For Bayesian analysis, we will us the random draws that obtained from the joint distribution along with the integral above to get random draws from the posterior distribution of j π , then use it do all Bayes analysis we wish on , j π 1, 2, , j k = .

Application
We study in this section an application of a real-life data set.This data set gives the times (in years) from HIV infection to AIDS, SI switch and death in 329 men who have sex with men (MSM).Data are from the period until combination anti-retroviral therapy became available (1996).For more background information on this data, see [21] [22].It was used as example for the competing risks analyses in [23] [24].These data can be considered as competing risks data with two risks.There are some individuals in the study left with no infection switch or death.Those are considered as censored observations.Here, we use the model discussed in this paper to analyze this data set.Table 1 shows the maximum likelihood and Bayes point estimates of the fours model parameters.We used R language to do all calculations.
As shown in Table 1, the results from both methods are completely different.Since we do not know the actual values of the unknown parameters, then using those point estimates will not be sufficient to judge which method provides better estimates.One of the comparison tools is the interval estimation or the corresponding variance to each point estimate.This why we calculated them as

R E T R A C T E D
As we can see, the variances of 1 α and 2 α are very large comparing to their values.This would lead to negative lower bound of the asymptotic confidence intervals of 1 α and 2 α .Since 1 α and 2 α cannot be negative, we truncate the lower limits at zero.This is one of the disadvantages of the maximum likelihood method.
Table 2 shows asymptotic 95% confidence intervals of the four unknown parameters.Table 3 shows the estimated risks using both two methods.The 95% credible intervals of the four unknown parameters are shown in Table 4, while Table 5 gives the measures of central tendency of the four parameters using their marginal posterior distributions.As Table 4 and Table 5 show, not only the lower limits of the credible intervals of 1 α and 2 α are larger than zero, but the minimum values of 1 α and 2 α are larger than zero.This is one of the advantages of the Bayes method.
In Bayes case, we set all hyperparameters equal and equal to 0.001 that reflects non-informative prior.To apply MCMC, we used the proposal as a multivariate t distribution with mode equal to the vector of the maximum likelihood estimates, variance-covariance matrix equals to the inverse Fisher matrix and four degrees of freedom.Also, the number of draws, M, was 10000.As a diagnostic test for the MCMC, we plotted the autocorrelations and the traces for each parameter as shown in Figure 1 and Figure 2. The trace plots show good mix of the sampled draws and the autocorrelation plots show that the Lag decreases rapidly, which indicates that the draws become approximately independent over time and the draws come from the actual posterior distribution.We used the random draws from the joint posterior distribution of θ to get random draws for the risks , 1, 2 j j π = from their marginal posterior distributions without obtaining those actual marginal distributions.We used those draws to calculate Bayes estimates of j π and sketch their posterior density functions.shows the trace plots of the draws of j π and the corresponding marginal post- erior density functions.Furthermore, we used the random draws of θ to get random draws for the sub-survivors and the overall survivor and used them to calculate the Bayes estimators along with 95% credible intervals for those functions at different time as shown in Figure 4.

Conclusions
In this paper, we attempted to examine competing risks models with , 2 k k ≥ ,  independent causes of failure in the presence of data.We derived the likelihood equation of the model in general and used it to derive the likelihood function when the risks follow Weibull distributions with unknown shape and scale parameters.

R E T R A C T E D
We discussed how to get the maximum likelihood estimates and Bayes estimates of the model parameters well as for some important reliability measures such as the individual risks, sub-survivors and overall survivor.In Bayes analysis, we used gamma prior distribution for all unknown parameters with known

R E T R A C T E D
covariance matrix that can be obtained as the inverse of the information matrix of θ .The elements of the information matrix are the second partial derivatives of  evaluated at the maximum likelihood point estimate of the unknown parameters.That is ( )

Figure 1 .
Figure 1.The autocorrelation plot of the simulated draws using MCMC after discarding the first 50% of the draws.

Figure 2 .
Figure 2. The trace plots of the simulated draws after discarding the early 50% of the draws.

Figure 3 .Figure
Figure 3. Trace plots and posterior density functions of

Table 2 .
The asymptotic confidence intervals using Maximum likelihood method.

Table 3 .
Estimates the two risks.

Table 4 .
95% credible intervals for the four parameters.

Table 5 .
The measures of central tendency of the parameters 1 1 2 , ,