Bayesian Joint Modelling of Survival Time and Longitudinal CD4 Cell Counts Using Accelerated Failure Time and Generalized Error Distributions

Survival of HIV/AIDS patients is crucially dependent on comprehensive and targeted medical interventions such as supply of antiretroviral therapy and monitoring disease progression with CD4 T-cell counts. Statistical modelling approaches are helpful towards this goal. This study aims at developing Bayesian joint models with an assumed generalized error distribution (GED) for the longitudinal CD4 data and two accelerated failure time (AFT) distributions, lognormal and loglogistic, for the survival time of HIV/AIDS patients. Data are obtained from patients under antiretroviral therapy follow-up at Shashemene referral hospital during January 2006-January 2012 and at Bale Robe general hospital during January 2008-March 2015. The Bayesian joint models are defined through latent variables and association parameters, with specified non-informative prior distributions for the model parameters. Simulations are conducted using the Gibbs sampler algorithm implemented in the WinBUGS software. The results of the analyses of the two data sets show that the measurement errors of the longitudinal CD4 variable follow the generalized error distribution with fatter tails than the normal distribution. The Bayesian joint GED loglogistic models fit the data sets better than the lognormal ones. Findings reveal that patients' health can be improved over time. Compared to males, female patients gain more CD4 counts. Survival time of a patient is negatively affected by TB infection. Moreover, an increase in the number of opportunistic infections implies a decline of CD4 counts. Patients' age negatively affects the disease marker with no effect on survival time. Improving weight may improve survival time of patients. Bayesian joint models with GED and AFT distributions are found to be useful in modelling the longitudinal and survival processes. Thus we recommend the generalized error distributions for measurement errors of the longitudinal data under the Bayesian joint modelling. Further studies may investigate the models with various types of shared random effects and more covariates with predictions.

How to cite this paper: Erango, M.A. and Goshu, A.T. (2019) Bayesian Joint Modelling of Survival Time and Longitudinal CD4 Cell Counts Using Accelerated Failure Time and Generalized Error Distributions. Open Journal of Modelling and Simulation, 7, 79-95. https://doi.org/10.4236/ojmsi.2019.71004

Received: March 20, 2018. Accepted: January 18, 2019. Published: January 21, 2019.

Copyright © 2019 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/


Introduction
Survival of HIV/AIDS patients is crucially dependent on comprehensive and targeted medical interventions. Health professionals monitor patients' health status using disease markers such as CD4 T-cell counts. The disease progression, as indicated by the longitudinal CD4 measures, may affect the time of the event of interest, which in this case is the death of a patient. The main interest of inference is in the association between the longitudinal and survival processes. Joint models for longitudinal and time-to-event data are based on the joint distribution of the two processes [1]. The joint analysis may be appropriate when the longitudinal variable is correlated with the patient's health status, and it incorporates all information simultaneously so as to provide valid and efficient inferences [2].
The traditional approach in the analysis of survival data assumes a homogeneous population, where all individuals have the same health risks. In practice, individual patients possibly differ in health risks such as their vulnerability to causes of death, responses to treatments, and influences of various risk factors. Joint modelling of the two types of data often assumes normal distributions in the linear mixed models [2] [3] [4]. It is interesting to look for alternative distributions that can accommodate data that may not be normally distributed.
The current study considers the longitudinal measure in terms of its rate of growth. The rate of growth is an important concept in studying changes. If the level of growth is viewed as the current status of a process at a specific time, the rate of growth measures how fast the process is changing at that time [4]. The study by [4] extends the usual growth models using the generalized error distribution (GED) and estimates its parameters under a Bayesian framework. The author studied a general form of linear growth model, where $Y_{it}$ is the growth observation for individual $i$ at time $t$, assuming the error term to have the generalized error distribution.

Description of Data
The study considers two data sets that were collected from two hospitals under similar settings and were previously considered by [5] [6] [7]. The data are extracted from patients' charts, which contain epidemiological, laboratory and clinical information of the patients. Patients younger than 16 years and those who started ART before the defined study period were not included in this study. Descriptions and codes of the explanatory variables are given in Table 1.

Linear Mixed Models
The longitudinal data, CD4 T-cell counts, are measurements on the response variable taken from the same individuals over several observation times. Thus the set of observations on a subject tends to be inter-correlated [8] [9]. The two sources of variation expected are the within-patient and the between-patients variations. Analysis of within-patient variation allows studying changes of the CD4 counts over time, while analysis of between-patients variation allows understanding differences between patients.
Here we assume that the longitudinal CD4 measure has the generalized error distribution instead of the normal distribution. For any variable $Y$ that follows the generalized error distribution, its density function $\mathrm{GED}(y; \gamma, \mu, \sigma)$ with three parameters, as adapted by [4] from [10], is:

$$f(y; \gamma, \mu, \sigma) = \frac{\omega(\gamma)}{\sigma} \exp\left[-c(\gamma)\left|\frac{y-\mu}{\sigma}\right|^{2/(1+\gamma)}\right] \quad (1)$$

with:

$$c(\gamma) = \left\{\frac{\Gamma[3(1+\gamma)/2]}{\Gamma[(1+\gamma)/2]}\right\}^{1/(1+\gamma)}, \qquad \omega(\gamma) = \frac{\{\Gamma[3(1+\gamma)/2]\}^{1/2}}{(1+\gamma)\{\Gamma[(1+\gamma)/2]\}^{3/2}}.$$

Here $\mu$ is the location parameter and $\sigma^2$ is the scale parameter. The shape parameter $\gamma$ of the GED is related to the kurtosis of the distribution and characterizes the non-normality of $Y$. The GED can model the error distribution more flexibly than the normal one [4] [10] [11].
The generalized error distribution generalizes the normal distribution. The normal distribution is a special case of the GED when $\gamma = 0$. The GED in Equation (1) is expressed in a simpler form by [13], and the normal distribution is a special case of that form when $s = 2$. In many situations, data are assumed normal though normality may not be an appropriate assumption. A statistical package such as fitdistrplus, developed by [14], can be used to check whether or not the measurement errors of the data at hand are normally distributed.
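As a minimal sketch of the three-parameter density, the following Python code evaluates the GED in the Box and Tiao parameterization, which is assumed here to be the form adapted by [4] from [10]. The function names (`ged_pdf`, `normal_pdf`, `laplace_pdf`, `integrate`) are hypothetical helpers introduced for illustration. The checks confirm that the density reduces to the normal density at $\gamma = 0$, to the Laplace density at $\gamma = 1$, and integrates to one.

```python
import math


def ged_pdf(y, mu=0.0, sigma=1.0, gamma=0.0):
    """GED density in the Box-Tiao form (assumed parameterization), gamma in (-1, 1]."""
    if not -1.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (-1, 1]")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    half = (1.0 + gamma) / 2.0
    g1 = math.gamma(half)        # Gamma[(1 + gamma) / 2]
    g3 = math.gamma(3.0 * half)  # Gamma[3(1 + gamma) / 2]
    c = (g3 / g1) ** (1.0 / (1.0 + gamma))
    omega = math.sqrt(g3) / ((1.0 + gamma) * g1 ** 1.5)
    z = abs((y - mu) / sigma)
    return omega / sigma * math.exp(-c * z ** (2.0 / (1.0 + gamma)))


def normal_pdf(y, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(y, mu, sigma):
    """Laplace density rescaled so that its standard deviation equals sigma."""
    b = sigma / math.sqrt(2.0)
    return math.exp(-abs(y - mu) / b) / (2.0 * b)


def integrate(f, lo, hi, steps=20000):
    """Plain trapezoidal rule, sufficient for a sanity check."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


# Compare the GED with its two named special cases at an arbitrary point.
print(ged_pdf(0.7, 1.0, 2.0, 0.0), normal_pdf(0.7, 1.0, 2.0))
print(ged_pdf(0.7, 1.0, 2.0, 1.0), laplace_pdf(0.7, 1.0, 2.0))
print(integrate(lambda y: ged_pdf(y, 0.0, 1.0, 0.5), -40.0, 40.0))
```

In this parameterization $\sigma$ remains the standard deviation for every admissible $\gamma$, so only the tail behaviour changes with the shape parameter.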
The generalized error distribution was first introduced by Subbotin [15] as a class of symmetric distributions with variation in kurtosis. The distributions have many structural properties close to those of a normal distribution. Many researchers have studied the GED, including its applications, but not in the joint models studied here. Nelson [16] developed linear regression and time series models with heavy tails, assuming the underlying distribution to be the GED. The GED can be used in statistical modelling if the observation errors are not necessarily normally distributed. Zhang [4] proposed and studied linear mixed growth models for longitudinal data with the GED so as to handle leptokurtic and platykurtic errors. The author reported that such models fit the data better than the respective models with normality assumptions.
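The link between the shape parameter and leptokurtic or platykurtic errors can be illustrated numerically. The sketch below assumes the Box and Tiao form of the GED with shape $\gamma \in (-1, 1]$, for which the kurtosis is $\Gamma(5a)\Gamma(a)/\Gamma(3a)^2$ with $a = (1+\gamma)/2$. The function names are hypothetical and introduced only for this illustration.

```python
import math


def ged_kurtosis(gamma):
    """Kurtosis of the GED (Box-Tiao form, an assumed parameterization)."""
    if not -1.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (-1, 1]")
    a = (1.0 + gamma) / 2.0
    return math.gamma(5.0 * a) * math.gamma(a) / math.gamma(3.0 * a) ** 2


def tail_label(gamma, tol=1e-9):
    """Classify the tails relative to the normal distribution (kurtosis 3)."""
    excess = ged_kurtosis(gamma) - 3.0
    if excess > tol:
        return "leptokurtic (fatter tails than normal)"
    if excess < -tol:
        return "platykurtic (thinner tails than normal)"
    return "normal"


for shape in (-0.5, 0.0, 0.5, 1.0):
    print(shape, round(ged_kurtosis(shape), 3), tail_label(shape))
```

A positive estimate of $\gamma$ therefore corresponds to kurtosis above 3, which is the sense in which the fitted error distributions have fatter tails than the normal distribution.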
In our case, we first analyzed the CD4 count data with the fitdistrplus package [14] and found that the measurement errors of the longitudinal CD4 data seem non-normally distributed. We then define the linear mixed model using the generalized error distribution. Let $Y_{it}$ be the longitudinal CD4 measurement of the $i$th patient, $i = 1, 2, 3, \ldots, n$, at times $t = t_1, \ldots, t_T$. The linear mixed model for the longitudinal process is defined with the assumption of the generalized error distribution for the error term. The shape parameter $\gamma$ is an important parameter to be studied here.

Survival Models
The survival time is a random variable defined on the non-negative real numbers. The observed time is taken as the minimum of the event time and the censoring time. For each of the two AFT distributions, the regression model is linked with the covariates for each individual $i$.

Assuming that the survival time follows the loglogistic distribution with parameters $\lambda$ and $\rho$, its probability density function, survival function $S(t)$ and hazard function $h(t)$ can each be expressed in closed form.

The AFT models allow direct effects of covariates on survival time instead of on the hazard rate. Given a vector of predictors $X_{2i}$, the log-linear form of the AFT model for the survival time $T_i$ of individual patient $i = 1, 2, 3, \ldots, n$ can be written as:

$$\log T_i = X_{2i}^{\top}\alpha + W(t) + \varepsilon_i,$$

where $\alpha$ is a vector of unknown coefficients of $X_{2i}$, $W(t)$ refers to subject-specific random effects having a normal distribution, and $\varepsilon_i$ is a sequence of mutually independent errors whose distribution corresponds to the AFT distribution assumed for $T_i$, in this case lognormal or loglogistic.
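The following Python sketch illustrates the loglogistic case under one common parameterization, $S(t) = 1 / (1 + (\lambda t)^{\rho})$. This choice is an assumption made for illustration and may differ from the parameterization used in the original equations. All function names are hypothetical. The helper `aft_to_loglogistic` uses the standard result that $\log T = \eta + s\,\varepsilon$ with standard logistic $\varepsilon$ gives a loglogistic $T$ with $\lambda = e^{-\eta}$ and $\rho = 1/s$.

```python
import math


def loglogistic_survival(t, lam, rho):
    """S(t) = 1 / (1 + (lam * t)^rho) under the assumed parameterization."""
    return 1.0 / (1.0 + (lam * t) ** rho)


def loglogistic_pdf(t, lam, rho):
    """Density f(t) = -dS/dt."""
    x = (lam * t) ** rho
    return lam * rho * (lam * t) ** (rho - 1.0) / (1.0 + x) ** 2


def loglogistic_hazard(t, lam, rho):
    """Hazard h(t) = f(t) / S(t)."""
    x = (lam * t) ** rho
    return lam * rho * (lam * t) ** (rho - 1.0) / (1.0 + x)


def aft_to_loglogistic(linear_predictor, scale):
    """Map log T = linear_predictor + scale * logistic error to (lam, rho)."""
    return math.exp(-linear_predictor), 1.0 / scale


def censored_loglik(t, delta, lam, rho):
    """Log-likelihood contribution: log f(t) for a death, log S(t) if censored."""
    if delta == 1:
        return math.log(loglogistic_pdf(t, lam, rho))
    return math.log(loglogistic_survival(t, lam, rho))


# Hypothetical linear predictor and scale, chosen only for illustration.
lam, rho = aft_to_loglogistic(linear_predictor=4.0, scale=0.5)
median_time = 1.0 / lam  # the median survival time equals exp(linear predictor)
print(median_time, loglogistic_survival(median_time, lam, rho))
print(loglogistic_hazard(30.0, lam, rho))
```

Because the covariates enter through the linear predictor on the log-time scale, a unit increase in a covariate multiplies the median survival time by the exponential of its coefficient, which is the direct-effect interpretation of AFT models.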

Likelihood Model
The association between the longitudinal and survival processes is assumed to come through stochastic dependence between the two sub-models. There are many ways of making the linkages [2]. Here we consider the links used in [5].
Thus the joint model links the GED-based model of the longitudinal process in Equation (3) to the AFT-based model of the survival process in Equations (4)-(6) through the association parameters $r_1, r_2$. Note that $U_{1i}, U_{2i}$ are latent variables that are independent subject-specific random effects having a bivariate Gaussian distribution with zero means and constant variances. These effects are assumed to be induced by the longitudinal process to the time-to-event process through the random intercept and random slope terms in the linear mixed model.
We assume $Y$ and $T$ are conditionally independent given the random effects $w = (w_1, w_2)$ and the model parameters $\theta = (\theta_1, \theta_2)$. The two sets of parameters are one for the linear mixed model and one for the survival model. In the joint likelihood, each $\delta$ is an indicator of a patient's survival status, with $\delta = 1$ if the death event occurs and $\delta = 0$ if censored.
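As a hedged illustration of how conditional independence shapes a joint likelihood of this kind, the generic form below writes the contribution of each patient as an integral over the random effects. The symbols $f_Y$, $f_T$, $S_T$, $f_W$ and the number of measurements $n_i$ are notation assumed here for illustration and are not taken from the original equations.

```latex
L(\theta; y, t, \delta) = \prod_{i=1}^{n} \int
  \Bigl[ \prod_{j=1}^{n_i} f_Y\bigl(y_{ij} \mid w_i, \theta_1\bigr) \Bigr]
  \, f_T\bigl(t_i \mid w_i, \theta_2\bigr)^{\delta_i}
  \, S_T\bigl(t_i \mid w_i, \theta_2\bigr)^{1-\delta_i}
  \, f_W(w_i) \, dw_i
```

Under this form a patient who dies ($\delta_i = 1$) contributes the density at the observed time, while a censored patient ($\delta_i = 0$) contributes the probability of surviving beyond it.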

Prior Distributions
A non-informative joint prior distribution $\pi(\theta, w)$ of the parameters is considered. The individual parameters $\beta$'s and $\alpha$'s are assumed to be independently and identically normally distributed with mean zero and a large variance of 1000.
The association parameters $r_1, r_2$ are each assumed to have a normal distribution with mean zero and variance 1000. The shape parameter $\gamma$ of the GED, the shape parameter $\rho$ of the loglogistic distribution, and the precision parameters $\tau = \sigma^{-2}$ are all assumed to follow Gamma(2, 0.5).
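The sketch below summarizes what these priors imply numerically. WinBUGS parameterizes the normal distribution by its precision and the gamma distribution by shape and rate. It is assumed here, and not stated in the text, that Gamma(2, 0.5) follows the same shape-rate convention. The function names are hypothetical.

```python
import math


def normal_prior_precision(variance):
    """Precision (inverse variance), the scale on which BUGS specifies normals."""
    return 1.0 / variance


def gamma_prior_summary(shape, rate):
    """Mean and variance of a gamma distribution in the shape-rate convention."""
    return shape / rate, shape / rate ** 2


precision = normal_prior_precision(1000.0)             # variance 1000 as a precision
prior_sd = math.sqrt(1000.0)                           # standard deviation of the normal priors
gamma_mean, gamma_var = gamma_prior_summary(2.0, 0.5)  # assumed shape-rate reading

print(precision, round(prior_sd, 2))
print(gamma_mean, gamma_var)
```

Under these assumptions the normal priors have a standard deviation of about 31.6, which is wide relative to typical regression coefficients on the square-root CD4 and log-time scales.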

Posterior Distribution
The Bayesian model [10] is defined by the posterior distribution of the parameters $\theta$ and random effects $w$ given the data.

Results and Discussion
The objective of this study is to model the longitudinal CD4 measurement and the associated time-to-death data using the Bayesian joint modelling approach. The generalized error distribution is assumed for the square root of the CD4 T-cell counts, while lognormal and loglogistic distributions are assumed for the survival time. Two data sets collected from two hospitals are analyzed using four Bayesian joint models. The findings from the models are all interpreted as they are important in many ways.

Descriptive Analysis
For Data 1, taken from Shashemene referral hospital, the average baseline CD4 cell count is estimated to be 156.9 per mm³ of blood sample, with a standard deviation of 92.5. By the end of the study period, the percentage of death events is about 5.9%. The average survival time of the patients is estimated to be 48. The baseline CD4 counts reveal the same variability in the two studies, about 59% as measured by the coefficient of variation. However, there seems to be a slight difference between the variabilities of the time-to-event data, with 44% for Data 1 and 40% for Data 2.
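The coefficient of variation quoted for the baseline CD4 counts can be checked directly from the reported mean and standard deviation for Data 1. The short sketch below uses a hypothetical helper name and only the two figures stated above.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


# Reported baseline CD4 summary for Data 1: mean 156.9, standard deviation 92.5.
cv_data1 = coefficient_of_variation(156.9, 92.5)
print(round(cv_data1, 1))
```

The result agrees with the figure of about 59% reported for the baseline CD4 counts.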
To understand the relationship between the longitudinal measure and follow-up time, mean structures are plotted in Figure 1. The plots show that the average of the square root of CD4 counts may have a quadratic relationship with the patient's follow-up time. We thus include both observation time and its square in the linear mixed models as predictor variables.
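A quadratic mean structure of this kind can be sketched with an ordinary least-squares fit of the square root of CD4 counts on time and time squared. The data below are hypothetical values invented for illustration, not taken from the study, and `fit_quadratic` is a hypothetical helper that ignores the random effects and the GED errors of the full model.

```python
import math


def fit_quadratic(times, values):
    """Least-squares fit of values ~ b0 + b1*t + b2*t^2 via the normal equations."""
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for t, y in zip(times, values):
        row = (1.0, t, t * t)  # design row: intercept, time, time squared
        for a in range(3):
            xty[a] += row[a] * y
            for b in range(3):
                xtx[a][b] += row[a] * row[b]
    # Solve the 3x3 system by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[a] + [xty[a]] for a in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[a][3] / aug[a][a] for a in range(3)]


# Hypothetical mean CD4 counts at follow-up months (illustration only).
months = [0, 6, 12, 18, 24, 30]
mean_cd4 = [150, 230, 300, 350, 380, 390]
sqrt_cd4 = [math.sqrt(c) for c in mean_cd4]

b0, b1, b2 = fit_quadratic(months, sqrt_cd4)
print(b0, b1, b2)
```

A positive coefficient on time together with a negative coefficient on its square describes CD4 counts that rise after treatment starts and then level off, which is the curved shape that motivates including the squared term.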

Inferential Analysis in the Case of Data 1
In the analysis of Data 1, twenty-one parameters are estimated using the two defined Bayesian joint models based on the GED-lognormal and GED-loglogistic distributions. The results of the analysis are displayed in Table 2.

Bayesian GED Lognormal Analysis
The results of the analysis are displayed in Table 2. They reveal that the shape parameter of the GED of the longitudinal measure is significantly different from

Bayesian GED Loglogistic Analysis
The results are displayed in Table 5. The shape parameter of the generalized

For the survival sub-model, the results reveal that improving a patient's weight improves her/his survival time. Note that weight is an important variable in explaining both the longitudinal and survival processes of a patient. Healthy functional status and condom use during sexual intercourse also have positive effects on survival time. Age and tobacco use are not significant in this data case.

Assessment of Convergence
For each of the Bayesian models, three parallel sampling chains of 60000 iterations with different starting values are generated. Inferences are made based on samples of the posterior distributions taken with thinning of 10 after a burn-in of 25000. Time series plots of the history of the simulations show a reasonable degree of randomness, and the chains appear to converge to the same values. Autocorrelations and Gelman-Rubin statistics are also used to assess convergence. Finally, independent samples are taken from the posterior distribution after convergence of the realization with the specified burn-in and thinning values, and all inferences are made using those samples.
Assessment plots from the analysis of Data 1 using the Bayesian GED loglogistic model are displayed in Figure 2. Results for two parameters are illustrated: the shape parameter $\gamma$ of the GED and the shape parameter $\rho$ of the loglogistic distribution.
They show that the simulations converge.
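The workflow of parallel chains, burn-in, thinning and the Gelman-Rubin statistic can be sketched in Python on a toy problem. The example below is not the joint model of this study and does not use WinBUGS: it runs a Gibbs sampler for a simple normal model with a N(0, variance 1000) prior on the mean and a Gamma(2, rate 0.5) prior on the precision, on simulated data, with shorter chains than the 60000 iterations used in the analysis. All names and data are hypothetical.

```python
import math
import random


def gibbs_chain(data, n_iter, mu_start, tau_start, rng):
    """Gibbs sampler for y ~ N(mu, 1/tau) with conjugate priors (toy model)."""
    n = len(data)
    total = sum(data)
    mu, tau = mu_start, tau_start
    draws = []
    for _ in range(n_iter):
        # Full conditional of mu: normal with precision 0.001 + n * tau.
        post_prec = 0.001 + n * tau
        mu = rng.gauss(tau * total / post_prec, 1.0 / math.sqrt(post_prec))
        # Full conditional of tau: gamma with shape 2 + n/2, rate 0.5 + SS/2.
        ss = sum((y - mu) ** 2 for y in data)
        tau = rng.gammavariate(2.0 + n / 2.0, 1.0 / (0.5 + 0.5 * ss))
        draws.append((mu, tau))
    return draws


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between_over_n = sum((x - grand) ** 2 for x in means) / (m - 1)
    within = sum(
        sum((x - mc) ** 2 for x in c) / (n - 1) for c, mc in zip(chains, means)
    ) / m
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


rng = random.Random(2019)
data = [rng.gauss(5.0, 2.0) for _ in range(50)]    # simulated, illustration only
starts = [(-10.0, 0.1), (0.0, 1.0), (10.0, 10.0)]  # dispersed starting values
n_iter, burn_in, thin = 6000, 2500, 10

mu_chains = []
for mu0, tau0 in starts:
    draws = gibbs_chain(data, n_iter, mu0, tau0, rng)
    mu_chains.append([mu for mu, _ in draws[burn_in::thin]])

rhat_mu = gelman_rubin(mu_chains)
posterior_mean_mu = sum(sum(c) for c in mu_chains) / sum(len(c) for c in mu_chains)
retained_in_study = len(range(25000, 60000, 10))  # draws kept per chain in the study
print(rhat_mu, posterior_mean_mu, retained_in_study)
```

With the settings reported for the analysis, each chain retains 3500 draws after burn-in and thinning. Values of the Gelman-Rubin statistic close to 1 indicate that the between-chain and within-chain variability agree, which is the usual numerical counterpart of the visual checks in the time series plots.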
The findings reveal that the generalized error distribution for both data sets has a positive estimate of the shape parameter and so has fatter tails than the normal distribution. There is higher variation in the CD4 T-cell counts for the data from Bale Robe general hospital (about 49%) than for the data from Shashemene referral hospital (about 10%).
The posterior distributions estimated under the selected models, the Bayesian GED loglogistic models, are the solutions required in this analysis. Though the association parameter in the joint model is significant for one data set but not for the other, both fitted models are still important to consider as they are newly defined models implementing the generalized error distribution. Thus the respective results are used to report findings on how the longitudinal CD4 counts and survival time of a patient are related and on the parameter estimates.
The findings from this study are fairly consistent with the study by [5] except for the type of models selected. That study suggested different models for the two data sets, while only one is recommended here. This may be expected as the GED is involved in this study and, as indicated by [4], the GED model can give more insight into the error distributions than the normal growth curve models.

Conclusions
The current study focuses on developing Bayesian joint models with the assumption of the generalized error distribution for the longitudinal CD4 observations and of two AFT distributions for the survival time of HIV/AIDS patients. Analyses of two different data sets show that the measurement errors of the longitudinal CD4 variable are not normally distributed and so are modelled by the generalized error distribution. The distributions have fatter tails than the normal distribution. The Bayesian joint GED loglogistic models are found to be important models in fitting the data sets. Fairly consistent estimates of parameters and

In the growth model of [4], the mean response of $Y_{it}$ is time dependent, $\sigma^2$ is the scale parameter, $\varepsilon_{it}$ is a random error following the GED, and $\beta_i$ denotes the random effects. The GED has a shape parameter $\gamma_t$ that may possibly vary.

A previous study [5] compared various Bayesian joint models involving the Weibull, lognormal and loglogistic AFT distributions and a normality assumption for the longitudinal CD4 measure using two data sets. It recommended the Bayesian joint loglogistic model for the data set collected from Shashemene referral hospital and the Bayesian joint lognormal model for the second data set, collected from Bale Robe general hospital. These models have the same hazard rate functions as those of the respective data sets. In the current study, we further analyze the same data sets with newly defined Bayesian joint models considering the generalized error distribution for the longitudinal measure instead of the normal distribution.

Other special cases of the GED include a Laplace (double exponential) distribution when $\gamma = 1$, and the GED becomes a uniform distribution as $\gamma$ approaches $-1$. It becomes a leptokurtic distribution for $\gamma > 0$, and has thinner tails than the normal distribution when $\gamma < 0$. Choy and Smith [12] derived the GED as a scale mixture of normal distributions.

The joint likelihood function is defined for the data from the two processes.

For the posterior distribution of the parameters $\theta$ and random effects $w$ given the data, the main challenge is computational difficulty. The standard maximum likelihood method involves integrating out the latent variables from the log-likelihood function, which is difficult when the parameters are high dimensional. Simulation simplifies the computational challenges. Here the Bayesian model in Equation (10) is computed using Markov chain Monte Carlo methods with the Gibbs sampler algorithm, which is based on the full conditional distributions of the parameters [7] [13] [20]. The Gibbs sampler algorithm is implemented in the WinBUGS software version 1.4 [20]. The final inferences are made based on independent samples taken from the posterior distribution after convergence of the realizations. Time series plots, autocorrelations and Gelman-Rubin statistics are used to assess and confirm convergence.

Figure 1. Plots of the mean of the square roots of CD4 counts over observed time for the two studies: (a) Data 1 and (b) Data 2.

Figure 2. Plots from the analysis of Data 1 using the Bayesian GED loglogistic model. (a) Time series plots of simulations of the shape parameter $\gamma$ of the GED and the shape parameter $\rho$ of the loglogistic distribution; (b) Autocorrelation plots; (c) Plots of Gelman-Rubin statistics.

Table 1. Explanatory variables with codes.

Table 2. Parameter estimations for the Bayesian Joint GED Lognormal Model in the case of Data 1.

Table 3. Parameter estimations for the Bayesian Joint GED Loglogistic Model in the case of Data 1.

Table 4. Parameter estimations for the Bayesian Joint GED Lognormal Model in the case of Data 2.

Female patients gain more CD4 counts on average than males. Moreover, the mean CD4 count of a patient declines as the patient's age increases, weight decreases and the number of opportunistic infections increases.