Parameter Estimation in Logistic Regression for Transition, Reverse Transition and Repeated Transition from Repeated Outcomes

Abstract

Covariate dependent Markov models dealing with estimation of transition probabilities for higher orders appear to be restricted because of over-parameterization. An improvement over the previous methods for handling runs of events, which expresses the conditional probabilities in terms of the transition probabilities generated from Markovian assumptions, was proposed using Chapman-Kolmogorov equations. Parameter estimation for that model needs extensive pre-processing and computation to prepare the data before available statistical software can be used. A computer program developed using SAS/IML to estimate the parameters of the model is demonstrated, with application to Health and Retirement Survey (HRS) data from the USA.

Chowdhury, R., Islam, M., Huda, S. and Briollais, L. (2012) Parameter Estimation in Logistic Regression for Transition, Reverse Transition and Repeated Transition from Repeated Outcomes. Applied Mathematics, 3, 1739-1749. doi: 10.4236/am.2012.331240.

1. Introduction

In recent times, there has been a growing interest in the applications of Markov models in various fields. In the past, most of the work on Markov models dealt with estimation of transition probabilities for first or higher orders. The use of higher order Markov chain models for discrete variate time series appears to be restricted due to over-parameterization, and several attempts have been made to simplify the application. In recent years, there has also been a great deal of interest in the development of multivariate models based on Markov chains. These models have a wide range of applications in the fields of reliability, economics, survival analysis, engineering, social sciences, environmental studies, and biological sciences. Muenz and Rubinstein employed logistic regression models to analyze the first order transition probabilities from one state to another [1]. In a higher order Markov model, we can examine characteristics that may be revealed from the analysis of transitions, reverse transitions and repeated transitions. Islam and Chowdhury [2] extended the Muenz and Rubinstein [1] model to a higher order Markov model with covariate dependence for binary outcomes.

Using Chapman-Kolmogorov equations, Islam and Chowdhury introduced an improvement over the previous methods in handling runs of events, which are common in longitudinal data [3]. Without loss of generality, they express the conditional probabilities in terms of the transition probabilities generated from Markovian assumptions. Their proposed model is a further generalization of the models suggested by Muenz and Rubinstein [1] and Islam and Chowdhury [2] for dealing with event history data. The proposed model is based on a conditional approach and uses the event history efficiently to take account of unequal intervals in the occurrence of events.

In order to estimate the parameters of the model proposed by Islam and Chowdhury, extensive pre-processing and computations are needed to prepare the data before one can use the standard procedures available in existing statistical software [3]. In this paper we present a SAS program developed using SAS/IML to estimate the parameters of the proposed model [4]. The program is demonstrated using the follow-up data of the Health and Retirement Survey (HRS) from the USA.

2. Model

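Consider a stationary process $(Y_{i1}, Y_{i2}, \ldots, Y_{ij})$ denoting the past and present responses of the i-th subject ($i = 1, 2, \ldots, n$) at the j-th follow-up ($j = 1, 2, \ldots, n_i$). Here $Y_{ij}$ is the response at time $t_{ij}$. One can think of $Y_{ij}$ as an explicit function of the past history of the i-th subject at the j-th follow-up, denoted by $H_{ij} = \{Y_{ik},\ k = 1, 2, \ldots, j-1\}$. The order of the transition model is considered as q, for which the conditional distribution of $Y_{ij}$ given $H_{ij}$ depends on the q prior observations $Y_{i,j-1}, \ldots, Y_{i,j-q}$.

Let us define the multiple outcomes by $Y_{ij} = s$, s = 0, 1, 2, ···, m−1 if an event of level s occurs for the i-th subject at the j-th follow-up, where $Y_{ij} = 0$ indicates that no event occurs. The first order Markov model can then be expressed as

$P(y_{ij} \mid H_{ij}) = P(y_{ij} \mid y_{i,j-1})$, (1)

where $y_{ij} = 0, 1, \ldots, m-1$ are the m possible outcomes of a dependent variable, Y. The probability of a transition from u at time $t_{j-1}$ to v at time $t_j$ is $\pi_{v|u} = P(Y_{ij} = v \mid Y_{i,j-1} = u)$. Note that

$\sum_{v=0}^{m-1} \pi_{v|u} = 1, \quad u = 0, 1, \ldots, m-1$. (2)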
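As a rough illustration of the first order model, the transition probabilities can be estimated without covariates by counting observed one-step moves between states. The sketch below is in Python rather than SAS/IML, and the function name and the example sequences are hypothetical (they are not taken from the paper). It also shows that each row of the estimated matrix sums to one, as stated in (2).

```python
import numpy as np

def transition_matrix(sequences, m):
    """Estimate first order transition probabilities from state sequences.

    sequences: list of lists of integer states in 0..m-1 (one list per subject).
    Returns an (m, m) array whose (u, v) entry estimates P(next = v | current = u).
    """
    counts = np.zeros((m, m))
    for seq in sequences:
        # Count every consecutive pair (u at one follow-up, v at the next)
        for u, v in zip(seq[:-1], seq[1:]):
            counts[u, v] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observed departures are left as zeros
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)

# Hypothetical binary sequences for three subjects (assumed, for illustration)
sequences = [[0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
P = transition_matrix(sequences, m=2)
print(P)
print(P.sum(axis=1))  # each observed row sums to one
```
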

Figure 1 presents different types of transitions between two states (state 0 and state 1) for seven hypothetical subjects measured over six consecutive time points for the occurrence or non-occurrence of some event (e.g., a disease), with no event at baseline. Subject one has a transition from non-event (0) at time 1 to an event (1) at time 2, and for subject two the transition took place at time point three. We used to denote the time of occurrence of transition any time point after first time points. Subject three did not make any transition in all six time points, i.e., remained disease free in all six measurements. We can consider it a censored case for transitions. Next we consider reverse transitions for those subjects who have already made a transition. Subject four made a transition from non-event to event at time 3 and remained in the same state at time 4; this subject then made a reverse transition from event to non-event at time point five. The time point of reverse transition is denoted by. Subject five remained in state 1 (event) for all consecutive follow-ups after making a transition at time point 3, and can be regarded as a censored case for reverse transitions.

Finally, subjects six and seven are those who have already made a transition and a reverse transition and thereafter can only make a repeated transition. Subject six made a transition from non-event (0) at time point 1 to event (1) at time point 2, and then made a reverse transition to non-event at time point 3. At time point 4 this subject moved back to the event state, which we call a repeated transition. Subject seven first made a transition at time 2, then made a reverse transition to non-event at time 3 and remained in the same state for the rest of the time points, and can be considered censored for repeated transitions. The time point for repeated transition is denoted by.
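The classification of subjects described above can be sketched in a few lines of Python. The function name is hypothetical, and the state sequences are assumptions chosen to agree with the verbal description of subjects three, four, six and seven (the states after the last described event are filled in arbitrarily). A value of `None` marks a stage that is censored.

```python
def transition_times(states):
    """Return the 1-based time points of the transition (0 -> 1), the reverse
    transition (1 -> 0) and the repeated transition (0 -> 1 again).

    Assumes a binary sequence that starts in state 0; None means censored.
    """
    times = []
    target = 1  # the state whose appearance counts as the next change
    for t in range(1, len(states)):
        if states[t] == target:
            times.append(t + 1)      # convert 0-based index to time point
            target = 1 - target      # the next change is back to the other state
            if len(times) == 3:
                break
    return tuple(times + [None] * (3 - len(times)))

# Assumed sequences consistent with the description of Figure 1
subjects = {
    3: [0, 0, 0, 0, 0, 0],  # censored for transition
    4: [0, 0, 1, 1, 0, 0],  # transition at 3, reverse transition at 5
    6: [0, 1, 0, 1, 1, 1],  # transition at 2, reverse at 3, repeated at 4
    7: [0, 1, 0, 0, 0, 0],  # censored for repeated transition
}
for sid, seq in subjects.items():
    print(sid, transition_times(seq))
```
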

Figure 1. Flow diagram for different types of transitions.

Let us consider m = 3 for illustration of our method. Let the first two states be transient and the third one an absorbing state. For m = 3, we can define the following probabilities using the Chapman-Kolmogorov equations and also using Equation (1). The probability of a transition from u (u = 0, 1, 2) at time to v (v = 0, 1, 2) at time is

(3)

where is the time of follow-up just prior to.

The probability of a transition from u (u = 0, 1, 2) at time (just prior to the follow-up at time) to v (v = 0, 1, 2) at time (just prior to the follow-up at time) and w at time is

(4)

Similarly, the probability of a transition from u (u = 0, 1, 2) at time (just prior to the follow-up at time) to v (v = 0, 1, 2) at time (just prior to the follow-up at time) to w at time (just prior to the follow-up at time) and s at time is

(5)

It is observed that given in (3), (4) and (5) are initially first, second and third order joint probabilities, respectively. The conditional probabilities may be expressed in terms of first order transition probabilities as:

(6)

(7)

(8)

In the above conditional probabilities (6)-(8), it is assumed that once a transition is made from u to v, then the time of event u will remain fixed for all other subsequent transitions. Here a transition from u to v can happen in the second follow-up or the process can remain in the same state u in consecutive follow-ups before making a transition to v. Similarly, in case of a transition from v to w, the last observed time in state v, before making a transition to w, will remain fixed for any subsequent transition. In other words, we can allow the process to stay in the same state v in consecutive follow-ups prior to making any transition. Finally, if a transition is made from w to s then the process is observed at the last time point in the state of w, before making a transition to s. Here the time of last observing w can be different from the occurrence of w for the first time as found in expressions for (for the first observed time to transition to w and last observed times for u and v) and (for the first observed time to transition to s and last observed times for u, v and w).   
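To make the role of the Chapman-Kolmogorov equations concrete, the following Python sketch uses a hypothetical one-step transition matrix for m = 3 states, with the third state absorbing. The numerical values are assumptions for illustration only, and the chain is taken to be time-homogeneous with equally spaced follow-ups, which is simpler than the unequal-interval setting of the model. It shows how multi-step probabilities and the probability of a specific path are built from first order transition probabilities.

```python
import numpy as np

# Hypothetical one-step transition matrix (rows: current state, columns: next state).
# States 0 and 1 are transient; state 2 is absorbing.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.0, 0.0, 1.0],
])

# Chapman-Kolmogorov: the two-step probability from u to w sums over the
# intermediate state v, i.e. P2[u, w] = sum_v P[u, v] * P[v, w].
P2 = P @ P
P3 = P2 @ P

# Probability of the specific path 0 -> 1 -> 0 -> 1 (transition, reverse
# transition, repeated transition) under the first order Markov assumption.
path_prob = P[0, 1] * P[1, 0] * P[0, 1]

print(P2)
print(path_prob)
```
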

Let us define the following notations:

= vector of covariates for the ith person;

= vector of parameters for the transition from u to v.

In what follows we assume that all the individuals start at state u = 0. The probabilities of transition from state u to state v can be expressed in terms of conditional probabilities as functions of covariates as

(9)

where

Here,

.

Expressions similar to (9) may be obtained for the transition from state v to state w and from state w to state s; for details, see Islam and Chowdhury [3] and Islam et al. [5].
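One common way to make transition probabilities depend on covariates is the multinomial logistic form with a reference destination state. The Python sketch below assumes that form; it is not a transcription of Equation (9), and the function name, the choice of destination 0 as the reference category, and the coefficient values are all hypothetical.

```python
import numpy as np

def transition_probs(x, betas):
    """Covariate dependent probabilities of moving to each of m destination states.

    x:     covariate vector with a leading 1 for the intercept, shape (p + 1,)
    betas: coefficients for the m - 1 non-reference destinations, shape (m - 1, p + 1)
    Returns an array of m probabilities; index 0 is the (assumed) reference state.
    """
    x = np.asarray(x, dtype=float)
    betas = np.asarray(betas, dtype=float)
    # Linear predictor is fixed at 0 for the reference destination
    eta = np.concatenate(([0.0], betas @ x))
    eta -= eta.max()          # stabilise the exponentials
    w = np.exp(eta)
    return w / w.sum()

# Hypothetical coefficients for m = 3 and a single binary covariate
betas = [[np.log(2.0), 0.5],
         [0.0, -0.5]]
print(transition_probs([1.0, 0.0], betas))  # covariate X = 0
print(transition_probs([1.0, 1.0], betas))  # covariate X = 1
```
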

3. Estimation

The likelihood function for n individuals with i-th individual having follow-ups is given by

(10)

and (10) can be expressed as

(11)

where if a transition type (u = 0, v = 1, 2) is observed at th follow-up for the ith individual, , otherwise;, if a transition type (u = 0, v = 1, 2) is observed at th follow-up and a transition type (v = 1, w = 0, 2) is observed at th follow-up, , if a transition type (u = 0, v = 1, 2) is observed at th follow-up and a transition type (v = 1, w = 0, 2) does not occur at th follow-up; if a transition type (u = 0, v = 1, 2) is observed at th follow-up, a transition type (v = 1, w = 0, 2) is observed at th follow-up, and a transition type (w = 0, s = 1, 2) is observed at th follow-up, , if a transition type (u = 0, v = 1, 2) is observed at th follow-up, a transition type (v = 1, w = 0, 2) is observed at th follow-up, and a transition type (w = 0, s = 1, 2) does not occur at th follow-up.

From (11) the log likelihood function is given by

(12)

By equating to zero the derivatives of (12) with respect to the parameters and solving the resulting equations, we obtain the maximum likelihood estimates. The observed information matrix can be obtained from the second derivatives. We can also compute the test statistic for the model as a whole and also for individual parameters [3, 5].
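The estimation step can be illustrated for a single parameter set with a binary event indicator. The sketch below is a generic Newton-Raphson fit of a binary logistic regression in Python; it is not the SAS/IML program described in this paper, and the function name and the toy data are assumptions. The score vector holds the first derivatives of the log likelihood, and the information matrix is minus the matrix of second derivatives, whose inverse estimates the covariance of the parameter estimates.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Maximum likelihood for binary logistic regression by Newton-Raphson.

    X: design matrix with an intercept column, shape (n, k); y: 0/1 outcomes.
    Returns (beta_hat, covariance estimate, maximised log likelihood).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - p)                       # first derivatives
        info = X.T @ (X * (p * (1 - p))[:, None])   # minus second derivatives
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1 - p))[:, None])
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, np.linalg.inv(info), loglik

# Toy data (assumed): one binary covariate, event rate 1/4 when X = 0, 3/4 when X = 1
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])
beta_hat, cov, loglik = fit_logistic(X, y)
se = np.sqrt(np.diag(cov))
print(beta_hat, se, loglik)
```

With a single binary covariate the estimates have a closed form (the intercept is the logit of the event rate at X = 0, and the slope is the difference of the two logits), which gives a simple check on the iteration.
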

Testing the Global Null Hypothesis

To illustrate the test procedure, let us suppose that all the individuals were in state 0 initially. We will get three sets of parameters, one each for transition, reverse transition and repeated transition. If we consider p variables then where here are the intercepts, k = 1, 2, 3. Then the likelihood ratio chi square for testing the null hypothesis, is

To test the significance of the q-th parameter of the k-th set of parameters, the null hypothesis is and the corresponding Wald test statistic is
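For reference, the usual textbook forms of these two statistics can be sketched as follows. The notation is an assumption: $\hat{\beta}$ denotes the estimates under the full model with three parameter sets, $\hat{\beta}_0$ the estimates under the null model containing only the three intercepts, and $\hat{\beta}_{kq}$ the estimate of the q-th parameter in the k-th set, with its standard error taken from the inverse of the observed information matrix. The degrees of freedom shown apply to the binary outcome case with p covariates in each of the three sets.

```latex
% Likelihood ratio statistic for the global null hypothesis
% (all 3p slope parameters equal to zero); asymptotically chi-square
\chi^2_{LR} = -2\left[\ln L(\hat{\beta}_0) - \ln L(\hat{\beta})\right]
\;\sim\; \chi^2_{3p}

% Wald statistic for H_0: \beta_{kq} = 0; asymptotically standard normal
W = \frac{\hat{\beta}_{kq}}{\widehat{\mathrm{se}}(\hat{\beta}_{kq})}
```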

4. Computations

To explain the computation procedures we will start with a hypothetical data set. Let us consider a binary outcome variable (0 = no event, 1 = event), i.e., an outcome variable with two states, and a single binary covariate (X) from a longitudinal study with 4 follow-ups. We will get three sets of parameters: the first for transition (), the second for reverse transition (), and the third for repeated transition (). It should be noted that for a multistate outcome variable, the number of sets of parameters will increase accordingly [6].

Table 1 gives the hypothetical data on 7 cases. The value of the outcome variable at the third follow-up of case 7 (Case ID = 7) is missing and is coded as 99 in the data. The value of the outcome variable for this case in the remaining follow-ups is also treated as missing. It should be noted that we start with only those cases that were in state 0 at follow-up 1. Suppose we have a total of four outcome variables, one for each follow-up. Next we need to find the possible combinations of the values of the outcome variables that identify the occurrence or non-occurrence of an event for transition, reverse transition and repeated transition. Let us explain what we mean by a combination here. For example, case 3 was in state 0 at follow-up 1 and changed its status to state 1 at follow-up 2 (0 → 1). Hence an event took place for this case, which we term the combination 0 → 1 (a combination can be viewed as a covariate pattern for the four outcome variables from the four follow-ups), and we identify it as a transition [7]. We do not need to worry about the status of this case at follow-up 3 and follow-up 4. If a case (e.g., Case 2) remains in the same state for all of the remaining follow-ups as it was in at follow-up 1 (0 → 0 → 0 → 0), then this case did not observe any event. We also have to find the corresponding covariate value from the follow-up at which a transition, reverse transition or repeated transition originated.

From the data in Table 1 we have to create three sets of data: Set 1 (Transition), Set 2 (Reverse Transition), and Set 3 (Repeated Transition). The created data sets include a single binary outcome variable (e.g., “Estatus”), which identifies whether an event occurred (1) or did not occur (0) for each of these three parameter sets, and the covariate (X), whose value is taken from the appropriate follow-up. In addition, we have to create another variable (e.g., “TranType”) which identifies the parameter set to which each record belongs.

Table 1. Hypothetical data set with four follow-ups.

Table 2. Combinations of outcome variable for identification of occurrence of events for transition, reverse transition and repeated transitions.

To create the new data set with two new variables in addition to the covariates, we first need to identify which cases observed the event for the single outcome variable (Estatus) in the three sets of data, namely Set 1, Set 2, and Set 3. Table 2 shows the possible combinations of the values of the binary outcome variables for occurrence or non-occurrence of events for Set 1 (Transition), Set 2 (Reverse Transition), and Set 3 (Repeated Transition) with four follow-ups, including missing values. There are in total seven possible combinations of the outcome variable with four follow-ups (Table 2) for Set 1. The first three combinations identify the occurrence of an event and are coded as 1 in the event status column (Estatus). The remaining four combinations identify the non-occurrence of an event for Set 1 and are coded as 0 in the event status column (Estatus). Among these, combinations 5 to 7 contain missing values; they are also treated as non-occurrence of an event for Set 1. For the first combination in Set 1 we have to take the covariate (X) value from follow-up 1, because the event for this transition originated from follow-up 1. For combination 2 the corresponding covariate (X) value will be from follow-up 2, and so on. In the case of combination 4, no event was observed. Hence for this case we have to take the covariate value from the last follow-up, i.e., follow-up 4. For combinations 5 to 7 we have to take the covariate value from the first, second and third follow-up, respectively, i.e., the follow-up just prior to the missing value. The value of the transition type (TranType) column is coded as 1 for all of the combinations corresponding to Set 1. The sequence number in the TranCode column identifies the unique combinations for Set 1.
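The seven Set 1 combinations described above can be generated programmatically. The Python sketch below is an illustration only: the function name and the tuple layout are hypothetical, the ordering follows the description in the text (events first, then the no-event pattern, then the patterns ending in a missing value), and Table 2 itself is not reproduced.

```python
MISSING = 99  # code used for a missing outcome value

def set1_combinations(n_followups):
    """List the Set 1 (transition) patterns for a binary outcome starting in state 0.

    Each entry is (pattern, estatus, covariate_followup), where covariate_followup
    is the 1-based follow-up from which the covariate value is taken.
    """
    combos = []
    # Event patterns: k zeros followed by a 1; the event originates at follow-up k
    for k in range(1, n_followups):
        combos.append(((0,) * k + (1,), 1, k))
    # No event at any follow-up: covariate from the last follow-up
    combos.append(((0,) * n_followups, 0, n_followups))
    # Outcome becomes missing after k zeros: covariate from follow-up k
    for k in range(1, n_followups):
        combos.append(((0,) * k + (MISSING,), 0, k))
    return combos

for tran_code, (pattern, estatus, cov_fu) in enumerate(set1_combinations(4), start=1):
    print(tran_code, " -> ".join(map(str, pattern)), estatus, cov_fu)
```
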

For Set 2, again we have a total of eight combinations to identify the occurrence or non-occurrence of events. Only those cases that observed the occurrence of an event in Set 1 (i.e., made a transition) will be in Set 2 (i.e., can make a reverse transition). The first two combinations observed an event (reverse transition) after observing a transition in Set 1; they are treated as an occurrence of an event for Set 2 and coded as 1 in the event status column (Estatus). The third to fifth combinations in Set 2 did not observe any event after making a transition; they are treated as non-occurrence of an event for Set 2 and coded as 0 in the event status column (Estatus). The sixth and seventh combinations for Set 2 are also treated as non-occurrence of an event, due to missing observations after making a transition, and are also coded as 0 in the event status column (Estatus). The covariate (X) value for the first two combinations in Set 2 will be from follow-up 1 and follow-up 2, respectively. The covariate (X) value for the third to fifth combinations will be from the fourth follow-up, since cases with these combinations did not change state after making a transition. In the case of missing data for the outcome variable, the covariate value for observation six to eight will be from second and third follow-up, respectively. The value of the transition type (TranType) column is coded as 2 for all of the combinations corresponding to Set 2. The sequence number in the TranCode column identifies the unique combinations for Set 2.

Finally, for Set 3 (Repeated Transition) we have five possible combinations from the four outcome variables for the four follow-ups. Again, only those cases that have observed an event for reverse transition will contribute to Set 3. The first combination for Set 3 observed an event after observing a transition and then a reverse transition; hence a repeated transition took place, and it is coded as 1 in the event status column (Estatus). The covariate (X) value for this combination will come from follow-up 3, because the repeated transition originated from that point. The second and third combinations did not observe any event after making a reverse transition; hence they are treated as non-occurrence of an event for Set 3 and coded as 0 in the event status column (Estatus) in Table 2. The covariate value for the second combination will be from the last follow-up as usual, and for the third combination it will be from the third follow-up because the value at the last follow-up is missing. The fourth and fifth combinations will also be treated as non-occurrence of an event for Set 3, and the corresponding covariate (X) value will come from the fourth follow-up. The value of the transition type (TranType) column is coded as 3 for all of the combinations corresponding to Set 3. The sequence number in the TranCode column identifies the unique combinations for Set 3.

Now we can match these combinations of outcome variables of the follow-ups for each case in the data (Table 1) with the combinations for transition, reverse transition and repeated transitions presented in Table 2. For combination 1 in Set 1 (Table 2) we need to match the value for first two follow-ups of the data (Table 1) only. For combination 2 we need to match only with first three follow-ups from the data and so on. Since we created the combinations (Table 2) we can also identify the number of follow-ups to match from the data i.e., the starting follow-up and ending follow-up. For example, for combination 1 in Set 1 the starting follow-up is the first and ending follow-up is the second and so on. Similarly we will be able to identify the appropriate follow-up from where the covariate value should be taken.
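The matching procedure can also be expressed as a single pass over each case's outcome sequence. The Python sketch below is a simplified illustration, not the SAS/IML program: the function name, the dictionary layout and the example values are hypothetical. It assumes one rule for all three sets, namely that the covariate is taken from the follow-up just before the event, from the follow-up just before a missing value, or from the last follow-up when neither occurs. TranCode is omitted because it depends on the numbering in Table 2.

```python
MISSING = 99  # code used for a missing outcome value

def build_rows(case_id, y, x):
    """Create the records a case contributes to Set 1, Set 2 and Set 3.

    y: outcome at each follow-up (binary, first value 0, 99 = missing)
    x: covariate value at each follow-up
    """
    rows = []
    start = 0    # index of the follow-up at which the current stage begins
    target = 1   # state whose appearance counts as the event for this stage
    for tran_type in (1, 2, 3):
        estatus, cov_idx = 0, None
        for j in range(start + 1, len(y)):
            if y[j] == MISSING:
                cov_idx = j - 1              # follow-up just prior to the missing value
                break
            if y[j] == target:
                estatus, cov_idx = 1, j - 1  # follow-up from which the event originated
                break
        if cov_idx is None:
            cov_idx = len(y) - 1             # no event and nothing missing: last follow-up
        rows.append({"CaseID": case_id, "TranType": tran_type,
                     "Estatus": estatus, "X": x[cov_idx]})
        if not estatus:
            break                            # later sets need an event in this one
        start, target = cov_idx + 1, 1 - target
    return rows

# Hypothetical case: transition, reverse transition and repeated transition
for row in build_rows(1, [0, 1, 0, 1], [10, 20, 30, 40]):
    print(row)
```

In this sketch a case contributes a record to Set 2 only if it has an event in Set 1, and to Set 3 only if it has an event in Set 2, which mirrors the inclusion rule described for the three sets.
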

Table 3 shows the new data set created by using procedure discussed above for creating data set for Set 1, Set 2 and Set 3. First column in Table 3 (Case ID) gives the case identification. Second column (TranCode) shows which combination was matched by this particular case. Third column (TranType) represents transition types where 1 for transition (Set 1), 2 for reverse transition (Set 2) and 3 for repeated transition (Set 3). Fourth column (Estatus) represents the occurrence or non occurrence of events as discussed earlier. In Set 1 (Transition), five cases observed an event while remaining two did not and coded accordingly in fourth column (Estatus). Those five

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] L. R. Muenz and L. V. Rubinstein, “Markov Models for Covariate Dependence of Binary Sequences,” Biometrics, Vol. 41, No. 1, 1985, pp. 91-101. doi:10.2307/2530646
[2] M. A. Islam and R. I. Chowdhury, “A Higher-Order Markov Model for Analyzing Covariate Dependence,” Applied Mathematical Modelling, Vol. 30, No. 6, 2006, pp. 477-488. doi:10.1016/j.apm.2005.05.006
[3] M. A. Islam, R. I. Chowdhury and S. Huda, “A Multistate Transition Model for Analyzing Longitudinal Depression Data,” Bulletin of the Malaysian Mathematical Sciences Society, 2012. http://www.emis.de/journals/BMMSS/accepted_papers.htm
[4] SAS/IML 9.1, “User’s Guide,” SAS Institute Inc., Cary, 2004.
[5] M. A. Islam, R. I. Chowdhury and S. Huda, “Markov Models with Covariate Dependence for Repeated Measures, Chapter 9,” Nova Science, New York, 2009.
[6] M. A. Islam, R. I. Chowdhury and S. Huda, “A Multistage Model for Analyzing Repeated Observations on Depression in Elderly,” Festschrift in Honor of Distinguished Professor Mir Masoom Ali, 18-19 May 2007, pp. 44-54.
[7] D. W. Hosmer and S. Lemeshow, “Applied Logistic Regression,” Wiley, New York, 1989, p. 136.
[8] Public Use Dataset, “Health and Retirement Study,” University of Michigan, Ann Arbor, 1992-2004.
[9] M. A. Islam, “Multistate Survival Models for Transitions and Reverse Transitions: An Application to Contraceptive Use Data,” Journal of Royal Statistical Society A, Vol. 157, No. 3, 1994, pp. 441-455.
[10] M. A. Islam, R. I. Chowdhury, N. Chakraborty and W. Bari, “A Multistage Model for Maternal Morbidity during Antenatal, Delivery and Postpartum Periods,” Statistics in Medicine, Vol. 23, No. 1, 2004, pp. 137-158. doi:10.1002/sim.1594

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.