A Comparative Analysis of Generalized Estimating Equations Methods for Incomplete Longitudinal Ordinal Data with Ignorable Dropouts

In longitudinal studies, measurements are taken repeatedly over time on the same experimental unit and are therefore correlated. Missing data are very common in such studies, and considerable research has addressed how to analyze the resulting data sets appropriately. Generalized Estimating Equations (GEE) is a popular method for the analysis of non-Gaussian longitudinal data. In the presence of missing data, GEE requires the strong assumption of missing completely at random (MCAR). Multiple Imputation Generalized Estimating Equations (MIGEE), Inverse Probability Weighted Generalized Estimating Equations (IPWGEE) and Double Robust Generalized Estimating Equations (DRGEE) have been proposed as ways to ensure valid inference under missing at random (MAR). In this study, the three extensions of GEE are compared under various dropout rates and sample sizes through simulation studies. Under both the MAR and MCAR mechanisms, the simulation results revealed better performance of DRGEE compared to IPWGEE and MIGEE. The optimum method was applied to a real data set.


Introduction
In the medical, epidemiological and social sciences, studies are often designed to investigate changes in a response of interest observed or measured over time on each subject. These are called repeated measures or longitudinal studies. Since the measurements are taken repeatedly over time on the same experimental unit, the data are typically correlated. Ordinal responses are regularly encountered in these studies. It is exceptionally common for longitudinal data sets to be incomplete, in the sense that not all intended measurements of a subject's outcome vector are actually observed. This turns the statistical analysis into a missing data problem. When data are incomplete, a number of issues arise in the analysis: 1) bias due to systematic differences between the observed and unobserved data, 2) loss of efficiency and 3) complications in data handling and statistical inference [1].
Missing data are frequently encountered in longitudinal studies because nonresponse can happen at any time from the beginning of the study. Two patterns of missing data can be observed for the response: 1) dropout (monotone pattern of nonresponse), in which an individual leaves the study prematurely, before completing a scheduled sequence of visits, for a number of reasons (both known and unknown), or 2) intermittent nonresponse, in which a subject returns to the study after occasions of nonresponse [2]. The reasons for missingness are varied, and it is fundamental to know the missing data mechanism generating nonresponse and its impact on inferences. Rubin [3] argued that there are two important broad classes of missing data: missing data that are ignorable in the analysis, and missing data that are non-ignorable (missing not at random). If missing data occur under either missing completely at random or missing at random conditions, the problem is deemed ignorable, and the missingness process need not be explicitly modelled. A nonresponse process is missing completely at random (MCAR) if the probability of being missing is independent of both unobserved and observed measurements. Data are said to be missing at random (MAR) if nonresponse is independent of the unobserved quantities given the observed data, and missing not at random (MNAR) when the nonresponse depends on unobserved quantities.
Considerable research has addressed how to analyze longitudinal studies appropriately. When data are incomplete, rather than deleting missing values, it has been recommended to "impute" them [4]. The subject of how to obtain valid inferences from imputed data was formally addressed by Rubin [5], who introduced the multiple imputation (MI) method as an approach to handle missing data. MI has become one of the most popular approaches for handling incomplete data, and it is applicable when the data are MAR or MCAR. MI replaces each of the unobserved values with $m \geq 2$ plausible values to obtain $m$ completed datasets, thereby reflecting the uncertainty about the missing data. The $m$ completed datasets are then analysed separately using standard complete-data methods and, finally, the results from the $m$ analyses are combined into a single inference.
Alternative solutions for handling longitudinal missing data have been explored, in particular the Generalized Estimating Equations (GEE) method [6], which is quite popular for the analysis of non-Gaussian correlated data. Its main advantage is that one is only required to specify the mean structure of the response correctly for the parameter estimator to be consistent and asymptotically normal. In the presence of missing data, GEE is only valid under the strong assumption of MCAR. The first effort to make GEE applicable to the more realistic MAR scenario was Multiple Imputation Generalized Estimating Equations (MIGEE), proposed by Little and Rubin [7]. Here, missing values are multiply imputed and the resulting completed datasets are analysed through standard GEE methods. Following Rubin's rules, the results obtained from the completed datasets are combined into a single inference. Robins [8] extended GEE by developing the Inverse Probability Weighted Generalized Estimating Equations (IPWGEE), which weights each subject's contribution to the GEE by the inverse of the probability that the subject drops out at the time they actually dropped out. IPWGEE produces consistent estimates provided the weight model is correctly specified. Double Robust Generalized Estimating Equations (DRGEE) arise as a third generalisation of GEE to deal with data subject to a MAR mechanism. The main idea is to supplement the IPWGEE with a predictive model for the missing quantities conditional on the observed ones [9]. This method produces consistent estimates provided that either the dropout model or the conditional model is correctly specified. Doubly robust methods have received wide attention in the literature in the last decade (see [10] [11] [12] [13]).
The literature on GEE for missing data with a longitudinal ordinal response is comparatively scarce. Toledano and Gatsonis [14] used a weighted GEE method to accommodate intermittent nonresponse with an MCAR missing response and a MAR missing covariate. In a simulation study, the authors of [15] compared ordinal imputation regression and multivariate normal imputation for an ordinal outcome subject to dropout. Kombo [16] compared, through a simulation study, two multiple imputation methods (multivariate normal imputation and fully conditional specification) for longitudinal ordinal data with monotone missing data patterns. The aforementioned papers used singly robust versions of GEE and treated only a MAR missing response or a MAR missing covariate. da Silva [13] used the DRGEE method for ordinal data with an intermittently missing response and a missing covariate. Therefore, the use of the DRGEE, IPWGEE and MIGEE methods for ordinal data with a monotone missing pattern still requires further development.
In this paper, our main interest is the comparison of GEE methods for handling incomplete longitudinal ordinal outcomes when the missing response is ignorable. This assumes the missing data are either MCAR or MAR. Comparisons are made by means of a simulation study, and the optimum model is applied to a real dataset. Through the simulation study, the behavior of the methods in terms of the mean squared error (MSE) and bias of the estimators is extensively studied under correctly specified models.
This paper is organised as follows. Section 2 gives the necessary notation and key definitions. Section 3 outlines GEE, as well as the IPWGEE, MIGEE and DRGEE approaches. A simulation study is presented in Section 4, followed by simulation results and an application in Section 5. Finally, discussion and concluding remarks are provided in Section 6.

Ordinal Outcomes
Categorical variables occur frequently in many fields, including but not limited to economics, health and education. When the variable is categorical with only two levels, logistic regression is the usual choice. When there are more than two categories and the categories are ordered, polytomous ordinal regression comes into play.
Ordinal outcomes are regularly encountered in longitudinal studies, particularly in randomized clinical trials. Apart from failing to meet the usual normality assumption for analysis and inference, these data are prone to missingness. Failure to deal with incomplete information jeopardizes the validity of inferences. Various authors [17] [18] [19] have studied a number of logistic regression models for ordinal response variables. When considering several factors, special multivariate analysis for ordinal data is the best option [20], even though other methods such as mixed models can be employed. Nevertheless, ordinal logistic regression models have been found to be most useful when dealing with ordinal data [19]. There are several ordinal logistic regression models, namely the proportional odds model, the continuation ratio model, the partial proportional odds model and the stereotype regression model. Among these, the most common is the proportional odds model [21]. The proportional odds model is a logit model that allows ordered data to be modelled by analysing them as a number of dichotomies [16]. It arranges the ordered categories into a series of binary comparisons. The proportional odds assumption states that the effect of each covariate is the same for each binary comparison (logit). The assumption is regularly used with the cumulative logit link.
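The proportional odds idea can be illustrated numerically. The following is a minimal sketch, assuming the cumulative logit form logit P(Y <= c | x) = b0_c + x'b with increasing cut-points; the function names and the numerical values are hypothetical and are not taken from the paper.

```python
import math


def expit(value):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-value))


def category_probabilities(intercepts, beta, x):
    """Return P(Y = c | x), c = 1..C, under a cumulative logit model.

    intercepts: increasing cut-points, one per cumulative logit (C - 1 values)
    beta: slopes shared by all cumulative logits (proportional odds assumption)
    x: covariate values, same length as beta
    """
    eta = sum(b * v for b, v in zip(beta, x))
    # Cumulative probabilities P(Y <= c); the last one is 1 by definition.
    cumulative = [expit(a + eta) for a in intercepts] + [1.0]
    probs = [cumulative[0]]
    for c in range(1, len(cumulative)):
        probs.append(cumulative[c] - cumulative[c - 1])
    return probs


# Hypothetical example: four categories, two covariates.
probs = category_probabilities([-1.0, 0.0, 1.5], [0.5, -0.2], [1.0, 2.0])
print([round(p, 3) for p in probs])
```

Because the same slopes enter every cumulative logit, changing a covariate shifts all cumulative log-odds by the same amount, which is exactly the proportional odds assumption described above.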

Missing Data in Longitudinal Studies
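Suppose that longitudinal data consist of $N$ subjects, and let $Y_{ij}$ be an ordered variable with $C$ categories for subject $i$ assessed at the $j$th occasion ($i = 1, \ldots, N$; $j = 1, \ldots, T$). Define $Y_{ijc} = I(Y_{ij} = c)$, where $I(\cdot)$ is the indicator function, equal to one when its argument is true and zero otherwise. Let $Y_i = (Y_{i1}, \ldots, Y_{iT})'$ denote the vector of repeated measurements of the $i$th subject.

Associated with each subject there is a vector of covariates, say $X_{ij}$, measured at time $j$. Let $X_i = (X_{i1}, \ldots, X_{iT})'$ be the covariate matrix for the $i$th subject. The marginal distribution of $Y_{ij}$ is multinomial:

$$f(y_{ij} \mid X_{ij}, \beta) = \prod_{c=1}^{C} \mu_{ijc}^{y_{ijc}}, \tag{1}$$

where $\mu_{ijc} = P(Y_{ij} = c \mid X_{ij}, \beta)$ is the probability of being in category $c$ at time $j$ given a set of covariates, and $\beta = (\beta_0', \beta_x')'$ is a vector of regression parameters. The cumulative proportional odds model is a popular choice to model $\mu_{ijc}$ [19]. Specifically, the cumulative logit model is given as

$$\operatorname{logit} P(Y_{ij} \le c \mid X_{ij}) = \beta_{0c} + X_{ij}'\beta_x, \qquad c = 1, \ldots, C-1, \tag{2}$$

where $\beta_0 = (\beta_{01}, \ldots, \beta_{0,C-1})'$ is the vector of intercept parameters and $\beta_x$ is the vector of coefficients, which does not depend on $c$.

Let $R_{ij}$ be a missingness indicator, equal to 1 if $Y_{ij}$ is observed and 0 otherwise, and let $R_i = (R_{i1}, \ldots, R_{iT})'$. The joint density of the measurement and missingness processes can be factorised as

$$f(Y_i, R_i \mid X_i, \theta, \psi) = f(Y_i \mid X_i, \theta)\, f(R_i \mid Y_i, X_i, \psi), \tag{3}$$

where $f(Y_i \mid X_i, \theta)$ denotes the marginal density of the measurement process and $f(R_i \mid Y_i, X_i, \psi)$ denotes the missing data model, whose parameters are contained in $\psi$. Here $\psi$ is an unknown parameter governing the missing data mechanism and $\theta$ denotes the vector of parameters describing the response variable. The distribution of $R_i$ may depend on $Y_i$. Writing $Y_i = (Y_i^o, Y_i^m)$ for the observed and missing components of $Y_i$, the data are said to be MAR if

$$P(R_i \mid Y_i^o, Y_i^m, X_i, \psi) = P(R_i \mid Y_i^o, X_i, \psi).$$

In this paper, our main interest is in missing data due to dropout. For all components of $Y_i$ that are not observed, the corresponding components of $R_i$ are 0. We can then replace the vector $R_i$ by a scalar variable $D_i$, the dropout indicator, commonly defined as

$$D_i = 1 + \sum_{j=1}^{T} R_{ij},$$

so that $D_i$ denotes the occasion at which subject $i$ dropped out. The model for the dropout process can therefore be written as

$$P(D_i = d_i \mid Y_i, X_i, \psi), \tag{4}$$

where $d_i$ is the realisation of the variable $D_i$. In Equation (4), it is assumed that all subjects are observed on the first occasion, so that $D_i$ takes values between 2 and $T + 1$. The maximum value $T + 1$ corresponds to a complete measurement sequence.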
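As a small illustration of the dropout indicator, the sketch below maps a vector of response indicators (1 = observed, 0 = missing) to the dropout occasion, assuming the common definition D = 1 + sum of the indicators and a monotone pattern with the first occasion observed. The function name is hypothetical.

```python
def dropout_time(r):
    """Return D = 1 + sum(r) for a monotone response-indicator vector r."""
    if not r or r[0] != 1:
        raise ValueError("the first occasion is assumed to be observed")
    # Under monotone dropout, no observed value may follow a missing one.
    seen_missing = False
    for value in r:
        if value == 0:
            seen_missing = True
        elif seen_missing:
            raise ValueError("non-monotone (intermittent) pattern")
    return 1 + sum(r)


# With T = 4 occasions: dropout at the third occasion, and a completer.
print(dropout_time([1, 1, 0, 0]))
print(dropout_time([1, 1, 1, 1]))
```

A completer receives the value T + 1, matching the convention that the maximum value corresponds to a complete measurement sequence.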

Generalized Estimating Equations
The GEE approach has its roots in the quasi-likelihood methods introduced by Wedderburn [22] and later developed and extended by McCullagh and Nelder [23]. GEE is a general statistical approach for fitting a marginal model in longitudinal data analysis, for example in clinical trials or biomedical studies. The method is computationally simple and yields marginal parameter estimates. It estimates model parameters by iteratively solving a system of equations based on an extended quasi-likelihood, where the extension of the generalized linear model consists of incorporating correlations.
Suppose that longitudinal data consist of $N$ subjects. For subject $i$ ($i = 1, \ldots, N$), there are $T$ observations; let $Y_{ij}$ denote the $j$th response ($j = 1, \ldots, T$), and let $X_{ij}$ denote the $p \times 1$ vector of explanatory variables. Let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{iT})'$ denote the column vector of responses for the $i$th subject, with mean vector $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{iT})'$, where $\mu_{ij}$ is the corresponding $j$th mean. The marginal model specifies the relationship between $\mu_{ij}$ and the covariates $X_{ij}$ as

$$g(\mu_{ij}) = X_{ij}'\beta, \tag{5}$$

where $g$ is a link function and $\beta$ is the vector of regression parameters. The conditional variance of $Y_{ij}$ given $X_{ij}$ is

$$\operatorname{Var}(Y_{ij} \mid X_{ij}) = \phi\,\nu(\mu_{ij}), \tag{6}$$

where $\phi$ is a scale parameter and $\nu$ is a known variance function of $\mu_{ij}$. Following Liang and Zeger [6] and Lipsitz [24], the generalized estimating equations have the form

$$U(\beta) = \sum_{i=1}^{N} \left(\frac{\partial \mu_i}{\partial \beta'}\right)' V_i^{-1}\,(Y_i - \mu_i) = 0, \tag{7}$$

where $\beta'$ denotes the transpose of the vector of marginal regression parameters $\beta$, and $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$ with $A_i = \operatorname{diag}\{\nu(\mu_{i1}), \ldots, \nu(\mu_{iT})\}$. Here $R_i(\alpha)$ is a "working" correlation matrix that expresses the marginal correlation between repeated measures, and $\alpha$ is a vector of nuisance parameters handled through the choice of a working correlation structure such as independence, autoregressive of the first order (AR(1)), exchangeable, or unstructured. For AR(1), the correlations decline exponentially with the separation between measures, i.e.

$$\operatorname{Corr}(Y_{ij}, Y_{ik}) = \alpha^{|j-k|}.$$

Under independence, the identity matrix serves as the working correlation matrix. For the exchangeable structure, the correlation between any two measures is assumed to be the same regardless of the time between them. In the unstructured case, every pair of measurements is given its own association parameter.
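The three parametric working correlation structures can be written down directly. The sketch below builds them as nested lists for T occasions; the function name and arguments are hypothetical, and the unstructured case is omitted because it requires one parameter per pair of occasions.

```python
def working_correlation(structure, T, alpha=0.0):
    """Return a T x T working correlation matrix as a list of lists."""
    if structure not in ("independence", "exchangeable", "ar1"):
        raise ValueError("unknown structure: " + structure)
    matrix = []
    for j in range(T):
        row = []
        for k in range(T):
            if j == k:
                row.append(1.0)
            elif structure == "independence":
                row.append(0.0)
            elif structure == "exchangeable":
                # Same correlation for every pair of occasions.
                row.append(alpha)
            else:
                # AR(1): correlation decays as alpha^|j - k|.
                row.append(alpha ** abs(j - k))
        matrix.append(row)
    return matrix


for row in working_correlation("ar1", 4, alpha=0.5):
    print(row)
```

Since GEE only requires the mean structure to be right for consistency, the choice among these structures mainly affects efficiency.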
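Under mild regularity conditions and correct specification of the marginal mean $\mu_i$, Liang and Zeger [6] showed that the estimator $\hat{\beta}$, obtained by solving Equation (7), is consistent and that $\sqrt{N}(\hat{\beta} - \beta)$ is asymptotically multivariate normal with mean vector 0 and covariance matrix given by

$$V_\beta = \lim_{N \to \infty} N\, I_0^{-1} I_1 I_0^{-1}, \tag{8}$$

where

$$I_0 = \sum_{i=1}^{N} \frac{\partial \mu_i'}{\partial \beta} V_i^{-1} \frac{\partial \mu_i}{\partial \beta'} \quad \text{and} \quad I_1 = \sum_{i=1}^{N} \frac{\partial \mu_i'}{\partial \beta} V_i^{-1} \operatorname{Var}(Y_i)\, V_i^{-1} \frac{\partial \mu_i}{\partial \beta'}, \tag{9}$$

where $\mu_i'$ in Equation (9) denotes the transpose of the mean vector $\mu_i$. In practice, the "sandwich" covariance matrix $V_\beta$ in Equation (8) is calculated by ignoring the limit and replacing $\beta$ and $\alpha$ by their estimates, and also replacing $\operatorname{Var}(Y_i)$ by $(Y_i - \hat{\mu}_i)(Y_i - \hat{\mu}_i)'$.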

Multiple Imputation Generalized Estimating Equations
This method is a simulation-based approach that imputes missing values multiple times [5]. The main idea of the procedure is to replace each missing value with a set of $M$ plausible values drawn from the conditional distribution of the unobserved values given the observed ones. This conditional distribution represents the uncertainty about the right value to impute. In this way, $M$ imputed datasets are generated (imputation stage), which are then analysed using standard complete-data methods (analysis stage). Finally, the results from the $M$ analyses are combined into a single inference (pooling stage) using Rubin's rules [5].
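Let $\hat{\beta}^k$ and $U^k$ be the estimate of a parameter of interest $\beta$ and its covariance matrix from the $k$th completed data set ($k = 1, \ldots, M$). According to Little and Rubin [7], the combined point estimate for the parameter of interest $\beta$ from the MI is simply the average of the $M$ complete-data point estimates:

$$\hat{\beta} = \frac{1}{M} \sum_{k=1}^{M} \hat{\beta}^k,$$

and an estimate of the covariance matrix of $\hat{\beta}$ is given by

$$V = W + \left(\frac{M+1}{M}\right) B,$$

where

$$W = \frac{1}{M} \sum_{k=1}^{M} U^k \quad \text{and} \quad B = \frac{1}{M-1} \sum_{k=1}^{M} (\hat{\beta}^k - \hat{\beta})(\hat{\beta}^k - \hat{\beta})'.$$

Here, $W$ measures the within-imputation variability and $B$ measures the between-imputation variability.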
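The pooling stage can be sketched for a single scalar parameter. The snippet below applies Rubin's rules to M point estimates and their variances; the function name and the numbers in the example are hypothetical and serve only to show the arithmetic.

```python
def pool_rubin(estimates, variances):
    """Pool M complete-data estimates of one scalar parameter.

    Returns (pooled estimate, within variance W, between variance B,
    total variance W + (1 + 1/M) * B).
    """
    M = len(estimates)
    if M < 2 or len(variances) != M:
        raise ValueError("need M >= 2 estimates with matching variances")
    pooled = sum(estimates) / M
    within = sum(variances) / M
    between = sum((b - pooled) ** 2 for b in estimates) / (M - 1)
    total = within + (1.0 + 1.0 / M) * between
    return pooled, within, between, total


# Hypothetical estimates from M = 5 imputed datasets.
pooled, W, B, V = pool_rubin([0.48, 0.52, 0.50, 0.47, 0.53],
                             [0.010, 0.011, 0.009, 0.010, 0.012])
print(pooled, W, B, V)
```

The factor (1 + 1/M) inflates the between-imputation component to account for using a finite number of imputations.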
As Schafer [26] noted, MI can be used to create the imputations from a fully parametric model. After drawing the imputations, one analyses the imputed datasets with a semi-parametric or non-parametric estimation procedure to achieve better performance and greater robustness. In the context of binary outcomes, [27] [28] [29] used MI to fill in missing values for GEE analysis in data that are MAR. GEE can thus be used after MI, leading to a hybrid technique named MIGEE [26]. Provided that the MAR assumption is valid, the missing data mechanism can be ignored in the analysis.

Inverse Probability Weighted Generalized Estimating Equations
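When data are incomplete, standard GEE can yield biased estimates because of its frequentist nature, and it is generally valid only under the strong assumption of MCAR [1]. Robins [8] proposed a class of weighted generalized estimating equations to remove this bias and to provide valid inference for the regression parameters of marginal models when incomplete longitudinal data are MAR. This method requires specification of a dropout model in terms of observed outcomes and/or explanatory variables. The idea behind IPWGEE is to weight each subject's contribution to the GEE by the inverse of the probability that the subject drops out at the time they actually dropped out. This probability can be expressed as

$$w_{id} = P(D_i = d) = \prod_{k=2}^{d-1} \bigl[1 - P(R_{ik} = 0 \mid R_{i2} = \cdots = R_{i,k-1} = 1)\bigr] \times P(R_{id} = 0 \mid R_{i2} = \cdots = R_{i,d-1} = 1)^{I\{d \le T\}},$$

where $d = 2, 3, \ldots, T+1$, $I\{\cdot\}$ is an indicator function and $D_i$ is the dropout indicator for subject $i$, where $D_i = T + 1$ represents that subject $i$ completes all the $T$ visits, which were set in advance by design.

In the IPWGEE approach, the GEE estimator for $\beta$ is based on solving the equation

$$U(\beta) = \sum_{i=1}^{N} \left(\frac{\partial \mu_i}{\partial \beta'}\right)' V_i^{-1} W_i\,(Y_i - \mu_i) = 0, \tag{13}$$

where $W_i$ is a diagonal matrix of event-specific weights.

A consistent estimator for $\beta$ can be obtained by solving Equation (13), under correct specification of the missing data model. Following [30], the score equations to be solved are

$$S(\beta) = \sum_{i=1}^{N} \sum_{d=2}^{T+1} \frac{I(D_i = d)}{w_{id}} \left(\frac{\partial \mu_i}{\partial \beta'}(d)\right)' V_i(d)^{-1} \{y_i(d) - \mu_i(d)\} = 0, \tag{14}$$

where $y_i(d)$ and $\mu_i(d)$ are the first $d - 1$ elements of $y_i$ and $\mu_i$, respectively, with $\frac{\partial \mu_i}{\partial \beta'}(d)$ and $V_i(d)$ defined analogously. Estimators from IPWGEE enjoy robustness properties similar to those from ordinary GEE; that is, the correlation structure does not need to be correctly specified.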
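The construction of the weights can be made concrete. The sketch below assumes that the dropout model supplies, for each occasion k = 2, ..., T, the conditional probability of dropping out at k given that the subject was still in the study at k - 1. The probability of the observed dropout time is then the product of the "stay" probabilities up to that time, multiplied by the dropout probability at that time unless the subject is a completer. Function names and numerical values are hypothetical.

```python
def dropout_probability(cond_probs, d):
    """P(D = d) for d = 2..T+1.

    cond_probs[k - 2] is the conditional probability of dropping out at
    occasion k (k = 2..T) given presence in the study up to occasion k - 1.
    """
    T = len(cond_probs) + 1
    if not 2 <= d <= T + 1:
        raise ValueError("d must lie between 2 and T + 1")
    prob = 1.0
    for k in range(2, d):
        prob *= 1.0 - cond_probs[k - 2]  # stayed in the study at occasion k
    if d <= T:
        prob *= cond_probs[d - 2]        # dropped out at occasion d
    return prob


def inverse_probability_weight(cond_probs, d):
    """Weight applied to a subject whose dropout time is d."""
    return 1.0 / dropout_probability(cond_probs, d)


# Hypothetical fitted conditional dropout probabilities for T = 4.
fitted = [0.1, 0.2, 0.3]
for d in range(2, 6):
    print(d, dropout_probability(fitted, d), inverse_probability_weight(fitted, d))
```

In practice the estimated probabilities should stay away from zero, since very small values produce very large weights for individual subjects.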

Double Robust Generalized Estimating Equations
The doubly robust method is an alternative approach that uses inverse probability weights (IPW) to refine estimates of the model parameters [11] within a GEE analysis. This method requires the specification of two models: 1) a model for the distribution of the complete data, which include both the outcome and the covariates, and 2) a model for the missingness mechanism. The doubly robust (DR) estimating equations method has been developed as an extension of the weighted GEE (WGEE) method, where the idea is to combine the weights with a predictive imputation model for the missing data given the observed data. Equation (13) has been extended towards so-called double robustness [11] [31].
Augmenting the inverse probability weighted estimators with a term of expectation zero, say $\gamma$, would still result in consistent estimates under a MAR mechanism. These augmented equations give rise to doubly robust estimators. Chen and Zhou [12] noted that the optimal $\gamma_{opt}$ for a missing response is given by Equation (15), where $1$ and $1'$ are a vector of ones of length $T_i$ and its transpose, respectively, and $Y_i^m$ denotes the missing component of $Y_i$. The remaining variables and parameters in Equation (15) are as defined in the preceding sections. The parameters $\beta$ are estimated by solving the estimating equations in Equation (16). The estimator for $\beta$ in Equation (16) is doubly robust in the sense that it is consistent if at least one of the two models (the missingness model or the predictive model) is correctly specified. In the current application, we combine inverse probability weighting (IPW) with MI, using GEE as the analysis model, to construct DRGEE. The robustness of the imputation model is enhanced by ensuring that the necessary information is included in the model, while avoiding bias in the final inference.
The aim of DRGEE estimation is to estimate the propensities for each incomplete variable conditional on the other variables, and to impute the missing values on that variable by including the propensity functions (i.e. IPW) in the imputation model. Finally, the results of the analyses of the $M$ completed (imputed) datasets are combined into a single inference using Rubin's rules [5]. The method is expected to be robust, and by design it is aimed at handling incomplete data with any pattern of missingness.

Data Generation and Simulation Designs
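We simulated data to mimic an ordinal longitudinal clinical trial.
We simulated 1000 datasets based on the marginal model (17) for sample sizes $N = 100$, $300$ and $500$. We consider a study with $T_i = 4$ repeated ordinal measures (with four categories) and two covariates (one binary and the other continuous). For the binary covariate ($x_1$), individuals were assumed to have been assigned to two treatment arms (higher dose = 1 and mild dose = 0), and $x_2$ represents the exposure period. The true marginal model is

$$\operatorname{logit} P(Y_{ij} \le c \mid x_{ij}) = \beta_{0c} + \beta_1 x_1 + \beta_2 x_2, \tag{17}$$

where the model parameters are $(\beta_{01}, \beta_{02}, \beta_{03}, \beta_1, \beta_2)$. The correlated ordinal responses were generated using the NORTA method [32] with a constant correlation of $\rho = 0.9$ between the latent vectors. This method uses the probability integral transformation to transform a $d$-variate normal random vector to the desired multivariate distribution with specified marginals and correlation matrix.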
Probability integral transformation relates to the result that data values that are modelled as being random variables from any given continuous distribution can be converted to random variables having a uniform distribution.We used the R package SimCorMultRes [32] which makes it easy to simulate correlated categorical responses under the marginal model (17).The package implements marginal models for correlated binary responses as well as for correlated multinomial response categories taking into account the nature of response categories (ordinal or nominal).
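The NORTA idea can be sketched for one subject. The code below is an illustrative sketch and not the SimCorMultRes implementation: it assumes an exchangeable correlation rho between latent standard normal variables, maps them to uniforms with the normal distribution function, maps the uniforms to logistic errors with the logistic quantile function, and then thresholds the latent variable so that logit P(Y <= c) = cutpoint_c + linear predictor. All names and values are hypothetical.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_ordinal_subject(cutpoints, linear_predictors, rho, rng):
    """Simulate T correlated ordinal responses (categories 1..C) for one subject.

    cutpoints: increasing intercepts of the cumulative logit model (C - 1 values)
    linear_predictors: x'beta at each of the T occasions
    rho: exchangeable correlation of the latent normal vector, 0 <= rho <= 1
    """
    shared = rng.gauss(0.0, 1.0)  # component common to all occasions
    responses = []
    for eta in linear_predictors:
        z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        # Probability integral transform: normal -> uniform -> logistic.
        u = min(max(std_normal_cdf(z), 1e-12), 1.0 - 1e-12)
        error = math.log(u / (1.0 - u))
        latent = error - eta
        # Y = c exactly when cutpoint_{c-1} < latent <= cutpoint_c.
        responses.append(1 + sum(latent > c for c in cutpoints))
    return responses


rng = random.Random(2018)
print(simulate_ordinal_subject([-1.0, 0.0, 1.0], [0.0, 0.2, 0.4, 0.6], 0.9, rng))
```

With a latent correlation as high as 0.9, responses from the same subject tend to fall in the same or adjacent categories across occasions.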
For comparison purposes, standard GEE was used to analyse the full datasets. Each estimate is an average of 1000 estimates from the different simulated datasets. After analysing the full datasets, we created the dropouts.
Dropouts were created on the complete simulated datasets using different settings of missingness rate on response variable ij Y and according to the MCAR or MAR missing mechanism.
The dropout model is based on a logistic regression for the probability of dropout at occasion $j$, given that the individual was in the study up to occasion $j - 1$. This probability is denoted by $P(h_{ij}, y_{ij})$, with the outcome history $h_{ij} = (y_{i1}, \ldots, y_{i,j-1})$. In this study, the assumption is that dropout depends only on the current measurement $y_{ij}$ and the immediately preceding measurement $y_{i,j-1}$. We therefore assume that the dropout process is modelled by a logistic regression of the form

$$\operatorname{logit}[P(h_{ij}, y_{ij})] = \operatorname{logit}[P(D_i = j \mid D_i \ge j, h_{ij}, y_{ij})] = \psi_0 + \psi_1 y_{i,j-1} + \psi_2 y_{ij}, \tag{18}$$

with $\psi_0$ denoting the intercept, and $\psi_1$ and $\psi_2$ the coefficients of $y_{i,j-1}$ and $y_{ij}$, respectively. Model (18) reduces to MAR if $\psi_2 = 0$ (i.e. the missingness process is related only to the observed outcome prior to dropout) and to MCAR if $\psi_1 = \psi_2 = 0$.

In both the MAR and MCAR settings, after simulating a data set without missing data, we adopted the following strategy. We assume that dropout can occur only after the first time point. Thus, in this study, four dropout patterns are possible: 1) dropout at the second time point, 2) dropout at the third time point, 3) dropout at the fourth time point, 4) no dropout.
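According to Satty [28], the data generated at time $j$ and the subsequent times were assumed to be dependent on the outcome measure at time $j$. The true dropout model is written as

$$\operatorname{logit}[P(D_i = j \mid D_i \ge j, y_{i,j-1})] = \psi_0 + \psi_1 y_{i,j-1},$$

where the parameter values were chosen to generate different dropout rates. The combination of this MAR logistic dropout model with the measurement model (18) defines our data generating model, which is hereinafter referred to as GM I. We further consider a second data generating model, GM II, in which the outcomes are generated based on model (18) and random missingness is induced via the following MCAR logistic regression model:

$$\operatorname{logit}[P(D_i = j \mid D_i \ge j)] = \psi_0,$$

where $j = 2, 3, 4$.

After creating the dropouts, the incomplete data sets were analysed using the three extensions of GEE, namely MIGEE, IPWGEE and DRGEE. The performance of these methods was assessed in terms of mean squared error (MSE) and bias.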
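The way dropout is imposed on a complete sequence can be sketched as follows. The code assumes a logistic model for the probability of dropping out at occasion j given presence up to j - 1, with linear predictor psi0 + psi1 * (previous outcome) + psi2 * (current outcome). Setting psi2 = 0 gives a MAR mechanism and psi1 = psi2 = 0 gives MCAR. The function name and the parameter values are hypothetical and are not those used in the paper.

```python
import math
import random


def simulate_dropout(y, psi0, psi1, psi2=0.0, rng=random):
    """Return a copy of y with None from the dropout occasion onwards.

    The first occasion is always observed; once a subject drops out,
    all later measurements are missing (monotone pattern).
    """
    observed = list(y)
    for j in range(1, len(y)):  # index 1 is the second occasion
        eta = psi0 + psi1 * y[j - 1] + psi2 * y[j]
        p_drop = 1.0 / (1.0 + math.exp(-eta))
        if rng.random() < p_drop:
            for k in range(j, len(y)):
                observed[k] = None
            break
    return observed


rng = random.Random(42)
complete = [2, 3, 3, 4]
print(simulate_dropout(complete, psi0=-3.0, psi1=0.5, rng=rng))  # MAR setting
print(simulate_dropout(complete, psi0=-2.0, psi1=0.0, rng=rng))  # MCAR setting
```

Different dropout rates can be obtained by varying the intercept and slope of this model.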

Performance Measures for Evaluating Different GEE Methods
In the evaluation, inferences are drawn on the complete data before the dropouts are created.Complete-data results are used as the standard against which those obtained from applying IPWGEE, MIGEE and DRGEE approaches are compared.R software [33] was used to perform statistical analysis and to produce the results.
The performance of the three methods was evaluated using bias and mean squared error (MSE). These criteria were recommended in [34] and [35]. First, we define the bias as

$$\text{Bias} = \bar{\hat{\beta}} - \beta,$$

where $\beta$ is the true value for the estimate of interest, $\bar{\hat{\beta}} = \sum_{s=1}^{S} \hat{\beta}_s / S$ is the average estimate of interest, $S$ is the number of simulation replications performed, and $\hat{\beta}_s$ is the estimate of interest within each of the $s = 1, \ldots, S$ simulations. The mean squared error (MSE) is given by

$$\text{MSE} = (\bar{\hat{\beta}} - \beta)^2 + \{\widehat{SE}(\hat{\beta})\}^2,$$

where $\widehat{SE}(\hat{\beta})$ denotes the empirical standard error (SE) of the estimate over all simulations [35]. The SE is calculated as the standard deviation of the estimates of interest from all simulations:

$$\widehat{SE}(\hat{\beta}) = \sqrt{\frac{1}{S-1} \sum_{s=1}^{S} (\hat{\beta}_s - \bar{\hat{\beta}})^2}.$$
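These performance measures are straightforward to compute from the replicate estimates. The sketch below assumes the definitions bias = mean estimate minus true value, SE = sample standard deviation of the estimates, and MSE = bias squared plus SE squared. The function name and the example values are hypothetical.

```python
import math


def performance(estimates, true_value):
    """Return (bias, empirical SE, MSE) over simulation replicates."""
    S = len(estimates)
    if S < 2:
        raise ValueError("need at least two replicates")
    mean_estimate = sum(estimates) / S
    bias = mean_estimate - true_value
    # Empirical SE: standard deviation of the estimates across replicates.
    se = math.sqrt(sum((b - mean_estimate) ** 2 for b in estimates) / (S - 1))
    mse = bias ** 2 + se ** 2
    return bias, se, mse


# Hypothetical estimates of a parameter whose true value is 1.0.
bias, se, mse = performance([0.9, 1.0, 1.1, 1.2], 1.0)
print(bias, se, mse)
```

Reporting bias and MSE together separates systematic error from the combined effect of systematic error and sampling variability.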

Simulation Results and Analysis
In this section, we discuss the results of the simulation study that compares the three techniques, namely MIGEE, IPWGEE and DRGEE, for different sample sizes and different missingness rates on the response variable. The measurements at the first time point were assumed to be observed for each individual. Note that the primary focus was to compare MIGEE, IPWGEE and DRGEE, but we extend the results to include those obtained from the full datasets using standard GEE. The imputation model considered here is imputation using chained equations [37], with the number of imputations set to $M = 5$. This number of imputations was chosen to account for the fraction of missing information and to obtain efficient parameter estimates. Weights were incorporated in the IPWGEE analysis. The simulation study also uses a correctly specified imputation model for both MIGEE and DRGEE, and a correctly specified propensity score model for DRGEE. Logistic regression was used to estimate the propensity scores for DRGEE, which were then used in the imputation model. The incomplete data sets were multiply imputed and analyzed by the MIGEE and DRGEE techniques respectively.
A better method is expected to produce parameter estimates close to the true values, hence yielding small bias. Likewise, a small MSE indicates a more precise method. Results are presented in Tables 1-3 for 8%, 25% and 33% dropout rates respectively, under the MAR mechanism. For the MCAR mechanism, results are presented in Table 4 and Table 5. Generally, the bias was negligible for all methods, indicating asymptotically unbiased parameter estimates. In sum, although all methods performed well in terms of bias and MSE, DRGEE provided better parameter estimates than its singly robust counterparts.

Simulation Results for MCAR Missing Data
In Table 4

Application to a Real Dataset
The dataset used is from a homoeopathic clinic in Dublin, made available in [38]. The data were collected from 60 patients who were suffering from arthritis. There were 12 males and 48 females between the ages of 18 and 88 years in the study. These patients were followed up for a month (in 12 visits). Pain scores were assessed during a monthly follow-up and graded from 1 to 6 (higher values indicating worse pain). Out of 60 patients, only two had scores for all 12 visits. At the initial visit, baseline information was recorded, such as age, sex (male/female), arthritis type (RA = rheumatoid arthritis, OA = osteoarthritis), and the number of years with the symptom. All patients were under treatment for arthritis, and only those with a baseline pain score greater than 3 and a minimum of six visits are reported.
We think the MAR mechanism may be reasonable because, for instance, a patient's visit to the clinic may depend on his/her previously observed pain score: a patient who recorded a high pain score at the last visit may be more likely to attend the next visit in order to treat the disease. Both monotone dropout and nonmonotone missingness patterns were observed in the data. The amount of monotone dropout was considerable (33.8%), while that of nonmonotone missingness was much smaller (1.8%). Overall, approximately 36% of the pain score data were missing. Some descriptive statistics of the dataset are summarized in Table 6.
For the ordinal response scale, we used the following proportional odds model:

$$\operatorname{logit} P(Y_{ij} \le c \mid x_{ij}) = \beta_{0c} + x_{ij}'\beta_x,$$

where $Y_{ij}$ is the pain score of the $i$th patient at the $j$th visit and $x_{ij}$ is the covariate vector at time $j$. Here, the covariate vector is formed by Sex, Age, Time, Type and Years.
DRGEE was applied to the real dataset. We chose DRGEE as the optimum method because 1) the simulation results showed that it performed better than MIGEE and IPWGEE under MAR and 2) the MAR mechanism was considered plausible for the arthritis data. When using DRGEE, it is necessary to specify the inverse probability weighting model and the imputation model correctly in order to obtain consistent estimates of $\beta$. The weights were based on a logistic regression model for dropout:

$$\operatorname{logit} P(R_{ij} = 1 \mid R_{i,j-1} = 1, v_{ij}) = v_{ij}'\psi, \tag{24}$$

where $v_{ij}$ includes sex, age, arthritis type and the history of observed pain scores. Here, $R_{ij} = 1$ if the pain score was observed and 0 otherwise. We incorporated the weights obtained from Equation (24) into the imputation model in order to obtain doubly robust estimates. For comparison, the available data were also analysed under ordinary GEE, without alteration or any attempt to impute missing values of the response variable.
Results from the two approaches are shown in Table 7. The first is the usual GEE method using the available data and the second is DRGEE. [...] produced by usual GEE. Overall, it can be seen that there is a gain in using the DRGEE method due to its doubly robust property.

Discussion and Conclusion
In this paper, the focus was to compare three GEE-based techniques for handling incomplete ordinal outcomes under MCAR and MAR dropout in longitudinal data. Three methodologies were used, namely multiple imputation, inverse probability weighting and its doubly robust counterpart. First, dropouts were created at different rates on simulated datasets of various sample sizes, and the three methods were applied to these incomplete datasets. Then the optimum method was applied to the arthritis data as a real-data application.
The dropout rates in the simulated data ranged from 8% to 33%, with the aim of investigating the performance of the approaches when different amounts of data are missing. The sample sizes were varied to see how the methods behave. The performance of the three approaches was evaluated in terms of mean squared error and bias.
For multiple imputation, we made sure that the imputed values reflected the structure of the data and the uncertainty about that structure, and incorporated any knowledge about the process that led to the missing data [37]. An important aspect in the case of IPWGEE is the specification of the model for missingness used to construct the weights (IPW) for the subjects. The estimated probabilities must be bounded away from zero to avoid division by zero [28] [39].
The doubly robust method combines ideas from weighting and imputation and has been applied elsewhere for the estimation of means, for causal inference and in the context of longitudinal binary response data [10] [12].
Generally, the results from the simulation study showed that all the methods can be used satisfactorily for incomplete ordinal outcomes under the MAR and MCAR mechanisms. It is worth mentioning that almost all methods that are valid under MAR remain valid under MCAR, because MCAR is a special case of MAR. Consequently, ignoring missingness under MCAR will not introduce systematic bias, but will increase the standard error of the sample estimates due to the reduced sample size [40]. For this reason, MCAR poses less of a threat to statistical inference than MNAR or MAR.
Specifically, when we consider both bias and MSE, a better performance was observed for DRGEE over the singly robust alternatives MIGEE and IPWGEE in the simulation study. This is consistent with the results reported in [10] [13].
DRGEE is more appealing than its singly robust counterparts because of its doubly robust property. Considering the performance of MIGEE and IPWGEE, the findings generally favoured MIGEE over IPWGEE. This agrees with the theoretical result that IPW can be less powerful and efficient than a Bayesian approach such as MI under a well-specified parametric model; see [36]. In previous comparisons between MIGEE and IPWGEE, other researchers have found that MIGEE provides more efficient results than IPWGEE for longitudinal binary data [27] [28]. Nevertheless, misspecification of the imputation model cannot be disregarded in practice, and biased results can be expected when the imputation model is incorrect [37] [41].
In the Arthritis data application, the predictive model was correctly specified, which gives the doubly robust estimates great potential for reducing bias when the MAR assumption is correct. In this study, missing values occurred only in the response variable. However, this does not limit the applicability of DRGEE, MIGEE and IPWGEE to that case only: these methods can be extended to situations where values are missing in both the response and the covariates. It is also important to note that DRGEE, MIGEE and IPWGEE all rely on the assumption that the missingness is MAR (which includes MCAR as a special case). Typically, the possibility that the missingness mechanism is MNAR cannot be ruled out. Hence, caution should be exercised in interpreting results from any of these procedures. When MNAR is suspected, researchers are encouraged to conduct a sensitivity analysis [42] [43].
In conclusion, based on the results of this simulation study, DRGEE is recommended because consistency is guaranteed under MAR (and hence under MCAR) if at least one of the two working models, the missingness (weight) model or the predictive (imputation) model, is correctly specified. It became clear that the IPWGEE method does not always yield the best results, even if the MAR mechanism holds. In addition, it is advisable to include only a few necessary auxiliary variables when constructing the weights for individuals, since too many variables can be harmful. For instance, when the number of individuals is small, we run the risk of giving too much weight to one specific subject.
How to cite this paper: Ditlhong, K.E., Ngesa, O.O. and Kombo, A.Y. (2018) A Comparative Analysis of Generalized Estimating Equations Methods for Incomplete Longitudinal Ordinal Data with Ignorable Dropouts. Open Journal of Statistics, 8, 770-792. https://doi.org/10.4236/ojs.2018.85051

that the $w_{id}$ are correctly specified, IPWGEE provides consistent estimates of the model parameters under a MAR mechanism.
for random sample sizes $N = 100, 300$ and $500$. We consider a study with $T_i = 4$ repeated ordinal measures (with four categories) and two covariates (one binary and the other continuous). For the binary covariate ($x_1$), individuals were assumed to have been assigned to two treatment arms (Higher dose = 1 and Mild dose = 0) and 2 . The correlated ordinal response generate different dropout rates. The combination of this MAR logistic dropout model with the measurement model (18) defines our data generating model, which is hereinafter referred to as GM I.

Alternatively, the average of the estimated within-simulation SE for the estimate of interest denotes the standard error of the estimate of interest within each simulation. Normally, small values of MSE are desirable [36].
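As a minimal sketch of how such performance measures are typically computed from simulation output, the function below takes the estimates of one parameter across replicates and returns the bias and MSE. The name `bias_and_mse` and its arguments are illustrative assumptions. The formulas are the standard definitions and are assumed, not confirmed, to match those used in this study.

```python
def bias_and_mse(estimates, true_value):
    """Bias and mean squared error of a parameter across simulation replicates.

    estimates: one estimate of the parameter per simulated data set.
    true_value: the value of the parameter used to generate the data.
    """
    n = len(estimates)
    if n == 0:
        raise ValueError("at least one estimate is required")
    mean_estimate = sum(estimates) / n
    # Bias: average estimate minus the true value.
    bias = mean_estimate - true_value
    # MSE: average squared distance between each estimate and the truth.
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, mse
```

Defined this way, the MSE equals the squared bias plus the variance of the estimates across replicates (with the variance computed using divisor n), so it summarises accuracy and precision in a single number.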

Table 2. Bias and MSE estimates from MIGEE, IPWGEE and DRGEE under MAR mechanism for 1000 simulations of incomplete data of sizes: N = 100, 300, 500.

Table 3. Bias and MSE estimates from MIGEE, IPWGEE and DRGEE under MAR mechanism for 1000 simulations of incomplete data of sizes: N = 100, 300, 500.

Table 4. Bias and MSE estimates from MIGEE, IPWGEE and DRGEE under MCAR mechanism for 1000 simulations of incomplete data of sizes: N = 100, 300.

Table 5. Bias and MSE estimates from MIGEE, IPWGEE and DRGEE under MCAR mechanism for 1000 simulations of incomplete data of size: N = 500.

In Table 3, with a 33% dropout rate, for sample sizes 100 and 300, the previous trends for both bias and MSE in Table 2 are repeated. Comparing MIGEE and DRGEE, for all samples, the trends are largely similar to what was observed in Table 1. As expected, it can be seen that in most cases IPWGEE was more biased than MIGEE and DRGEE. In addition, IPWGEE has larger MSE values than the other methods. It can be seen that for sample size 300, MIGEE performed better than DRGEE for different dropout rates, except for 25% dropout rate where MIGEE was better than DRGEE for sample size 300 and 500.
, under the sample size of 100, we notice that DRGEE produced smallest values of bias showing asymptotically unbiased estimates, except for

Table 6. Descriptive statistics for arthritis data.

Table 7. Parameter estimates (Est), standard errors (SE) and p-values obtained from Arthritis data.
− = in the DRGEE method). Both methods provided the same conclusion for the effect of Age. That is, for each unit increase in Age, the odds of feeling mild pain or minimal pain decrease by 3% (for instance, in DRGEE it is $e^{-0.0278} = 0.9725$). Furthermore, the standard errors produced by DRGEE are marginally smaller than one