
An improved method for estimation of causal effects from observational data is demonstrated. Applications in medicine have been few, and the purpose of the present study is to contribute new clinical insight by means of this new and more sophisticated analysis. The long-term effect of medication for adult ADHD patients is not resolved. A model with causal parameters to represent the effect of medication was formulated, which accounts for time-varying confounding and selection bias from loss to follow-up. The popular marginal structural model (MSM) for causal inference, of Robins et al., adjusts for time-varying confounding, but suffers from lack of robustness to misspecification of the weights. Recent work by Imai and Ratkovic [1][2] achieves robustness in the MSM through improved covariate balance (CBMSM). The CBMSM (freely available software) was compared with a standard fit of an MSM and a naive regression model, to give a robust estimate of the true treatment effect in 250 previously non-medicated adults, treated for one year in a specialized ADHD outpatient clinic in Norway. Covariate balance was greatly improved, resulting in a stronger treatment effect than without this improvement. In terms of treatment effect per week, early stages seemed to have the strongest influence. An estimated average reduction of approximately 4 units per week on the symptom scale assessed at 12 weeks, for hypothetical medication in the 9 - 12 week period compared to no medication in this period, was found. The treatment effect persisted throughout the whole year, with an estimated average reduction of 0.7 units per week on symptoms assessed at one year, for hypothetical medication in the last 13 weeks of the year, compared to no medication in this period. The present findings support a strong and causal direct and indirect effect of pharmacological treatment of adults with ADHD on improvement in symptoms, and with a stronger treatment effect than has been reported.

Estimation of causal treatment effects from observational studies has obvious limitations and challenges. However, the randomized controlled trial is often no alternative, due to practical or ethical reasons. If a scientific question of interest really is a causal one, the analysis should target the specific question, even if the price is strong assumptions. In most cases, a scientist would more easily relate to subject specific assumptions, than an association with few assumptions and no causal statement, which can be misleading (in both magnitude and direction). Causal methods are more explicit in the assumptions for a causal interpretation, and often more robust for certain types of bias, even though they are just as susceptible to e.g. unmeasured confounding as more traditional methods.

Short term effect of medication for treatment of ADHD is well documented [

To estimate the causal effect of medication, a successful study design would compare two arms in a trial, an active arm on medication and a control arm without medication. However, for ADHD, present knowledge makes inclusion in the control arm questionable. Also, if all eligible patients are offered treatment, more patients are likely to participate. A research design where all patients are offered medication and follow-up assessments resembles clinical practice, with findings that apply to the population of interest. Causal effects can still be estimated under additional assumptions.

If all patients start off with medication, and those who experience intolerance or a high number of side-effects terminate the treatment, there will be patients both on and off medication. A direct comparison of these groups yields a biased estimate of the causal treatment effect, due to feedback between the treatment assignment process and the outcome, e.g. symptoms. If treatment decreases subsequent symptoms on the one hand, but on the other hand increases side-effects and intolerance, which in turn leads to termination of treatment, one might expect that the direct comparison between those on and off medication is an overestimate of the true treatment effect. Bias in the opposite direction can be expected due to time-varying confounding. Prior medication likely improves subsequent symptoms, which represent a confounder for the association between continued medication and final symptoms. The positive association between prior symptoms and both continued medication and final symptoms will weaken the negative association between continued medication and final symptoms, and result in underestimation of the true treatment effect.

A marginal structural model (MSM), with inverse probability weighting (IPW) can account for such feedback, and make an unbiased estimate of the treatment effect, under specific assumptions [

In this study, a re-analysis of longitudinal follow-up data on medication history and symptoms over one year for adult ADHD patients is presented. An ordinary longitudinal analysis of the same data has been published previously, with limited focus on the causal treatment effect and little attempt to account for the time-varying confounding mentioned above [

The study sample, the measures used, and the formulation of the MSM and CBMSM, including the assumptions necessary for causal interpretation, are described in the Method section. In Results, the findings from the CBMSM method are presented and contrasted with the ordinary fitted MSM. The Discussion section relates the results to the previously published analysis, lists some strengths and limitations of the methods, and interprets the results.

Patients were included at a specialized outpatient clinic in Vestfold, Norway, between May 2009 and December 2010. Referred patients were aged 18 - 60, had to fulfill DSM-5 criteria for ADHD [

All patients received methylphenidate as first-line medication and psychosocial treatment according to the national treatment guidelines (Norwegian Directorate of Health, 2005). Patients were assessed for symptoms, functioning and side-effects at scheduled follow-up visits at baseline, 6 weeks, 12 weeks, 26 weeks and 52 weeks. Standard titration with immediate-release methylphenidate (MPH-IR) was prescribed for the first six weeks: 5 mg three times a day and, if tolerated, a stepwise increase to a maximum of 60 mg/day. Thereafter a flexible dose titration was applied to optimize efficacy (maximum 120 mg/day). A shift to extended-release methylphenidate (MPH-ER) was offered at the three-month visit if patients reported difficulties with compliance, annoying fluctuations in effect, or otherwise wanted to try an easier administration form. If MPH was not tolerated or was ineffective, second-line medications were short-acting dextroamphetamine (dAMP) or atomoxetine (ATX). The dose of dAMP was escalated to a maximum of 50 mg/day, and the dose of ATX to a maximum of 120 mg/day [

Baseline characteristics of the sample have been thoroughly described previously [

The primary outcome measure was current ADHD symptoms on the 18-item Adult ADHD Self-Report Scale version 1.1 (ASRS), Norwegian version [

Two psychiatrists, not involved in the treatment, assessed overall psychosocial functioning (last two weeks) by the Global Assessment of Functioning (GAF) Scale [

Level of mental distress over the last week was self-rated on 90 items on a 5 point scale, and the mean of all items is referred to as Global-Severity-Index (GSI) [

Side-effects (Mean Side Effects, MSE) were quantified by a measure of tolerability, the Patient and ADHD medication form. The questionnaire lists symptoms frequently associated with stimulant treatment, and each item is scored by frequency (score 0 - 3) [

Dose is a time-varying covariate and expresses daily dose in mg (or dose equivalent) per day.

Medication (MED) is a time-varying dichotomous indicator (1/0) of whether on medication or not, at the time of assessment.

Standard notation in the causal inference literature makes use of “counterfactual outcomes”. A counterfactual, or potential outcome, is an outcome for a hypothetical treatment regime. There are general (non-parametric) assumptions necessary to estimate causal effects from observational data [

These assumptions are helpful to interpret results, but not testable from data, although some indication of non-positivity or misspecification is given by the weight distribution in weighting methods like the MSM [

Exchangeability between treatment groups implies covariate balance, that for any covariate, the treatment groups have equal distributions, e.g. equality of weighted means. Conditional exchangeability implies that all confounding covariates are measured and balanced. In an observational study, without knowing whether or not conditional exchangeability is satisfied, to check covariate balance in measured covariates is informative and recommended [

A standard longitudinal analysis (linear mixed model) of the ASRS symptoms in the present study-sample, has previously been published (Figures 1-3) [

Feedback between medication and symptoms is illustrated in

The MSM is a model for a counterfactual outcome, univariate or repeated measure (longitudinal). In the present application, the univariate model is sufficient and easy to interpret. The MSM for the mean counterfactual end-of-study univariate ASRS level, denoted by Y (ADHD symptoms at one year follow-up), can be formulated as:

$$E\left[ Y\left( \overline{med}_{52} \right) \right] = \beta_0 + \beta_1\, med_6 + \beta_2\, med_{12} + \beta_3\, med_{26} + \beta_4\, med_{52} \tag{1}$$

where $med_6$, $med_{12}$, $med_{26}$ and $med_{52}$ are dichotomous (1/0) indicators of on/off medication at 6 weeks, 12 weeks, 26 weeks and 52 weeks, respectively (all patients were on medication at baseline). $\overline{med}_{52}$, which denotes medication history at 52 weeks, was limited to one of five different regimes (

Medication at | Baseline | 6 weeks | 12 weeks | 26 weeks | 52 weeks |
---|---|---|---|---|---|
Average of 3 weeks of medication (N = 4) | Yes | No | No | No | No |
Average of 9 weeks of medication (N = 8) | Yes | Yes | No | No | No |
Average of 19 weeks of medication (N = 13) | Yes | Yes | Yes | No | No |
Average of 39 weeks of medication (N = 14) | Yes | Yes | Yes | Yes | No |
One year of medication (N = 131) | Yes | Yes | Yes | Yes | Yes |

medication history. $\beta_0$ is the marginal mean of $Y$ with medication at baseline only, $\beta_1$ corresponds to the average change in $Y$ under medication regime (1, 1, 0, 0, 0) relative to (1, 0, 0, 0, 0), $\beta_2$ corresponds to the average change in $Y$ under medication regime (1, 1, 1, 0, 0) relative to (1, 1, 0, 0, 0), and so on.
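As an illustration of how the parameters in Equation (1) combine, the following minimal sketch computes the mean counterfactual ASRS level at one year under each of the five monotone regimes. It assumes the CBMSM point estimates reported in the results (44.67, −3.22, 7.46, −12.06, −8.72); the names `BETA`, `REGIMES` and `regime_mean` are hypothetical and chosen for this example only.

```python
# CBMSM point estimates for ASRS at 52 weeks, copied from the results table.
BETA = {"int": 44.67, "med6": -3.22, "med12": 7.46, "med26": -12.06, "med52": -8.72}

# The five monotone regimes as (baseline, 6, 12, 26, 52 weeks); baseline is always 1.
REGIMES = {
    "baseline only": (1, 0, 0, 0, 0),
    "through 6 weeks": (1, 1, 0, 0, 0),
    "through 12 weeks": (1, 1, 1, 0, 0),
    "through 26 weeks": (1, 1, 1, 1, 0),
    "full year": (1, 1, 1, 1, 1),
}

def regime_mean(regime):
    """Mean counterfactual ASRS at one year under a regime, following Equation (1)."""
    # The baseline indicator carries no coefficient, so only entries 1..4 are used.
    terms = zip(("med6", "med12", "med26", "med52"), regime[1:])
    return BETA["int"] + sum(BETA[name] * on for name, on in terms)

means = {label: regime_mean(regime) for label, regime in REGIMES.items()}
for label, value in means.items():
    print(f"{label}: {value:.2f}")
```

Each regime mean is the intercept plus the coefficients of all periods in which medication is continued, so consecutive regimes differ by exactly one coefficient.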

The time-varying covariates GSI, GAF-S and GAF-F are viewed as closest in time to the actual follow-up assessment. MSE (side-effects) and ASRS (symptom-level) contain information from the whole preceding period, but with most weight close to the follow-up assessment. MED (indicator for on/off medication) describes the treatment status at the time of assessment. MED = 0 means that medication was terminated some time during the preceding period. With uniformly distributed termination times, the parameters in Equation (1) therefore each represent the effect of, on average, half the previous and half the following period of extra medication. With respect to the direction of effects, influence is allowed from medication to symptom-level in the same period, and from symptom-level to medication in the following period (

To estimate the MSM in (1), a weighted univariate linear regression model (associational), conditional on medication history only, is fitted [

$$E\left[ Y \mid \overline{MED}_{52} \right] = \beta_0 + \beta_1\, MED_6 + \beta_2\, MED_{12} + \beta_3\, MED_{26} + \beta_4\, MED_{52} \tag{2}$$

with weights (stabilized) for each person at time $j$, specified by:

$$SW_j = \prod_{s=1}^{j} \frac{\Pr\left( MED_s \mid \overline{MED}_{s-1}, V \right)}{\Pr\left( MED_s \mid \overline{MED}_{s-1}, \bar{L}_{s-1}, V \right)} \tag{3}$$

The weights are estimated by a series of logistic regressions. $V$ represents baseline confounders and $\bar{L}_{s-1}$ the time-varying confounders (only in the denominator). Robust standard errors are used, e.g. by the “sandwich” software package in R [
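The product in Equation (3) can be sketched as follows. This is a minimal illustration that assumes the fitted probabilities from the numerator and denominator logistic models are already available for one patient; the function name `stabilized_weights` and the example probabilities are hypothetical and not taken from the study.

```python
def stabilized_weights(med, p_num, p_den):
    """Cumulative stabilized weights SW_j for one patient, following Equation (3).

    med   : observed 0/1 medication indicators, one per period
    p_num : fitted Pr(MED_s = 1 | past medication, V) per period
    p_den : fitted Pr(MED_s = 1 | past medication, past L, V) per period
    """
    weights, running = [], 1.0
    for m, pn, pd in zip(med, p_num, p_den):
        # Use the probability of the treatment status actually observed.
        num = pn if m == 1 else 1.0 - pn
        den = pd if m == 1 else 1.0 - pd
        running *= num / den
        weights.append(running)
    return weights

# Hypothetical patient: on medication for two periods, then terminated.
sw = stabilized_weights(med=[1, 1, 0], p_num=[0.9, 0.9, 0.8], p_den=[0.9, 0.95, 0.6])
print([round(w, 3) for w in sw])
```

When the time-varying confounders carry no information about treatment, numerator and denominator coincide and every weight equals 1, which is why stabilized weights tend to have a distribution centred near 1.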

Estimating the weights in (3) for the MSM, is usually done with a series of logistic regressions by maximum likelihood (ML), in a generalized linear model (GLM). With misspecification in these PS-models, maximizing the likelihood might not balance the covariates [

With medication = yes/no in four consecutive time periods (everyone started off on medication), the number of potential treatment histories is 16 ($2^4$). To avoid loss of precision in parameter estimates from patterns with few patients, only monotone treatment regimens were allowed (
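The count of treatment histories, and the restriction to monotone regimes, can be checked with a short enumeration. This is only an illustrative sketch; `is_monotone` is a hypothetical helper that encodes "once medication is stopped, it is not restarted".

```python
from itertools import product

# All on/off patterns over the four follow-up periods (baseline is always on).
histories = list(product((1, 0), repeat=4))

def is_monotone(history):
    """True if medication, once stopped, is never restarted."""
    return all(later <= earlier for earlier, later in zip(history, history[1:]))

monotone = [h for h in histories if is_monotone(h)]
print(len(histories), len(monotone))
for h in monotone:
    print((1,) + h)  # prepend the baseline indicator
```

The monotone patterns are exactly the five regimes considered in the analysis, from termination after baseline to a full year of medication.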

To assess differences between treatment groups, overlap in the PS distributions for the five groups, was examined. Here, the PS represented probability for termination of medication sooner or later, as a function of baseline covariates.

To assess balance, the standardized mean difference (SMD) for each covariate (difference in weighted means between two treatment groups, divided by the population standard deviation) is calculated. More precisely, the SMD expresses imbalance: the smaller the SMD, the better the balance. As a rule of thumb, an absolute value less than 0.25 is commonly considered acceptable [
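A minimal sketch of the SMD calculation is given below. It assumes that the "population standard deviation" is taken as the unweighted standard deviation of the covariate in the two groups pooled together, which is one possible reading; the function names and the toy data are hypothetical.

```python
from math import sqrt

def weighted_mean(values, weights):
    """Weighted mean of a covariate within one treatment group."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def smd(x_a, w_a, x_b, w_b):
    """Difference in weighted means divided by the pooled, unweighted standard deviation.

    Assumption: the denominator is the standard deviation over both groups combined.
    """
    pooled = list(x_a) + list(x_b)
    mean = sum(pooled) / len(pooled)
    sd = sqrt(sum((v - mean) ** 2 for v in pooled) / len(pooled))
    return (weighted_mean(x_a, w_a) - weighted_mean(x_b, w_b)) / sd

# Toy covariate values in two treatment groups.
x_on, x_off = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
unweighted = smd(x_on, [1, 1, 1], x_off, [1, 1, 1])
reweighted = smd(x_on, [0, 1, 1], x_off, [1, 1, 0])
print(round(unweighted, 3), round(reweighted, 3))
```

In the toy data the unweighted groups are clearly imbalanced, while the second set of weights equalizes the weighted means and brings the SMD to zero.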

Model selection for the weights in Equation (3) was based on combinations of covariates that were significant predictors for continued medication at several time-points, and with resulting imbalance as low as possible. The following logistic model was chosen:

$$\operatorname{logit}\left( \Pr\left( MED_j = 1 \mid \cdots \right) \right) = \alpha_0 + \alpha_1\, \mathit{anx} + \alpha_2\, \mathit{mse}_0 + \alpha_3\, \mathit{age} + \alpha_4\, \mathit{age}^2 + \alpha_5\, \mathit{gafs}_0 + \alpha_6\, \mathit{alc} + \alpha_7\, \mathit{alc} \times \mathit{mse}_0 + \alpha_8\, \mathit{alc} \times \mathit{gafs}_0 + \alpha_9\, \mathit{dose}_{j-1} + \alpha_{10}\, \mathit{asrs}_{j-1} + \alpha_{11}\, \mathit{gsi}_{j-1} + \alpha_{12}\, \mathit{gaff}_{j-1}^2 \tag{4}$$

with the abbreviations: $\mathit{anx}$, indicator for any anxiety disorder at baseline; $\mathit{mse}_0$, mean side-effects at baseline; $\mathit{age}$, age at baseline; $\mathit{gafs}_0$, baseline psychosocial function, symptom part; $\mathit{alc}$, indicator for baseline alcohol use disorder; $\mathit{dose}_{j-1}$, dose in the previous period; $\mathit{asrs}_{j-1}$, ADHD symptoms at the previous assessment; $\mathit{gsi}_{j-1}$, distress at the previous assessment; $\mathit{gaff}_{j-1}^2$, squared psychosocial function, function part, at the previous assessment.

A similar model for censoring in the censoring weights was constructed. Censoring was used for deviation from monotone treatment (all patients that deviated from the five different regimes in

In these data, there were missing values, both in the outcome and in covariates. The 26 week assessment had the most missing values (gafs_{26}, gaff_{26}: 66 missing; mse_{26}: 63 missing; gsi_{26}: 49 missing; asrs_{26}: 46 missing), while 27 variables had no missing values. 132 patients were complete cases, with no missing values at any assessment. With substantial missingness for some covariates (more than 20%), possibly caused by side-effects, multiple imputation (MI) was considered a suitable method to reduce the impact of loss of data. The missing covariates were multiply imputed with chained equations [

The five different groups (

lowing baseline and 6 weeks had clearly right-shifted probability mass (higher probability for termination) compared to the other groups, which is intuitively reasonable, being closest in time to the explanatory covariates. No signs of serious violations of conditional exchangeability (among measured baseline covariates) or positivity were found (

Figures 6-8 show graphically the SMDs for each covariate in the model from Equation (4), in estimation of the parameters in the MSM, and

Parameter | CBMSM mean (abs. SMD) | CBMSM sd (abs. SMD) | GLM mean (abs. SMD) | GLM sd (abs. SMD) |
---|---|---|---|---|
β_{1} | 0.87 | 0.65 | 1.82 | 1.43 |
β_{2} | 0.95 | 0.89 | 2.3 | 2.25 |
β_{3} | 0.29 | 0.35 | 0.81 | 0.88 |
β_{4} | 0.23 | 0.2 | 0.51 | 0.38 |

symptom-level, psychosocial functioning and distress were within the limit (Figures 6-8). Slightly less imbalance in estimation of $\beta_4$, compared to $\beta_3$, is also indicated in the figures. This is confirmed in

In

Compared to an ordinary fitted MSM with GLM weights (

Separate models (CBMSM) to assess the influence of different periods of medication on ASRS symptoms at 26 weeks and at 12 weeks were fitted (MI-estimates) to examine the course of treatment effects and symptoms. With ASRS_{26} as outcome, the coefficient for the most recent period of medication represents the effect of hypothetical medication relative to no medication in the 19 - 26 week period (when being on medication prior to 19 weeks). This seven-week period of medication had a strong and significant effect, and would be expected to reduce symptoms by 20 units ($\beta_3 = -19.98$, 95% CI: −23.6, −16.37, $p < 0.001$), a magnitude of the same size as the total treatment effect from two periods in the ASRS_{52}-model. Estimated effects of earlier periods were nonsignificant, in spite of $\beta_2$ representing the longest period of medication (10 weeks), possibly resulting from mediation through the last period. With ASRS_{12} as outcome, the coefficient for the most recent period of medication represents the effect of hypothetical medication relative to no medication in the 9 - 12 week period (when being on medication prior to 9 weeks). This three-week period of medication had a strong and significant effect, with an expected reduction in symptoms of 13 units ($\beta_2 = -13.1$, 95% CI: −18.54, −7.65, $p < 0.001$). The estimated effect of the earlier period, 3 - 9 weeks, was nonsignificant, in spite of its longer duration, possibly because it was mediated through the last period.

In summary, medication seemed to have strong and positive effects on subsequent symptoms, across the whole year, both for early and late periods. Symptoms measured at early stages, by ASRS_{12} and ASRS_{26}, would be expected to be largely reduced by medication immediately prior to symptom assessment, with no direct effects from earlier medication. Symptoms at one year follow-up, ASRS_{52}, would be expected to be influenced by more of the medication history. Hypothetical medication in the mid-period, 19 - 39 weeks (with no medication in the last period), seemed to be most influential, with a direct effect in addition to an indirect effect through medication in the 39 - 52 week period. This is in line with a dose-response effect (longer duration of treatment corresponds to larger effect), but in contrast to symptoms at early stages, where effects from earlier periods of treatment with longer duration were nonsignificant. Treatment effect per week seemed strongest in the early stages, but persisted across the whole year.

Parameter | CBMSM beta | CBMSM 95% CI | CBMSM p-value | GLM beta | GLM 95% CI | GLM p-value | LM beta | LM 95% CI | LM p-value | CBMSMcens^{-} beta | CBMSMcens^{-} 95% CI | CBMSMcens^{-} p-value | Unweighted beta | Unweighted 95% CI | Unweighted p-value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Int | 44.67 | 38.17, 51.16 | <0.001 | 44.67 | 38.64, 50.71 | <0.001 | 32.17 | 24.67, 39.67 | <0.001 | 45.4 | 38.19, 52.61 | <0.001 | 45.5 | 37.06, 53.94 | <0.001 |
med_{6} | −3.22 | −16.13, 9.68 | 0.62 | 1.15 | −13.71, 16.01 | 0.88 | −0.63 | −10.56, 9.3 | 0.9 | −4.03 | −15.92, 7.85 | 0.51 | −3.38 | −16.21, 9.46 | 0.61 |
med_{12} | 7.46 | −4.77, 19.7 | 0.23 | 1.85 | −12.64, 16.34 | 0.8 | 0.43 | −8.58, 9.44 | 0.92 | 5.18 | −5.44, 15.8 | 0.34 | 2.72 | −8.07, 13.52 | 0.62 |
med_{26} | −12.06 | −20.3, −3.88 | 0.004 | −12.24 | −19.69, −4.79 | 0.001 | −7.56 | −14.09, −1.03 | 0.02 | −10.25 | −18.65, −1.84 | 0.02 | −10.77 | −18.26, −3.29 | 0.005 |
med_{52} | −8.72 | −15.6, −1.88 | 0.013 | −6.51 | −12.31, −0.71 | 0.03 | −4.08 | −9.35, 1.2 | 0.13 | −8.10 | −15.31, −0.9 | 0.03 | −4.65 | −10.7, 1.4 | 0.13 |

Imputation (CBMSM) | β_{0} | β_{1} | β_{2} | β_{3} | β_{4} |
---|---|---|---|---|---|
1 | 45.08 | −3.4 | 7.69 | −12.17 | −9.28 |
2 | 44.05 | −2.53 | 7.20 | −12.09 | −8.54 |
3 | 45.10 | −3.36 | 7.19 | −11.84 | −9.05 |
4 | 44.76 | −3.53 | 7.73 | −12.0 | −8.73 |
5 | 45.07 | −4.0 | 7.21 | −11.44 | −8.61 |
6 | 44.29 | −3.83 | 8.51 | −12.21 | −8.73 |
7 | 43.6 | −2.7 | 8.21 | −12.83 | −8.11 |
8 | 44.47 | −3.04 | 7.29 | −11.98 | −8.61 |
9 | 44.96 | −2.88 | 6.98 | −12.11 | −8.64 |
10 | 45.31 | −2.96 | 6.64 | −11.96 | −8.89 |

Column means are MI-estimates in first column,

Parameter | ASRS_{52} beta | ASRS_{52} 95% CI | ASRS_{52} p-value | ASRS_{26} beta | ASRS_{26} 95% CI | ASRS_{26} p-value | ASRS_{12} beta | ASRS_{12} 95% CI | ASRS_{12} p-value |
---|---|---|---|---|---|---|---|---|---|
Int | 44.67 | 38.17, 51.16 | <0.001 | 49.29 | 42.88, 55.69 | <0.001 | 34.19 | 13.76, 54.63 | 0.001 |
med_{6} | −3.22 | −16.13, 9.68 | 0.62 | −7.12 | −16.84, 2.59 | 0.15 | 10.64 | −10.4, 31.71 | 0.32 |
med_{12} | 7.46 | −4.77, 19.7 | 0.23 | 5.67 | −2.23, 13.58 | 0.16 | −13.1 | −18.54, −7.65 | <0.001 |
med_{26} | −12.06 | −20.3, −3.88 | 0.004 | −19.98 | −23.6, −16.37 | <0.001 | | | |
med_{52} | −8.72 | −15.6, −1.88 | 0.013 | | | | | | |
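The relation between the per-imputation CBMSM estimates and the reported MI-estimates can be verified with a few lines of code. The sketch below applies only the point-estimate part of Rubin's rules (the mean over imputations); pooled standard errors would also need the within-imputation variances, which are not tabulated. It assumes that the tenth β_{3} entry is negative like the other nine, since only then does the column mean reproduce the reported −12.06. The names `ESTIMATES` and `pool` are hypothetical.

```python
# Per-imputation CBMSM estimates (10 imputations), copied from the table above.
# Assumption: the tenth beta3 value is -11.96 (negative, like the other nine).
ESTIMATES = {
    "beta0": [45.08, 44.05, 45.10, 44.76, 45.07, 44.29, 43.6, 44.47, 44.96, 45.31],
    "beta1": [-3.4, -2.53, -3.36, -3.53, -4.0, -3.83, -2.7, -3.04, -2.88, -2.96],
    "beta2": [7.69, 7.20, 7.19, 7.73, 7.21, 8.51, 8.21, 7.29, 6.98, 6.64],
    "beta3": [-12.17, -12.09, -11.84, -12.0, -11.44, -12.21, -12.83, -11.98, -12.11, -11.96],
    "beta4": [-9.28, -8.54, -9.05, -8.73, -8.61, -8.73, -8.11, -8.61, -8.64, -8.89],
}

def pool(values):
    """Rubin's rule for the point estimate: the mean over imputations."""
    return sum(values) / len(values)

pooled = {name: pool(values) for name, values in ESTIMATES.items()}
for name, value in pooled.items():
    print(f"{name}: {value:.3f}")
```

Up to rounding, the pooled values agree with the CBMSM column of the ASRS_{52} results.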

In the present application of the CBMSM method, strong causal effects of medication on ADHD symptoms were found, under standard assumptions for causal inference. An MSM (GLM weights) was fitted to account for time-varying confounding and selection bias, but covariate balance was greatly improved with the CBMSM. The treatment effects estimated by the CBMSM are believed to be closest to the unknown true levels. The alternative estimates were smaller (25% relative difference compared to the standard MSM), which indicates that, for these data, standard analysis underestimates the treatment effects.

With a maximum of one year of medication in four consecutive time periods, all periods seemed to represent improvement in ADHD symptoms over the course of the study. In terms of treatment effect per week, early stages seemed to have the strongest influence, with an average reduction of approximately 4 units per week of symptoms at 12 weeks (ASRS_{12}) ($\beta_2 = -13.1$, 95% CI: −18.54, −7.65, $p < 0.001$) for hypothetical medication in the 9 - 12 week period. However, the treatment effect persisted over the whole year, with an expected reduction of 0.7 units per week of symptoms at the final assessment (ASRS_{52}) with hypothetical continued medication over the last 13 weeks.
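The per-week figures follow from dividing each coefficient by the length of the medication period it represents. A minimal sketch of that arithmetic, with the hypothetical names `effects` and `per_week`, is:

```python
# (coefficient for the most recent period, length of that period in weeks)
effects = {
    "ASRS_12, weeks 9-12": (-13.1, 3),
    "ASRS_52, weeks 39-52": (-8.72, 13),
}

per_week = {label: beta / weeks for label, (beta, weeks) in effects.items()}
for label, value in per_week.items():
    print(f"{label}: {value:.2f} units per week")
```

This gives roughly −4.4 and −0.67 units per week, consistent with the rounded values of about 4 and 0.7 quoted in the text.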

The CBMSM model revealed causal information, not accessible in a standard longitudinal analysis. The average symptom level during the course of the observation period is characterized by a rapid drop in the first 6 weeks, followed by an enduring constant low level for the rest of the year (

effect across the whole period, another notable change in the treatment effect was revealed. The different models for symptoms at 12 weeks (ASRS_{12}) and 26 weeks (ASRS_{26}) both showed significant causal effects of hypothetical medication in the most recent treatment period, with no significant effects from earlier periods, in spite of their length. As for symptoms at end of follow-up (ASRS_{52}), significant causal effects were found for hypothetical medication in the two most recent periods. The causal direct effect from hypothetical medication in the 19 - 39 week period on symptoms at one year may represent other pathways than through continued treatment in the 39 - 52 week period (pharmacological pathway). Clinical insight has suggested that a patient’s social environment needs time to adapt and trust the improvement when the patient is treated. This adaptation process is slower than the pharmacological effect, but can be important for further improvement, through motivation, support, and positive feedback. An adaptation process of four and a half months is one possible explanation for the direct effect from the 19 - 39 week period. In this case, the treatment effect in the early periods would be mostly pharmacological. Alternatively, the direct effect from the 19 - 39 week period corresponds to a dose-response effect. In that case, other explanations for the lack of such dose-response effects on symptoms at 12 and 26 weeks are needed.

If the models for ASRS_{12}, ASRS_{26}, and ASRS_{52} had been similar with respect to causal effects of medication, they could have been combined in a repeated measures MSM (longitudinal model), for possible gain in efficiency. In the present application, however, the differences in the separate univariate models were informative.

In conclusion, the CBMSM improved covariate balance substantially in these data, compared to the standard fitted MSM, and should therefore represent estimates closer to the causal effect of medication one would find in a successful RCT.

The improved covariate balance strengthened the treatment effect compared to the MSM, and even more so compared to the ordinary longitudinal analysis with naive adjustment for time-varying confounding reported in the literature [

Klungsøyr, O. and Fredriksen, M. (2017) Pharmacological Treatment of Adult Attention-Deficit/ Hyperactivity Disorder (ADHD) in a Longitudinal Observational Study: Estimated Treatment Effect Strengthened by Improved Covariate Balance. Open Journal of Statistics, 7, 988-1012. https://doi.org/10.4236/ojs.2017.76070

First, the cross-sectional situation is considered: a sample of size $n$ from a population, with individuals indexed by $i = 1, \cdots, n$; a single dichotomous treatment indicator $T_i = t$, $t \in \{0, 1\}$ ($T_i = 1$ for treatment and 0 for no treatment); a $K$-dimensional vector of confounders $X_i$ with influence on treatment assignment and outcome; the counterfactual univariate outcome $Y_i(t)$ for hypothetical treatment assignment $T_i = t$; and the observed outcome $Y_i$. The propensity score (PS) is assumed to satisfy $0 < \Pr(T_i = 1 \mid X_i) = \pi(1, X_i) < 1$ for all $X_i$ [

$$\{ Y_i(1), Y_i(0) \} \perp\!\!\!\perp T_i \mid \pi(1, X_i) \tag{A1}$$

which means that unbiased estimation of the treatment effect is possible by conditioning on the PS alone. It also implies covariate balance between treatment groups, for example equal weighted mean in the treated and untreated. (A1) represents a dimension reduction and has led to development of propensity methods, like weighting. Inverse probability of treatment weighting [

$$w_i(t) = \pi(t, X_i)^{-1}, \quad t \in \{0, 1\} \tag{A2}$$

In observational studies, the PS has to be estimated, commonly by e.g. logistic regression and maximum likelihood, for example parameterized by

$$\pi_\alpha(T_i, X_i) = \Pr(T_i \mid X_i) = \operatorname{expit}\left( (2T_i - 1)\, \alpha^T X_i \right) \tag{A3}$$

where $\operatorname{expit}(z) = [1 + \exp(-z)]^{-1}$. If the model in (A3) is misspecified, the covariates are possibly unbalanced. To make estimation more robust to misspecification, I&R proposed to estimate the PS under an additional condition of covariate balance, formulated as

$$E\left\{ \frac{T_i X_i}{\pi_\alpha(1, X_i)} - \frac{(1 - T_i) X_i}{\pi_\alpha(0, X_i)} \right\} = 0 \tag{A4}$$

By iterated expectation it is easily seen that both terms in (A4) equal $E(X_i)$, the population mean, and also a weighted conditional mean of the treated/untreated, respectively. With (A3) as the parametric model for the PS (1-PS), the number of equations in (A4) (one for each of the $K$ covariates) equals the number of unknown parameters, which is the just-identified case [
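The sample analogue of the balance condition (A4) under the logistic model (A3) can be sketched as follows. This is an illustration only, not the estimation routine itself: it evaluates the imbalance for a given parameter vector rather than solving for it, and the function names `expit`, `propensity` and `balance_gap`, as well as the toy data, are hypothetical.

```python
from math import exp

def expit(z):
    """Inverse logit, as defined after (A3)."""
    return 1.0 / (1.0 + exp(-z))

def propensity(t, x, alpha):
    """pi_alpha(t, x) from (A3): probability of receiving treatment t given x."""
    linear = sum(a * xk for a, xk in zip(alpha, x))
    return expit((2 * t - 1) * linear)

def balance_gap(treatments, covariates, alpha):
    """Sample analogue of (A4): one entry per covariate, zero under exact balance."""
    n, k = len(treatments), len(covariates[0])
    gap = [0.0] * k
    for t, x in zip(treatments, covariates):
        sign = 1.0 if t == 1 else -1.0        # treated enter with +, untreated with -
        w = 1.0 / propensity(t, x, alpha)     # inverse probability weight, as in (A2)
        for j in range(k):
            gap[j] += sign * w * x[j] / n
    return gap

# Toy data: an intercept and one binary covariate, equally distributed in both groups.
T = [1, 0, 1, 0]
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
gap = balance_gap(T, X, alpha=[0.0, 0.0])
print(gap)
```

With `alpha` set to zero every propensity equals 0.5, and because the toy covariate is identically distributed in the two groups the imbalance vanishes. In general the parameter vector is chosen to drive this quantity towards zero.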

In the present application, the data are longitudinal with four time periods, $J = 4$. The time-varying covariate at a given time period $j$ possibly depends on the past treatment history until the previous time period ($j - 1$), written $\bar{t}_{j-1}$. It needs to be balanced on all current and future treatment trajectories, written $\underline{t}_j = \{ t_j, t_{j+1}, \cdots, t_J \}$, conditional on the past history. The covariate balancing conditions are written

$$E\left\{ X_{ij}(\bar{t}_{j-1}) \right\} = E\left\{ \mathbf{1}\left( \bar{T}_{i,j-1} = \bar{t}_{j-1}, \underline{T}_{ij} = \underline{t}_j \right) w_i\left( \bar{t}_J, X_{ij}(\bar{t}_{J-1}) \right) X_{ij}(\bar{t}_{j-1}) \right\} \tag{A5}$$

and can be represented in an orthogonal way, in the following manner: Let the time-varying covariates be combined in a $4K \times 1$ dimensional column vector $\tilde{X}_i = (w_i X_{i1}^T, w_i X_{i2}^T, w_i X_{i3}^T, w_i X_{i4}^T)^T$. To determine the sign of each term in the moment conditions, the following $(2^J - 1)$-dimensional column vector $M_i$ is needed:

$$M_i^T = \Big( (-1)^{T_{i1}},\ (-1)^{T_{i2}},\ (-1)^{T_{i1}+T_{i2}},\ (-1)^{T_{i3}},\ (-1)^{T_{i1}+T_{i3}},\ (-1)^{T_{i2}+T_{i3}},\ (-1)^{T_{i1}+T_{i2}+T_{i3}},\ (-1)^{T_{i4}},\ (-1)^{T_{i1}+T_{i4}},\ (-1)^{T_{i2}+T_{i4}},\ (-1)^{T_{i1}+T_{i2}+T_{i4}},\ (-1)^{T_{i3}+T_{i4}},\ (-1)^{T_{i1}+T_{i3}+T_{i4}},\ (-1)^{T_{i2}+T_{i3}+T_{i4}},\ (-1)^{T_{i1}+T_{i2}+T_{i3}+T_{i4}} \Big)$$

As described above, the moment conditions balance covariates measured at time $j$ across all possible current and future treatments, but not past treatments and their interactions. Therefore, moment conditions on past treatments and their interactions are not binding, and can be zeroed out. As time progresses, the number of non-binding moment conditions increases. The four different “selection matrices”, which identify the conditions to zero out, are given by:

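$$R_1 = I_{15}, \quad R_2 = \begin{bmatrix} 0 & 0 \\ 0 & I_{14} \end{bmatrix}, \quad R_3 = \begin{bmatrix} 0_{3 \times 3} & 0_{3 \times 12} \\ 0_{12 \times 3} & I_{12} \end{bmatrix}, \quad R_4 = \begin{bmatrix} 0_{7 \times 7} & 0_{7 \times 8} \\ 0_{8 \times 7} & I_8 \end{bmatrix}$$

where $I_d$ is the identity matrix of dimension $d \times d$.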
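The sign vector and the selection matrices have a regular structure that is easy to generate. The sketch below is illustrative only: it orders the non-empty subsets of the four treatment indicators by counting in binary, which reproduces the ordering used for $M_i$, and represents each $R_j$ by its diagonal. The names `sign_vector` and `selection_diagonal` are hypothetical.

```python
J = 4  # number of time periods

def sign_vector(t):
    """M_i for treatments t = (T_i1, ..., T_iJ).

    One entry (-1)^(sum of T over a subset) per non-empty subset of periods,
    ordered by binary counting: {1}, {2}, {1,2}, {3}, {1,3}, ...
    """
    signs = []
    for mask in range(1, 2 ** J):
        total = sum(t[j] for j in range(J) if mask & (1 << j))
        signs.append((-1) ** total)
    return signs

def selection_diagonal(j):
    """Diagonal of R_j: zeros for subsets involving only periods before j, ones after."""
    zeros = 2 ** (j - 1) - 1
    return [0] * zeros + [1] * (2 ** J - 1 - zeros)

m = sign_vector((1, 0, 0, 0))
print(m)
for j in range(1, J + 1):
    print(j, sum(selection_diagonal(j)))
```

The number of retained conditions is 15, 14, 12 and 8 for periods 1 to 4, matching the identity blocks $I_{15}$, $I_{14}$, $I_{12}$ and $I_8$.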

The sample moment conditions for time-period j can then be written [

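$$G_j = \frac{1}{n} \sum_i \left[ M_i^T R_j \otimes \tilde{X}_i \right] \tag{A6}$$

where $\otimes$ is the Kronecker product (the matrix on the right-hand side is multiplied by each element of the matrix on the left-hand side), and where the block $w_i X_{ij}$ of $\tilde{X}_i$ enters $G_j$, so that

$$G_1 = \frac{1}{n} \sum_i \left( (-1)^{T_{i1}} \ \cdots \ (-1)^{T_{i1}+T_{i2}+T_{i3}+T_{i4}} \right) I_{15} \otimes w_i X_{i1} = \frac{1}{n} \sum_i \left( (-1)^{T_{i1}} w_i X_{i1} \ \cdots \ (-1)^{T_{i1}+T_{i2}+T_{i3}+T_{i4}} w_i X_{i1} \right)$$

$$G_2 = \frac{1}{n} \sum_i \left( (-1)^{T_{i1}} \ \cdots \ (-1)^{T_{i1}+T_{i2}+T_{i3}+T_{i4}} \right) \begin{bmatrix} 0 & 0 \\ 0 & I_{14} \end{bmatrix} \otimes w_i X_{i2} = \frac{1}{n} \sum_i \left( 0 \ \ (-1)^{T_{i2}} w_i X_{i2} \ \cdots \ (-1)^{T_{i1}+T_{i2}+T_{i3}+T_{i4}} w_i X_{i2} \right)$$

$$G_3 = \cdots = \frac{1}{n} \sum_i \left( 0 \ \ 0 \ \ 0 \ \ (-1)^{T_{i3}} w_i X_{i3} \ \cdots \ (-1)^{T_{i1}+T_{i2}+T_{i3}+T_{i4}} w_i X_{i3} \right)$$

$$G_4 = \cdots = \frac{1}{n} \sum_i \left( 0 \ \ 0 \ \ 0 \ \ 0 \ \ 0 \ \ 0 \ \ 0 \ \ (-1)^{T_{i4}} w_i X_{i4} \ \cdots \ (-1)^{T_{i1}+T_{i2}+T_{i3}+T_{i4}} w_i X_{i4} \right)$$

and combining for all time periods yields the matrix $G$ with dimension $4K \times 15$

$$G = \begin{bmatrix} G_1 \\ G_2 \\ G_3 \\ G_4 \end{bmatrix} \tag{A7}$$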

with corresponding covariance matrix (dimension $60K \times 60K$)

$$W = \frac{1}{n} \sum_i E\left( M_i M_i^T \otimes \tilde{X}_i \tilde{X}_i^T \mid X_i \right) \tag{A8}$$

where

$$\tilde{X}_i \tilde{X}_i^T = \begin{bmatrix} w_i^2 X_{i1} X_{i1}^T & w_i^2 X_{i1} X_{i2}^T & w_i^2 X_{i1} X_{i3}^T & w_i^2 X_{i1} X_{i4}^T \\ w_i^2 X_{i2} X_{i1}^T & w_i^2 X_{i2} X_{i2}^T & w_i^2 X_{i2} X_{i3}^T & w_i^2 X_{i2} X_{i4}^T \\ w_i^2 X_{i3} X_{i1}^T & w_i^2 X_{i3} X_{i2}^T & w_i^2 X_{i3} X_{i3}^T & w_i^2 X_{i3} X_{i4}^T \\ w_i^2 X_{i4} X_{i1}^T & w_i^2 X_{i4} X_{i2}^T & w_i^2 X_{i4} X_{i3}^T & w_i^2 X_{i4} X_{i4}^T \end{bmatrix}$$

Each moment condition set equal to zero, as in (A4), contributes as many equations as there are unknown parameters, and there are several such conditions (15 for time period 1). The set of equations is therefore over-identified, and in general no parameter value solves all of them exactly. Instead a quadratic function of the moment conditions is minimized to come as close as possible to zero, to “minimize imbalance”, and this is achieved by the GMM estimator [

$$\hat{\alpha} = \arg\min_{\alpha} \ \operatorname{vec}(G)^T\, W^{-1} \operatorname{vec}(G) \tag{A9}$$

where G is from (A7) and the covariance from (A8), where the expectation can be calculated analytically in the logistic regression case [