Improving Model Specifications When Estimating Treatment Effects across Alternative Medical Interventions

Objective: The purpose of this paper is to critique the list of independent variables commonly used in observational research and to test the impact of variables for prior use and treatment history on estimates of treatment effects. Methods: Using data from the California Medicaid program, this study generated a series of OLS estimates of the effect of atypical antipsychotic medications on costs and duration of therapy to illustrate the impact of alternative model specifications on treatment effects. The first sequence of estimates consisted of six model specifications, the last of which included variables reflecting the type of episode defined according to prior treatment history and compliance. The second sequence repeated the specifications of the first 6 models but was carried out separately by episode type to examine the heterogeneity of the treatment effect; it also documented the impact of additional drug history variables. Results: Estimates of the impact of atypical antipsychotic use on total costs and duration on initial drug were statistically significant in the first 6 models. Estimates changed significantly when dummy variables indicating prior use of inpatient services and nursing home care were included in the model specification. Estimated effects changed substantially when prior total cost was included in the cost analysis, or when prior treatment duration was included in the duration analysis. Significant variation also existed in estimated effects across episode types, and it was particularly pronounced before controlling for prior cost/duration. Conclusion: It is important to add prior measures of the outcome variable to control for unobserved bias in retrospective studies. Also, the accuracy and utility of results to clinicians can be improved significantly if analyses are performed by episode type.


Introduction
Clinicians and policy makers require medical evidence with which to effectively integrate new technologies into real-world practice. This need is especially acute when new treatment alternatives are introduced into competition with older, well established treatments. In the case of new medications, these data come from two sources: the clinical trials required for FDA and other registry approvals and observational studies that establish the "essential need" for a new treatment alternative [1]. Both sources of medical evidence are necessary to estimate the cost-effectiveness of a new technology at product launch.
Efforts to document the essential need for a new technology actually begin very early in product development. Product innovators assess how well older, competing therapies are meeting the therapeutic needs of patients treated in the real world. Therapeutic gaps with older treatments typically arise when patient adherence to current therapies is sub-optimal or treatment efficacy for compliant patients is limited. Other sources of essential need are high treatment costs or high indirect cost to the patient and their caregivers [2]. These indirect costs may include the costs of side effects, caregiver time, reductions in the quality of life and the like. Essential need data are used in a series of "go/no go" decisions that are made as the product is developed and tested.
If the evolving data on essential need are promising and/or the new product is efficacious, the new technology will move through the required registry trials testing safety and efficacy. These studies use experimental research designs [RCTs] which maximize internal validity through random assignment and other techniques [e.g., blinding] [1]. However, the generalizability [external validity] of results from randomized clinical trials is limited:
1. RCTs are limited to a small, homogeneous study population due to cost and patient safety concerns. Data on treatment outcomes for high risk patients may be missing or, conversely, it may be ethical to only include very high risk patients who have no remaining treatment options, as in cancer trials.
2. Patient outcomes are measured over a limited time, again due to cost, patient burden and the risk of dropout. This mismatch between study duration and time to potential treatment effect is most acute for drug therapies intended to manage chronic diseases such as hypertension or hyperlipidemia.
3. By design, RCTs cannot measure patient adherence to treatment under real-world conditions. RCTs employ significant effort and resources to ensure patient adherence to the study protocol.
4. Finally, FDA-registration RCTs may require only placebo-controlled trials, or the list of active comparators may be constrained due to cost concerns.
Conversely, essential need studies based on retrospective data can provide CE evidence for the full range of treatment alternatives, and reflect real-world clinical practice and real-world adherence. The patients included in an essential need study also include risk groups not studied in the RCT environment. Finally, retrospective observational studies can provide evidence on long term outcomes and the [rare] clinical risk associated with existing therapies.
Drug companies combine data from real-world essential need studies and registry RCTs into an initial computer-based CE model to support product marketing at launch. These models project the impact of the new technology in clinical use. However, the accuracy of the initial CE models is limited by the gaps in research on real-world adherence for the new drug, long term patient outcomes using the new drug and outcomes achieved by patient sub-groups not included in the product's clinical trials [poor external validity]. While retrospective essential need studies fill in some of these gaps, the statistical validity of observational studies can be questionable if they are not executed well. Of equal concern, physicians, P & T committees and government program administrators may not fully understand the complexity and pitfalls of the statistical methods used in observational research.
The purpose of this paper is to critique the statistical methods commonly used in observational research by presenting a sequence of analyses which document how statistical results can change significantly as more care is taken to maximize the use of available data. Specifically, we will present a sequence of models moving from simple models to models using explanatory variables that are rarely derived from available claims data. The paper also documents the impact of alternative estimation strategies.

Statistical Challenges in Observational Research
Satisfactory internal validity can only be achieved in observational research by controlling for confounding factors associated with both treatment selection and patient outcomes. For example, it is challenging to measure the impact of a new medication relative to competing older drugs if the new medication is reserved for high risk patients, or if the new medication is used initially to treat patients who failed therapy using the older alternatives [3]. This bias can be reduced using multivariate statistical methods that adjust for the impact of observable factors on treatment outcomes. However, treatment selection bias will continue to exist if important factors are missing from the multivariate statistical models. In the econometrics literature, this is referred to as omitted (missing) variable bias. In comparative effectiveness research, it is referred to as unobserved treatment selection bias [UTSB].
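The mechanics of this bias can be illustrated with a small, fully synthetic example. The sketch below assumes a hypothetical binary `severity` variable that both raises cost and makes treatment with the new drug more likely. The variable names, the eight made-up patients and the dollar amounts are illustrative assumptions, not values from the Medi-Cal data.

```python
import numpy as np

def ols(y, *columns):
    """OLS coefficients for y regressed on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

# Hypothetical data: severe patients are more likely to receive the new drug.
severity = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
new_drug = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=float)

# Assumed data-generating process: the true drug effect on cost is +500.
cost = 1000 + 500 * new_drug + 2000 * severity

short_fit = ols(cost, new_drug)            # severity omitted from the model
long_fit = ols(cost, new_drug, severity)   # severity observed and included

print(f"drug effect, severity omitted:  {short_fit[1]:.0f}")
print(f"drug effect, severity included: {long_fit[1]:.0f}")
```

In this constructed example the regression that omits severity attributes 1500 to the drug, while the regression that includes it recovers the assumed effect of 500. The gap of 1000 is the severity effect (2000) multiplied by the difference in the share of severe patients between the two treatment groups (0.75 versus 0.25).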
UTSB is often a function of the data available for analysis. For example, data from a health insurance program or government program [e.g., Medicare] include the paid claims for common laboratory tests but provide no information concerning the laboratory results themselves. Fortunately, the growing availability of electronic medical records [EMR] data will provide increasing opportunities for reducing the impact of UTSB in observational studies in medicine.
The first line of defense against UTSB is to use the available data to document all factors that may impact both treatment selection and patient outcomes. Researchers often ignore episodes of drug therapy initiated following the first observed treatment episode. This is concerning because patient outcomes can be radically different for the second or third treatment attempt using the same drug, or for episodes of switching therapies, augmentation therapy or combination therapy. Moreover, the later episodes contain more information about the treatment history of the patients, such as prior compliance behavior, which could significantly impact patient outcomes. This expanded use of available pharmacy data may be particularly important when newly approved medications are significantly less likely to be used as first therapy in treatment-naïve patients.
Alternative model specifications that make better use of available data will also be investigated. Both difference-in-difference models (DD) [4] and fixed effects models (FE) [4] assume that UTSB is invariant across time periods (e.g., pre-treatment and post-treatment). For example, genetic factors which affect disease severity or response to drug treatment are invariant across time, and they are usually unobservable to researchers. Diff-in-diff models are popular in the analysis of panel data. By differencing out fixed effects or controlling for them using dummy variables representing clusters, these models eliminate the effects of the time-invariant UTSB. But the assumption that UTSB is time-invariant does not necessarily hold in practice. Even though time-invariant effects can be removed using such techniques, potential bias caused by time-varying confounding factors is still left unresolved. For instance, some clinical symptoms and health behaviors are not captured by automated data systems, yet they are unlikely to remain exactly the same across time periods.
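A toy calculation makes the distinction between time-invariant and time-varying unobservables concrete. The sketch below is a minimal illustration under stated assumptions: the four patients, the `fixed` and `shock` values and the effect size of 500 are all hypothetical, and the estimator is a plain two-group, two-period difference in means rather than the regression form used in applied work.

```python
from statistics import mean

TRUE_EFFECT = 500.0  # assumed treatment effect on the outcome

def simulate(fixed, treated, shock):
    """Build pre/post outcomes from a time-invariant and a time-varying unobservable."""
    return [
        {"treated": t, "pre": u, "post": u + s + TRUE_EFFECT * t}
        for u, t, s in zip(fixed, treated, shock)
    ]

def post_only_difference(patients):
    """Naive comparison of post-period outcomes that ignores the pre-period."""
    def mean_post(flag):
        return mean(p["post"] for p in patients if p["treated"] == flag)
    return mean_post(1) - mean_post(0)

def diff_in_diff(patients):
    """Mean pre-to-post change among treated minus mean change among controls."""
    def mean_change(flag):
        return mean(p["post"] - p["pre"] for p in patients if p["treated"] == flag)
    return mean_change(1) - mean_change(0)

fixed = [3000.0, 4000.0, 1000.0, 2000.0]  # time-invariant unobservable
treated = [1, 1, 0, 0]

stable = simulate(fixed, treated, shock=[0.0, 0.0, 0.0, 0.0])
drifting = simulate(fixed, treated, shock=[200.0, 400.0, 0.0, 0.0])

print(post_only_difference(stable))  # contaminated by the fixed unobservable
print(diff_in_diff(stable))          # fixed unobservable differences out
print(diff_in_diff(drifting))        # time-varying shock stays in the estimate
```

With no time-varying shock, differencing recovers the assumed effect even though the post-period comparison is far off. Once the treated patients also experience an unobserved change between periods, that change is absorbed into the difference-in-difference estimate.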

Data Sources
This study conducts a series of retrospective analyses of the impact of atypical antipsychotic medications to illustrate the impact of alternative model specifications and estimation methods on treatment effects. The study uses an existing California Medicaid (Medi-Cal) data set which was derived for a series of earlier studies [5] [6] from paid claims data from the fee-for-service portion of Medi-Cal. The data cover the period of 1994-2003, during which Medi-Cal revoked its restriction on the use of atypical antipsychotics to patients who had failed at least two previous treatment attempts using typical antipsychotics. This formulary restriction was lifted in October 1997, three years after the introduction of risperidone in 1994 and one year after the approval of olanzapine in 1996. Quetiapine was approved by the FDA in October 1997 and was immediately available to Medi-Cal patients without restrictions. This formulary expansion resulted in an immediate increase in the diffusion of atypical antipsychotics, which are now accepted as first line drug therapy for these patients [7].
Initial inclusion criteria required that patients have a paid claim with a recorded diagnosis of schizophrenia (ICD-9 code = 295.xx) or bipolar disorder (ICD-9 codes = 296.4-296.8) and at least one prescription for an antipsychotic medication. Additional exclusion criteria were applied once all episodes of care were identified.

Definition of the Unit of Analysis
The "standard of practice" for the unit of analysis in a retrospective CE research design mirrors the RCT design: the episode of treatment. In observational studies, the date of randomization is replaced by an "index date" defined by the patient's first prescription of one of the study drugs. As in most RCTs, the patient is typically subjected to a "wash-out" period by requiring that the patient has not filled a prescription for any study drug prior to the initial prescription. Wash-out periods vary in length; 6 months to a year is common. Most studies then limit their analysis to these "first episodes" and ignore any subsequent use of related drugs such as augmentation therapy or switching to an alternative medication. Limiting the analysis to first episodes excludes a large majority of treatment episodes. Moreover, new medications are seldom used as the first drug of choice and are relegated to treating "treatment failures" or providing augmentation therapy.
The data set used here includes all episodes of psychotropic drug therapy initiated by patients. An episode of treatment was defined each time a patient started a drug treatment using an antipsychotic, antidepressant or mood stabilizer not used previously, or restarted an earlier drug treatment after a gap of at least 15 days. The 15-day gap was defined in collaboration with the Medi-Cal program and was chosen to be consistent with the earlier finding by Weiden et al. [8], who reported that the risk of hospitalization increased substantially after breaks in therapy as short as 10 days.
The follow up period was the 12 months after the month of initiation. The 12-month follow up period was specified for the measurement of treatment outcomes, mimicking the intent-to-treat methods implemented in clinical trials. Patient episodes were then screened for eligibility during the entire pre- and post-treatment period. The amounts paid for all services were inflation adjusted to 2004 using service specific rates of fee inflation from the Medi-Cal program.
Many patients had more than one treatment episode, which is very common in schizophrenia and bipolar disorder as patients switch from one antipsychotic to another or start and stop therapy. While including multiple episodes per patient violates the usual assumption of independence across units of analysis, excluding subsequent episodes initiated by the patient was judged to generate a stronger bias than the lack of independence across sampling units [6] [9] [10]. Excluding these follow-on episodes severely restricts the utility of the analysis to clinicians, who require data on treatment effects for a wide range of treatment histories.
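The episode-start rule can be sketched in a few lines. The example below is a simplified illustration, not the algorithm applied to the Medi-Cal claims. It assumes each pharmacy fill is a hypothetical `(drug, fill_date, days_supply)` tuple, measures the gap from the date the previous supply of the same drug runs out, and ignores details such as drug classes and early refills.

```python
from datetime import date, timedelta

GAP_DAYS = 15  # minimum break in therapy that opens a new episode (from the text)

def episode_starts(fills):
    """Return (drug, start_date) for each fill that opens a new episode.

    fills: iterable of (drug, fill_date, days_supply) for a single patient.
    Assumption: the gap is counted from the run-out date of the prior supply.
    """
    starts = []
    supply_end = {}  # drug -> date the latest supply of that drug runs out
    for drug, fill_date, days_supply in sorted(fills, key=lambda f: f[1]):
        last_end = supply_end.get(drug)
        never_used = last_end is None
        if never_used or (fill_date - last_end).days >= GAP_DAYS:
            starts.append((drug, fill_date))
        run_out = fill_date + timedelta(days=days_supply)
        supply_end[drug] = run_out if never_used else max(last_end, run_out)
    return starts

# Hypothetical fill history for one patient.
fills = [
    ("risperidone", date(2001, 1, 1), 30),
    ("risperidone", date(2001, 2, 5), 30),   # 5-day gap: same episode
    ("olanzapine", date(2001, 2, 20), 30),   # drug not used before: new episode
    ("risperidone", date(2001, 4, 1), 30),   # 25-day gap: new episode
]

starts = episode_starts(fills)
for drug, start in starts:
    print(drug, start.isoformat())
```

For this made-up history the function flags three episode starts: the first risperidone fill, the first olanzapine fill, and the risperidone fill that follows a 25-day break.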

Covariates and Model Sequencing
The focus of this study is to examine how the use of an expanded list of unconventional independent variables impacts estimates of total costs and duration of therapy using standard ordinary least squares (OLS) regressions. Specifically, the following sequence of models will be estimated:
Model 1: The basic models include only age [categories with an interval of 10 years], gender, county population density [urban/rural/urban-rural-mix] and Medi-Cal aid categories.
Model 2: The second set of models adds dichotomous variables for non-mental health comorbidities based on ICD-9 diagnoses at baseline.
Model 3: Mental health diagnoses were added to the model specification separately to test the impact of diagnostic mix data related directly to the disease state under study.
Model 4: The list of independent variables was extended to include two dichotomous variables indicating whether or not the patient used inpatient hospital services or nursing home services in the 6 months prior to the episode start date.
Model 5: Pre-treatment measures of the outcome variables [total costs, duration of therapy] were added in this model. This specification is closely related to difference-in-difference modeling, which re-defines the outcome variable by differencing the value of the outcome measure before and after treatment; the two are equivalent when the coefficient on the pre-treatment measure is constrained to equal one.
Model 6: This model is the first to use data on the drug history of the patient at the time of treatment. The initial drug history covariates are dichotomous variables for episode type. Five types of episodes were defined in this data set:
1. First Observed Episode: The "first" episode was defined based on the patient's first psychotropic drug therapy attempt.
2. Restart Episodes: A restart episode was defined if the patient was not on active psychotropic drug therapy for 15 days or longer and initiated therapy with the medication used in their most recent episode [intermittent use].
3. Switching Episodes: A switching episode was defined if a patient changed medication while still on active therapy or within 15 days of terminating a previous therapy, and discontinued use of all previous medications within 60 days.
4. Delayed Switching Episodes: A delayed switching episode was defined if a patient changed drug therapy after a break in therapy in excess of 15 days.
5. Augmentation Episodes: An augmentation episode was defined when a patient added a second medication while continuing to purchase one or more of their previous medications beyond 60 days.
This analysis excludes first observed treatment episodes due to the lack of data on patient treatment history. The following analyses only used restart, delayed switching, switching, and augmentation episodes. In order to facilitate comparisons to Models 1-6, first episodes were also excluded from the sample of episodes included in these models.
Models 7-11: The remaining drug treatment history variables are entered sequentially in Models 7-11: count of the number of prior treatment attempts, monotherapy vs. combination therapy, days off therapy (for restart and delayed switching episodes), and prior use of related drugs [typical and atypical antipsychotics, mood stabilizers, antidepressants, depot-formulated drugs]. At this point, the analyses are conducted by episode type, primarily because episode type is a significant predictor of cost and duration of therapy [Model 6]. It follows that clinicians will require information on the CE of atypical vs. typical antipsychotics by episode type.
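The episode type definitions used in Model 6 can be expressed as a small decision rule. The function below is a sketch under stated assumptions: its four inputs are hypothetical summary features that would first have to be derived from the claims history, and the treatment of boundary cases (for example, a break of exactly 15 days) is a choice made here for illustration because the definitions do not settle it.

```python
GAP_DAYS = 15      # break in therapy separating episodes (from the text)
OVERLAP_DAYS = 60  # window for discontinuing prior medications (from the text)

def classify_episode(prior_episodes, same_drug_as_last, days_off_therapy,
                     prior_drug_days_continued):
    """Assign an episode type from summary features of a patient's drug history.

    prior_episodes: number of earlier observed episodes for the patient
    same_drug_as_last: True if the episode uses the drug of the most recent episode
    days_off_therapy: days without active psychotropic therapy before the start
    prior_drug_days_continued: days earlier drugs were still purchased after the start
    """
    if prior_episodes == 0:
        return "first"
    if days_off_therapy >= GAP_DAYS:
        # After a break: same drug is a restart, a different drug a delayed switch.
        return "restart" if same_drug_as_last else "delayed_switching"
    if same_drug_as_last:
        raise ValueError("same drug without a 15-day break is not a new episode")
    # New drug started on, or shortly after, active therapy.
    if prior_drug_days_continued > OVERLAP_DAYS:
        return "augmentation"
    return "switching"

examples = {
    "no history": classify_episode(0, False, 0, 0),
    "same drug after 40 days off": classify_episode(2, True, 40, 0),
    "new drug after 40 days off": classify_episode(2, False, 40, 0),
    "new drug, old drug kept 90 days": classify_episode(1, False, 0, 90),
    "new drug, old drug stopped at 20 days": classify_episode(1, False, 5, 20),
}
for label, episode_type in examples.items():
    print(f"{label}: {episode_type}")
```

The resulting labels could then be expanded into dichotomous indicators with restart episodes as the reference category, which matches how the Model 6 results are reported.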

Results
Results for the first six models, which estimate the impact of using atypical antipsychotics across all episodes, are summarized in Table 1. The outcome variables used in these models are total cost over the first post-treatment year and duration of therapy on the 'initial' drug of the episode. For example, in an augmentation episode, the initial drug is the augmenting drug. In addition to the impacts of atypical use, Table 1 also reports the estimated effects of the episode type indicators on cost and duration from Model 6.
Estimates of the impact of atypical antipsychotic use on total costs and duration on initial drug are statistically significant in the first 6 models. In Models 1-3, the estimated impact of using an atypical antipsychotic on total cost ranges from $1230 to $1399, and the estimated impact on duration ranges from 90.2 to 95.9 days. Estimates changed significantly when dummy variables indicating prior inpatient service use and prior nursing home use were included in the model specification. The effect of atypical use on total cost decreased to $398, whereas the effect on duration changed only slightly to 89.1 days. Equally important, the R² of the model for total cost increased substantially (0.182 to 0.571). Difference-in-difference modeling is frequently used in observational research testing the effect of new treatments or policy changes on patient outcomes. When prior total cost was included in the cost analysis [Model 5], the estimated effect of atypical use increased from $398 to $615 and the R² further increased from 0.571 to 0.710. Similarly, when prior treatment duration was included in the duration analysis, the estimated effect of atypical use decreased from 89 days to 76 days and the R² roughly doubled from 0.063 to 0.130.
Model 6 estimates the impact of atypical use controlling for episode type. The results from this model demonstrate the importance of drug use history when estimating the impact of atypical antipsychotics on cost and duration of therapy in two ways. First, the estimated effect of atypical use on total cost changed to $751 while the estimated effect on duration decreased to 55 days. Second, and more importantly, episode type has very significant impacts on costs and duration. Compared with restart episodes, switching episodes, delayed switching episodes, and augmentation episodes increased total cost by $1221, $1360, and $2237, respectively. However, the impacts of episode type on duration were not uniformly positive. Switching and delayed switching episodes lasted an additional 137 days and 74 days, respectively, relative to restart episodes. Conversely, the use of the initial drug decreased by 76 days in augmentation episodes relative to restart episodes, possibly reflecting the intended short term use of augmentation therapy.
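The general pattern of a treatment estimate and an R² that shift as prior-use covariates enter the model can be reproduced on simulated data. The sketch below is not a re-analysis of the Medi-Cal data: the data-generating process, variable names and coefficient values are all assumptions chosen so that costlier patients are more likely to receive the new drug, and only three of the specifications are mimicked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
TRUE_EFFECT = 500.0  # assumed effect of the new drug on post-period cost

# Hypothetical data-generating process (illustrative values only).
prior_inpatient = (rng.random(n) < 0.3).astype(float)
prior_cost = 2000 + 6000 * prior_inpatient + rng.normal(0, 1000, n)
# Patients with higher prior cost are more likely to receive the new drug.
new_drug = (prior_cost + rng.normal(0, 3000, n) > 4000).astype(float)
cost = (TRUE_EFFECT * new_drug + 1500 * prior_inpatient
        + 0.8 * prior_cost + rng.normal(0, 1000, n))

def fit(y, columns):
    """Return (treatment coefficient, R-squared); treatment is the first column."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta[1], r2

# Nested specifications, each adding one block of covariates.
specs = {
    "treatment only": [new_drug],
    "+ prior inpatient use": [new_drug, prior_inpatient],
    "+ prior cost": [new_drug, prior_inpatient, prior_cost],
}
results = {name: fit(cost, cols) for name, cols in specs.items()}
for name, (coef, r2) in results.items():
    print(f"{name:<22} effect = {coef:8.0f}   R2 = {r2:.3f}")
```

Because the specifications are nested, R² cannot fall as covariates are added. How far the treatment coefficient moves depends entirely on how strongly the added covariate is related to both treatment choice and cost, which is the point the sequence of models in this paper is designed to expose.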
The results from Model 6 provide an estimate of the average impact of using an atypical antipsychotic on cost and duration of therapy controlling for how atypical antipsychotic drugs are used by episode type. However, clinicians need to know how these new drugs perform by episode type, not on average. This dictates that these analyses be conducted separately by episode type. Conducting analyses by episode type also allows researchers to add other treatment history variables to the analyses which can vary by episode type. Our analyses of use and cost by episode type are displayed in Tables 2-5. The results for the average impact of atypical use derived in Model 6 using data for all episode types are also listed in these tables as a reference.
Table 2 presents the results using restart episodes, starting with the original set of independent variables used in Model 1. Models 5 and 6 are equivalent when estimated using only restart episodes. In Models 1-3 using restart episodes, the estimated effects of atypical antipsychotic use on total cost range from $2301 to $2616. Including prior inpatient services use and prior nursing home use decreased the estimated effect to $1077. It further decreased to $384 after controlling for prior total cost. The estimated effect remained stable across Models 7-11 ($493-$567). The R² increased significantly at the stages of Model 4 (0.185 to 0.588) and Model 5 (0.588 to 0.740). The estimated effects of atypical antipsychotic use on duration are much more stable than those estimated for cost: across all models using restart episodes, they are between 20.3 and 38.2 days. Also, the increase in the R² was modest from Model 1 to Model 11 (0.018 to 0.062).
Table 3 presents the results of analyses using switching episodes. In Models 1-3 using switching episodes, the estimated effects of atypical antipsychotic use on total cost are between $1678 and $1901. Including prior inpatient services use and prior nursing home use in the model decreased the estimated effect to $1289. Including prior total cost changed the estimated effect to $1171. The estimated effects on total costs are between $1122 and $1270 in Models 7-11. The R² increased by a large amount at the stages of Model 4 (0.220 to 0.571) and Model 5 (0.571 to 0.667). The estimated effects of atypical antipsychotic use on duration in Models 1-4 are between 106.1 and 108.0 days. But the estimated effect of atypical use dropped significantly to 24.6 days after controlling for prior treatment duration. In Models 7-11, the estimated effects are between 25.4 days and 37.2 days. The R² increased from 0.071 to 0.453 at the stage of Model 5.
Table 4 lists the results of analyses using delayed switching episodes. Throughout the 10 models, the estimated effects of atypical antipsychotic use on total cost range from $1120 to $1487, and the estimated effects on duration range from 84.0 to 95.9 days. The R² in the cost analysis increased from 0.236 to 0.586 at the stage of Model 4 and increased from 0.586 to 0.667 at the stage of Model 5. However, the R² in the duration analysis only increased modestly from 0.031 in Model 1 to 0.077 in Model 11.
Finally, the results of analyses using augmentation episodes are included in Table 5. In Models 1-3, the estimated effects of atypical antipsychotic use on total cost are between −$4936 and −$4390. The estimated effect changed to −$1741 after controlling for prior inpatient services use and prior nursing home use, and further changed to −$289 after controlling for prior total costs. In Models 7-11, the estimated effects on total costs are between −$156 and $74 and are all statistically insignificant. This is the only set of insignificant estimates in all analyses in the current study. The estimated effects on duration are between 142.6 days and 170.0 days for all 10 models. The R² in the cost analysis increased from 0.165 to 0.520 at the stage of Model 4 and increased from 0.520 to 0.677 at the stage of Model 5. Likewise, the R² in the duration analysis increased from 0.096 to 0.171 at the stage of Model 5.
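The contrast between a pooled estimate and estimates by episode type can be shown with a deliberately tiny example. The eight episodes below are hypothetical, and the estimator is a simple difference in mean cost between treated and untreated episodes, which is what OLS on a single treatment indicator returns when no other covariates are included. None of the numbers correspond to Tables 1-5.

```python
from statistics import mean

# Hypothetical episodes: (episode_type, atypical, total_cost)
episodes = [
    ("restart", 1, 1400.0), ("restart", 1, 1600.0),
    ("restart", 0, 1000.0), ("restart", 0, 1000.0),
    ("augmentation", 1, 2800.0), ("augmentation", 1, 3000.0),
    ("augmentation", 0, 3000.0), ("augmentation", 0, 3200.0),
]

def treatment_effect(rows):
    """Difference in mean cost between atypical and non-atypical episodes."""
    treated = [cost for _, atypical, cost in rows if atypical == 1]
    untreated = [cost for _, atypical, cost in rows if atypical == 0]
    return mean(treated) - mean(untreated)

# One estimate across all episodes, then one per episode type.
pooled = treatment_effect(episodes)
by_type = {
    etype: treatment_effect([row for row in episodes if row[0] == etype])
    for etype in sorted({row[0] for row in episodes})
}

print(f"pooled: {pooled:+.0f}")
for etype, effect in by_type.items():
    print(f"{etype}: {effect:+.0f}")
```

In this constructed case the pooled estimate of +150 describes neither subgroup: the effect is +500 for restart episodes and −200 for augmentation episodes. A single average of this kind offers little guidance to a clinician who is deciding about one specific type of episode.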

Discussion and Conclusion
The purpose of this study was to investigate the changes in estimated treatment effects in response to a series of explanatory variables, some of which are rarely derived from claims databases. The results from this series of estimates are illustrated in Figure 1 [Total Cost] and Figure 2 [Duration of Therapy]. Two statistical effects are evident. First, controlling for prior total cost/treatment duration led to significant changes in the estimates in most analyses, and the results for the impact of using atypical antipsychotics [treatment effect] tended to settle down across model specifications after that stage. This result validates the value of adding prior measures of the outcome variable, which corresponds to the popular difference-in-difference estimation technique. Second, great variation exists in the estimated effects of atypical antipsychotic use across episode types; this variation persists across model specifications and is particularly pronounced before adding prior total cost/treatment duration to the model specification. As an added bonus, conducting the analysis of treatment effects by episode type significantly increases the utility of study results to clinicians who are looking for guidance as to what works best for patients with different treatment histories.
Episode type can significantly impact the estimated treatment effects because episode type has a major impact on treatment outcomes.Accordingly, comparative effectiveness research should take into account the differential treatment effects in episode-type subgroups.
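The relationship between adding a prior measure of the outcome as a covariate and a difference-in-difference (change-score) specification can be written out explicitly. The notation below is an assumption introduced here for illustration and is not taken from the original analysis: $Y$ denotes the outcome (total cost or duration), $D_i$ the atypical antipsychotic indicator, and $X_i$ the remaining covariates.

```latex
\begin{align}
\text{Prior outcome as covariate:}\quad
  Y_{i,\text{post}} &= \alpha + \tau D_i + \rho\, Y_{i,\text{pre}} + X_i'\beta + \varepsilon_i \\
\text{Change score:}\quad
  Y_{i,\text{post}} - Y_{i,\text{pre}} &= \alpha + \tau D_i + X_i'\beta + \varepsilon_i
\end{align}
```

Moving $Y_{i,\text{pre}}$ to the right-hand side of the second equation gives the first equation with $\rho$ fixed at 1. The change-score model is therefore the special case of the lagged-outcome model in which the prior outcome carries over one-for-one, while the lagged-outcome model lets the data determine $\rho$.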
A major limitation of observational research in measuring treatment effects stems from the nature of claims databases. Claims databases do not usually capture important information such as disease severity and clinical symptoms. Although we controlled for a long list of variables and used various model specifications in the regressions, potential bias due to unmeasured covariates could not be ruled out entirely. However, the future of observational research in comparative effectiveness research is bright as data from electronic medical record [EMR] systems become more available. The internal validity of estimated differences between alternative treatments will only improve as better clinically relevant data become available.

Figure 1. Impact of using atypical antipsychotics on total cost in first post-treatment year.

Figure 2. Impact of using atypical antipsychotics on duration of therapy.

Table 1. Impact of atypical antipsychotic use on total cost and duration of therapy: all episode types (N = 731,236).
OLS results are presented as estimate (SE). Abbreviations: N, number of episodes; OLS, ordinary least squares; SE, standard error.

Table 2. Impact of atypical antipsychotic use on total cost and duration of therapy: restart episodes (N = 445,258).
OLS results are presented as estimate (SE). Abbreviations: N, number of episodes; OLS, ordinary least squares; SE, standard error.

Table 3. Impact of atypical antipsychotic use on total cost and duration of therapy: switching episodes (N = 71,917).
OLS results are presented as estimate (SE). Abbreviations: N, number of episodes; OLS, ordinary least squares; SE, standard error.

Table 4. Impact of atypical antipsychotic use on total cost and duration of therapy: delayed switching episodes (N = 97,704).
OLS results are presented as estimate (SE). Abbreviations: N, number of episodes; OLS, ordinary least squares; SE, standard error.

Table 5. Impact of atypical antipsychotic use on total cost and duration of therapy: augmentation episodes (N = 116,357).