Causal Measures for Prognostic and Predictive Biomarkers

Researchers conducting randomized clinical trials with two treatment groups sometimes wish to determine whether biomarkers are predictive and/or prognostic. They can use regression models with interaction terms to assess the role of the biomarker of interest. However, although the interaction term is undoubtedly a suitable measure for prediction, the optimal way to measure prognosis is less clear. In this article, we define causal measures that can be used for prognosis and prediction based on biomarkers. The causal measure for prognosis is defined as the average of two differences in status between biomarker-positive and -negative subjects under treatment and control conditions. The causal measure for prediction is defined as the difference between the causal effect of the treatment for biomarker-positive and biomarker-negative subjects. We also explain the relationship between the proposed measures and the regression parameters. The causal measure for prognosis corresponds to the term for the biomarker in a regression model, where the values of the dummy variables representing the explanatory variables are −1/2 or 1/2. The causal measure for prediction is simply the causal effect of the interaction term in a regression model. In addition, for a binary outcome, we express the causal measures in terms of four response types: always-responder, complier, non-complier, and never-responder. The causal measure for prognosis can be expressed as a function of always- and never-responders, and the causal measure for prediction as a function of compliers and non-compliers. This enables us to demonstrate that the proposed measures are plausible in the case of a binary outcome. Our causal measures should be used to assess whether a biomarker is prognostic and/or predictive.


Introduction
There are some cases in which the responses to a treatment differ remarkably among individuals. A cause of this remarkable difference may be a biomarker. Therefore, it is important for personalized medicine to investigate the relationship between a biomarker and clinical significance (efficacy and/or safety). For example, Baselga et al. [1] demonstrated that, in women with HER2-positive metastatic breast cancer undergoing first-line therapy, patients with tumors harboring a PIK3CA mutation had worse progression-free survival (PFS) compared with those with PIK3CA wild-type tumors, regardless of treatment group. Brugger et al. [2] demonstrated that, for the use of erlotinib maintenance treatment for advanced non-small-cell lung cancer, the EGFR-mutated subgroup derived a much greater benefit compared with the wild-type subgroup in terms of hazard ratio for PFS.
As seen in the above examples, in a randomized clinical trial with two treatment groups, researchers can investigate whether a particular biomarker is prognostic and/or predictive. According to Ballman [3], a prognostic biomarker informs us about likely outcomes independently of the treatment received, such as the above PIK3CA mutation, and a predictive biomarker informs us that the effect of a treatment depends on whether the subject is positive for that biomarker, such as the above EGFR mutation. Researchers can investigate these questions using regression models including explanatory variables representing the treatment group, biomarker, and treatment × biomarker interaction term. A biomarker is said to be predictive if the interaction term is not 0. If the interaction term is 0 but the term for the biomarker is not, then the biomarker is said to be prognostic. The measure for the prognosis is not clearly defined when the interaction term is not 0.
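To understand this more clearly, consider the following regression model:

$$E(Y \mid X = x, Z = z) = \alpha + \beta x + \gamma z + \delta x z, \qquad (1)$$

where $X$ is the assigned treatment ($X = t$ if the subject is assigned to the treatment group and $X = c$ if the subject is assigned to the control group), $Z$ is the biomarker ($Z = p$ if the subject is positive and $Z = n$ if the subject is negative), and $Y$ is the outcome. Note that we can substitute $\log E(Y \mid X = x, Z = z)$ for the left-hand side to work with the risk ratio, and log(hazard) for a time-to-event outcome.

If the dummy variables in (1) are set to $t = p = 1$ and $c = n = 0$, and $\delta = 0$, then

$$\gamma = E(Y \mid X = t, Z = p) - E(Y \mid X = t, Z = n) = E(Y \mid X = c, Z = p) - E(Y \mid X = c, Z = n). \qquad (2)$$

As this is a common difference between biomarker-positive and -negative subjects under both treatment and control conditions, $\gamma$ is a plausible measure for the prognosis. However, if $\delta \neq 0$, then

$$\gamma = E(Y \mid X = c, Z = p) - E(Y \mid X = c, Z = n), \qquad (3)$$

and $\gamma$ only concerns the control condition, not the treatment condition. This makes the plausibility of $\gamma$ as a measure for the prognosis questionable. When $\delta \neq 0$, in addition to the value of $\gamma$, we need to calculate

$$\gamma + \delta = E(Y \mid X = t, Z = p) - E(Y \mid X = t, Z = n). \qquad (4)$$

Consequently, it is not obvious that any of the terms can be used to measure the prognosis.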
The main aim of this article is to define causal measures for assessing whether a biomarker is prognostic and/or predictive. We also explain how the proposed causal measures are related to regression parameters. In addition, we express the proposed causal measures in terms of the response type in the potential outcome framework [4] [5] for a binary outcome. This enables us to demonstrate that the suggested prognosis and prediction measures are plausible.

Methods
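We define causal measures for prognosis and prediction by adopting the potential outcome framework. A potential outcome is the outcome that would occur if the subject were assigned to a specific value of the treatment. We denote $Y(x)$ as the potential outcome if a subject is assigned to $X = x$. Our work is based on the following three assumptions: the stable unit treatment value assumption (SUTVA) [5], which states that there is only a single version of each treatment level and no interference among subjects; the consistency assumption [6] that $Y(x) = Y$ if $X = x$ for all subjects, such that the value of $Y$ that would have been observed if $X$ had been set to its actual value is equal to the value of $Y$ that was observed; and the exchangeability assumption [7], which states that $Y(x)$ is independent of $X$ given $Z$.

To demonstrate that the proposed causal measures are plausible measures for prognosis and prediction in the case of a binary outcome, we define subjects as having one of the following four response types (e.g., [8] [9]): always-responders, who would be responders regardless of the group to which they are assigned, i.e., $Y(t) = Y(c) = 1$; compliers, who would be responders if assigned to the treatment group but non-responders if assigned to the control group, i.e., $Y(t) = 1$ and $Y(c) = 0$; non-compliers, who would be non-responders if assigned to the treatment group but responders if assigned to the control group, i.e., $Y(t) = 0$ and $Y(c) = 1$; and never-responders, who would be non-responders regardless of the group to which they are assigned, i.e., $Y(t) = Y(c) = 0$. All subjects belong to one of these four groups. However, we cannot know which group a subject belongs to, because one of $Y(t)$ and $Y(c)$ is unknown.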
Notably, the outcomes for always- and never-responders do not depend on whether they are assigned to the treatment group or the control group, but the outcomes for compliers and non-compliers do depend on which group they are assigned to.
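
The response types can be made concrete with a short sketch. The Python snippet below is illustrative only: the function names `response_type` and `compatible_types` and the 0/1 coding (1 = responder, 0 = non-responder) are assumptions introduced here, and the types follow the usual convention (always-responder: $Y(t) = Y(c) = 1$; complier: $Y(t) = 1$, $Y(c) = 0$; non-complier: $Y(t) = 0$, $Y(c) = 1$; never-responder: $Y(t) = Y(c) = 0$). It also shows why a subject's type cannot be identified from the observed data: under consistency, only the potential outcome for the assigned group is observed.

```python
def response_type(y_t, y_c):
    """Map a pair of binary potential outcomes (Y(t), Y(c)) to a response type.

    Assumed coding: 1 = responder, 0 = non-responder.
    """
    types = {
        (1, 1): "always-responder",
        (1, 0): "complier",
        (0, 1): "non-complier",
        (0, 0): "never-responder",
    }
    return types[(y_t, y_c)]


def compatible_types(assigned, y_observed):
    """List the response types compatible with one subject's observed data.

    assigned: "t" (treatment) or "c" (control); y_observed: observed outcome Y.
    By consistency, Y equals Y(t) if assigned == "t" and Y(c) otherwise,
    so the unobserved potential outcome may be either 0 or 1.
    """
    result = []
    for y_t in (0, 1):
        for y_c in (0, 1):
            seen = y_t if assigned == "t" else y_c
            if seen == y_observed:
                result.append(response_type(y_t, y_c))
    return result


# A responder in the treatment group may be a complier or an always-responder.
print(compatible_types("t", 1))
```

Each observed (assignment, outcome) pair is compatible with exactly two of the four types, which is why the measures are defined on proportions of types rather than on individual classifications.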

Causal Measures for Prognosis and Prediction
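We define the causal measure for prognosis as follows:

$$\frac{1}{2}\Big[\big\{E\{Y(t) \mid Z = p\} - E\{Y(t) \mid Z = n\}\big\} + \big\{E\{Y(c) \mid Z = p\} - E\{Y(c) \mid Z = n\}\big\}\Big], \qquad (6)$$

where $E\{Y(t) \mid Z = p\}$ is the expected outcome if all biomarker-positive subjects are assigned to the treatment group, and $E\{Y(t) \mid Z = n\}$ is the same in the case of all biomarker-negative subjects. Likewise, $E\{Y(c) \mid Z = p\}$ and $E\{Y(c) \mid Z = n\}$ are the expected outcomes if these subjects are assigned to the control group. Thus, $E\{Y(x) \mid Z = p\} - E\{Y(x) \mid Z = n\}$ is the difference between the outcomes when the biomarker is positive/negative under the same treatment condition. Indeed, as (6) informs us about likely outcomes independent of the treatment received, it is a measure for the prognostic biomarker according to Ballman [3]. It is important to note that (6) treats the differences between biomarker-positive and -negative subjects under treatment and control conditions equally.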
We define the causal measure for prediction as follows:

$$\big[E\{Y(t) \mid Z = p\} - E\{Y(c) \mid Z = p\}\big] - \big[E\{Y(t) \mid Z = n\} - E\{Y(c) \mid Z = n\}\big]. \qquad (7)$$

This represents the difference between the causal effect of the treatment for biomarker-positive and -negative subjects. Therefore, (7) indicates whether the effect of the treatment depends on whether the subject is positive or negative for the biomarker. Hence, this corresponds to a measure for the predictive biomarker discussed in Ballman [3].
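
As a minimal sketch of how (6) and (7) are evaluated, the Python function below takes the four expected potential outcomes $E\{Y(x) \mid Z = z\}$ and returns both measures. The function name `causal_measures` and the numerical values are hypothetical and chosen only for illustration; in a randomized trial these four quantities would be estimated by the observed mean outcomes in the four treatment-by-biomarker cells.

```python
def causal_measures(m_tp, m_tn, m_cp, m_cn):
    """Return (prognosis, prediction) from four expected potential outcomes.

    m_tp = E{Y(t) | Z = p}, m_tn = E{Y(t) | Z = n},
    m_cp = E{Y(c) | Z = p}, m_cn = E{Y(c) | Z = n}.
    """
    # (6): average of the positive-vs-negative differences under t and under c
    prognosis = 0.5 * ((m_tp - m_tn) + (m_cp - m_cn))
    # (7): treatment effect in positives minus treatment effect in negatives
    prediction = (m_tp - m_cp) - (m_tn - m_cn)
    return prognosis, prediction


# Hypothetical response proportions, for illustration only.
prognosis, prediction = causal_measures(m_tp=0.6, m_tn=0.3, m_cp=0.4, m_cn=0.2)
print(f"prognosis = {prognosis:.3f}, prediction = {prediction:.3f}")
```

With these hypothetical inputs, the biomarker-positive subjects do better under both conditions (a nonzero prognosis measure) and also benefit more from treatment (a nonzero prediction measure), so the two measures can be nonzero at the same time.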

Relation to Regression Model
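Under the three assumptions (SUTVA, consistency, and exchangeability), $E\{Y(x) \mid Z = z\}$ can be expressed as follows:

$$E\{Y(x) \mid Z = z\} = E\{Y(x) \mid X = x, Z = z\} = E(Y \mid X = x, Z = z). \qquad (8)$$

By substituting this into (1) and setting the dummy variables to $t = p = 1/2$ and $c = n = -1/2$, we obtain the following equations, where $\alpha$ and $\beta$ denote the intercept and the coefficient of the treatment in (1):

$$E\{Y(t) \mid Z = p\} = \alpha + \tfrac{1}{2}\beta + \tfrac{1}{2}\gamma + \tfrac{1}{4}\delta, \qquad (9)$$
$$E\{Y(t) \mid Z = n\} = \alpha + \tfrac{1}{2}\beta - \tfrac{1}{2}\gamma - \tfrac{1}{4}\delta, \qquad (10)$$
$$E\{Y(c) \mid Z = p\} = \alpha - \tfrac{1}{2}\beta + \tfrac{1}{2}\gamma - \tfrac{1}{4}\delta, \qquad (11)$$
$$E\{Y(c) \mid Z = n\} = \alpha - \tfrac{1}{2}\beta - \tfrac{1}{2}\gamma + \tfrac{1}{4}\delta. \qquad (12)$$

From these four equations, it follows that (6) $= \gamma$ and (7) $= \delta$. This implies that (6) can be expressed by one regression parameter, regardless of whether the interaction term is 0, when we use a regression model with dummy variables set to $t = p = 1/2$ and $c = n = -1/2$. Hence, $\gamma$ can be used as a measure for prognosis. We note that, when we use ordinary dummy variables $t = p = 1$ and $c = n = 0$, we again obtain (7) $= \delta$, but (6) $= \gamma + \delta/2$, which is different from $\gamma$ when $\delta \neq 0$. This implies that, with this coding, (6) can only be expressed by one regression parameter when $\delta = 0$.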
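
The effect of the dummy coding can be checked numerically. The sketch below assumes that model (1) has the saturated form $\alpha + \beta x + \gamma z + \delta x z$ (the names $\alpha$ and $\beta$ for the intercept and treatment coefficient are assumptions made here) and solves it exactly for the four cell means under a given coding. Because the model is saturated, a least-squares fit would reproduce the cell means, so solving the four equations gives the same coefficients. The function name `saturated_coefficients` and the cell means are hypothetical.

```python
def saturated_coefficients(means, x_codes, z_codes):
    """Solve m(x, z) = alpha + beta*x + gamma*z + delta*x*z for four cell means.

    means:   dict keyed by (arm, marker), arm in {"t", "c"}, marker in {"p", "n"}
    x_codes: numeric codes for the arms, e.g. {"t": 0.5, "c": -0.5}
    z_codes: numeric codes for the biomarker, e.g. {"p": 0.5, "n": -0.5}
    """
    xt, xc = x_codes["t"], x_codes["c"]
    zp, zn = z_codes["p"], z_codes["n"]
    # m(x, p) - m(x, n) = (gamma + delta * x) * (zp - zn)
    d_t = means[("t", "p")] - means[("t", "n")]
    d_c = means[("c", "p")] - means[("c", "n")]
    delta = (d_t - d_c) / ((xt - xc) * (zp - zn))
    gamma = d_c / (zp - zn) - delta * xc
    # m(t, n) - m(c, n) = (beta + delta * zn) * (xt - xc)
    beta = (means[("t", "n")] - means[("c", "n")]) / (xt - xc) - delta * zn
    alpha = means[("c", "n")] - beta * xc - gamma * zn - delta * xc * zn
    return alpha, beta, gamma, delta


# Hypothetical cell means E(Y | X = x, Z = z), for illustration only.
means = {("t", "p"): 0.6, ("t", "n"): 0.3, ("c", "p"): 0.4, ("c", "n"): 0.2}
prognosis = 0.5 * ((means[("t", "p")] - means[("t", "n")])
                   + (means[("c", "p")] - means[("c", "n")]))

_, _, g_half, d_half = saturated_coefficients(
    means, {"t": 0.5, "c": -0.5}, {"p": 0.5, "n": -0.5})
_, _, g_01, d_01 = saturated_coefficients(
    means, {"t": 1.0, "c": 0.0}, {"p": 1.0, "n": 0.0})

print(f"measure (6)            : {prognosis:.3f}")
print(f"gamma, -1/2 and 1/2    : {g_half:.3f}")
print(f"gamma, 0 and 1         : {g_01:.3f}")
print(f"gamma + delta/2, 0 and 1: {g_01 + d_01 / 2:.3f}")
```

The interaction coefficient is the same under both codings, whereas the biomarker coefficient matches (6) only under the −1/2 and 1/2 coding; under the 0/1 coding it equals the control-group difference alone.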

Expression by Response Type on a Binary Outcome
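As mentioned above, we can replace $E(Y \mid X = x, Z = z)$ with $\log E(Y \mid X = x, Z = z)$ or log(hazard). However, in this section, we only discuss the case where $E(Y \mid X = x, Z = z)$ itself is used, that is, the difference scale. When we have a binary outcome, $E\{Y(x) \mid Z = z\}$ can be expressed as follows:

$$E\{Y(t) \mid Z = z\} = \Pr(\text{always-responder} \mid Z = z) + \Pr(\text{complier} \mid Z = z),$$
$$E\{Y(c) \mid Z = z\} = \Pr(\text{always-responder} \mid Z = z) + \Pr(\text{non-complier} \mid Z = z), \qquad (13)$$

where $z = p, n$. Substituting this into (6), and using the fact that the proportions of the four response types sum to 1 within each biomarker group, gives

$$\frac{1}{2}\Big[\big\{\Pr(\text{always-responder} \mid Z = p) - \Pr(\text{always-responder} \mid Z = n)\big\} + \big\{\Pr(\text{never-responder} \mid Z = n) - \Pr(\text{never-responder} \mid Z = p)\big\}\Big]. \qquad (14)$$

This is the average of two differences: the differences in the proportions of biomarker-positive and -negative subjects who are always-responders and never-responders. In the first difference, we subtract the proportion of biomarker-negative subjects who are always-responders. However, in the second difference, we subtract the proportion of biomarker-positive subjects who are never-responders. This is because, if the proportion of biomarker-positive subjects who are always-responders is higher than that of biomarker-negative subjects, the proportion of biomarker-negative subjects who are never-responders would be higher than that of biomarker-positive subjects. Again, always- and never-responders are subjects who would have the same outcome regardless of which group they are assigned to. Therefore, (6) is a plausible causal measure for prognosis. We note that the causal measure version of (4) cannot be expressed as a function of only always- and never-responders.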
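In the case of a binary outcome,

$$E\{Y(t) \mid Z = z\} - E\{Y(c) \mid Z = z\} = \Pr(\text{complier} \mid Z = z) - \Pr(\text{non-complier} \mid Z = z), \qquad (15)$$

where $z = p, n$. Substituting this into (7) gives

$$\big\{\Pr(\text{complier} \mid Z = p) - \Pr(\text{non-complier} \mid Z = p)\big\} - \big\{\Pr(\text{complier} \mid Z = n) - \Pr(\text{non-complier} \mid Z = n)\big\}. \qquad (16)$$

This is the difference between two differences: the differences in the proportions of compliers and non-compliers in biomarker-positive and -negative subjects. Again, the outcome for compliers and non-compliers depends on which group they are assigned to. Hence, (7) is a plausible causal measure for prediction.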
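
The response-type expressions can be verified with a small numerical sketch. The Python code below assumes hypothetical proportions of the four response types within each biomarker group (values chosen only for illustration, with each group summing to 1), computes $E\{Y(t) \mid Z = z\}$ and $E\{Y(c) \mid Z = z\}$ from them, and compares measures (6) and (7) with their response-type forms. All function and key names are assumptions introduced here.

```python
def expected_outcomes(props):
    """Return (E{Y(t) | Z = z}, E{Y(c) | Z = z}) for a binary outcome.

    props holds the proportions of the four response types in one biomarker group.
    Responders under treatment are always-responders and compliers;
    responders under control are always-responders and non-compliers.
    """
    e_t = props["always"] + props["complier"]
    e_c = props["always"] + props["non_complier"]
    return e_t, e_c


def prognosis_from_types(pos, neg):
    """Prognosis measure written with always- and never-responders only."""
    return 0.5 * ((pos["always"] - neg["always"]) + (neg["never"] - pos["never"]))


def prediction_from_types(pos, neg):
    """Prediction measure written with compliers and non-compliers only."""
    return ((pos["complier"] - pos["non_complier"])
            - (neg["complier"] - neg["non_complier"]))


# Hypothetical response-type proportions, for illustration only.
pos = {"always": 0.3, "complier": 0.4, "non_complier": 0.1, "never": 0.2}
neg = {"always": 0.2, "complier": 0.2, "non_complier": 0.1, "never": 0.5}

e_tp, e_cp = expected_outcomes(pos)
e_tn, e_cn = expected_outcomes(neg)

prognosis = 0.5 * ((e_tp - e_tn) + (e_cp - e_cn))   # measure (6)
prediction = (e_tp - e_cp) - (e_tn - e_cn)          # measure (7)

print(f"(6) = {prognosis:.3f}, response-type form = {prognosis_from_types(pos, neg):.3f}")
print(f"(7) = {prediction:.3f}, response-type form = {prediction_from_types(pos, neg):.3f}")
```

The agreement for prognosis relies on the four proportions summing to 1 in each biomarker group, which lets the complier and non-complier terms be replaced by always- and never-responder terms.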

Discussion
In this article, we defined causal measures that indicate whether a biomarker is prognostic and/or predictive. Our measure for prediction is the causal measure version of the interaction term used in ordinary regression analysis. However, we do not use the term for the biomarker in an ordinary regression model, which uses dummy variables set to 0 or 1, as our measure for prognosis. That is, our measure is not calculated by comparing the outcomes of biomarker-positive and -negative subjects in the control group alone. Instead, it takes both the treatment and control conditions into account. Specifically, on the difference scale for a binary outcome, this measure can be expressed as a function of only always- and never-responders. The outcomes for these subjects do not depend on which treatment group they are assigned to. Hence, our causal measure is a plausible measure for prognosis. Recently, some authors [9] [10] [11] [12] have discussed approaches to infer causal effects defined on the basis of the response type. An interesting area for future work is to extend these approaches to a statistical method for determining whether a biomarker is prognostic and/or predictive. Such work would make it possible to evaluate causal measures such as (14) and (16) directly.
Researchers should use the causal measures defined in this article to determine whether a biomarker is prognostic and/or predictive. When trials are analyzed using a regression model, it is convenient to represent the explanatory variables by dummy variables with values −1/2 or 1/2. When we make this choice of values, we can express the causal measures in terms of the regression parameters.