
Researchers conducting randomized clinical trials with two treatment groups sometimes wish to determine whether biomarkers are predictive and/or prognostic. They can use regression models with interaction terms to assess the role of the biomarker of interest. However, although the interaction term is undoubtedly a suitable measure for prediction, the optimal way to measure prognosis is less clear. In this article, we define causal measures that can be used for prognosis and prediction based on biomarkers.
The causal measure for prognosis is defined as the average of two differences in status between biomarker-positive and -negative subjects under treatment and control conditions.
The causal measure for prediction is defined as the difference between the causal effect of the treatment for biomarker-positive and biomarker-negative subjects.
We also explain the relationship between the proposed measures and the regression parameters. The causal measure for prognosis corresponds to the terms for the biomarker in a regression model, where the values of the dummy variables representing the explanatory variables are −1/2 or 1/2. The causal measure for prediction is simply the causal effect of the interaction term in a regression model. In addition, for a binary outcome, we express the causal measures in terms of four response types: always-responder, complier, non-complier, and never-responder. The causal measure for prognosis can be expressed as a function of always- and never-responders, and the causal measure for prediction as a function of compliers and non-compliers. This enables us to demonstrate that the proposed measures are plausible in the case of a binary outcome.
Our causal measures should be used to assess whether a biomarker is prognostic and/or predictive.

There are some cases in which the responses to a treatment differ remarkably among individuals. A cause of this remarkable difference may be a biomarker. Therefore, it is important for personalized medicine to investigate the relationship between a biomarker and clinical significance (efficacy and/or safety). For example, Baselga et al. [

As seen in the above examples, in a randomized clinical trial with two treatment groups, researchers can investigate whether a particular biomarker is prognostic and/or predictive. According to Ballman [

To understand this more clearly, consider the following regression model:

E ( Y | X = x , Z = z ) = α + β x + γ z + δ x z , (1)

where X is the assigned treatment ( X = t if the subject is assigned to the treatment group and X = c if the subject is assigned to the control group), Z is the biomarker ( Z = p if the subject is biomarker-positive and Z = n if the subject is biomarker-negative), and Y is the outcome. Note that, for a binary outcome ( Y = 1 if a response is observed and Y = 0 if no response is observed), we can replace E ( Y | X = x , Z = z ) by Pr ( Y = 1 | X = x , Z = z ) . We can also replace E ( Y | X = x , Z = z ) by log Pr ( Y = 1 | X = x , Z = z ) when the risk ratio is of interest, and by log(hazard) for a time-to-event outcome.

If the dummy variables in (1) are set to t = p = 1 and c = n = 0 , the interaction term, δ, is

δ = { E ( Y | X = t , Z = p ) − E ( Y | X = c , Z = p ) } − { E ( Y | X = t , Z = n ) − E ( Y | X = c , Z = n ) } (2)

It is guaranteed that δ is a plausible measure for prediction because, under randomization, it indicates the difference in the effects of the treatment between biomarker-positive and -negative subjects. If δ = 0 , the term for the biomarker, γ, is

γ = E ( Y | X = c , Z = p ) − E ( Y | X = c , Z = n ) = E ( Y | X = t , Z = p ) − E ( Y | X = t , Z = n ) (3)

As this is a common difference between biomarker-positive and -negative subjects under both treatment and control conditions, γ is a plausible measure for the prognosis. However, if δ ≠ 0 ,

γ = E ( Y | X = c , Z = p ) − E ( Y | X = c , Z = n ) , (4)

and γ only concerns the control condition, not the treatment condition. This makes the plausibility of γ as a measure for the prognosis questionable. When δ ≠ 0 , in addition to the value of γ, we need to calculate

γ + δ = E ( Y | X = t , Z = p ) − E ( Y | X = t , Z = n ) . (5)

Consequently, it is not obvious that any of the terms can be used to measure the prognosis.
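A small numeric sketch may make the problem concrete. The cell means below are hypothetical values chosen only for illustration (they do not come from any trial), and the names `cell_mean` and `treated_gap` are ours. With dummy variables set to t = p = 1 and c = n = 0, γ reproduces only the control-arm difference (4), whereas the treatment-arm difference (5) equals γ + δ.

```python
from fractions import Fraction as F

# Hypothetical cell means E(Y | X = x, Z = z); illustrative numbers only.
cell_mean = {
    ("t", "p"): F(3, 5),
    ("t", "n"): F(3, 10),
    ("c", "p"): F(1, 4),
    ("c", "n"): F(1, 5),
}

# Parameters of the saturated model (1) under 0/1 coding (t = p = 1, c = n = 0).
alpha = cell_mean[("c", "n")]
beta = cell_mean[("t", "n")] - cell_mean[("c", "n")]
gamma = cell_mean[("c", "p")] - cell_mean[("c", "n")]  # equation (4)
delta = (cell_mean[("t", "p")] - cell_mean[("c", "p")]) - (
    cell_mean[("t", "n")] - cell_mean[("c", "n")]
)  # equation (2)

# Biomarker-positive vs. -negative difference in the treatment arm, equation (5).
treated_gap = cell_mean[("t", "p")] - cell_mean[("t", "n")]

print("gamma (control-arm difference):", gamma)
print("gamma + delta (treatment-arm difference):", gamma + delta)
```

In this made-up example δ ≠ 0, so the two arms give different biomarker differences and neither γ nor γ + δ alone summarizes prognosis.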

The main aim of this article is to define causal measures for assessing whether a biomarker is prognostic and/or predictive. We also explain how the proposed causal measures are related to regression parameters. In addition, we express the proposed causal measures in terms of the response type in the potential outcome framework [

We define causal measures for prognosis and prediction by adopting the potential outcome framework. A potential outcome is the outcome that would occur if the subject were assigned to a specific value of the treatment. We denote by Y ( x ) the potential outcome if a subject is assigned to X = x . Our work is based on the following three assumptions: the stable unit treatment value assumption (SUTVA) [

To demonstrate that the proposed causal measures are plausible measures for prognosis and prediction in the case of a binary outcome, we define subjects as having one of the following four response types (e.g., [

・ Always-responder: Would be a responder regardless of which group s/he was assigned to; i.e., ( Y ( t ) , Y ( c ) ) = ( 1 , 1 ) .

・ Complier: Would be a responder if assigned to the treatment group but a non-responder if assigned to the control group; i.e., ( Y ( t ) , Y ( c ) ) = ( 1 , 0 ) .

・ Non-complier: Would be a non-responder if assigned to the treatment group but a responder if assigned to the control group; i.e., ( Y ( t ) , Y ( c ) ) = ( 0 , 1 ) .

・ Never-responder: Would be a non-responder regardless of which group s/he was assigned to; i.e., ( Y ( t ) , Y ( c ) ) = ( 0 , 0 ) .

All subjects belong to one of these four groups. However, we cannot know which group a subject belongs to, because one of Y ( t ) and Y ( c ) is unknown. Notably, the outcomes for always- and never-responders do not depend on whether they are assigned to the treatment group or the control group, but the outcomes for compliers and non-compliers do depend on which group they are assigned to.
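The four response types, and the reason an individual's type cannot be identified from a single observed outcome, can be sketched as follows. This is an illustrative snippet only: the names `RESPONSE_TYPES` and `compatible_types` are ours, and the code assumes consistency, i.e., that the observed outcome equals the potential outcome under the assigned group.

```python
# Map each pair of binary potential outcomes (Y(t), Y(c)) to its response type.
RESPONSE_TYPES = {
    (1, 1): "always-responder",
    (1, 0): "complier",
    (0, 1): "non-complier",
    (0, 0): "never-responder",
}


def compatible_types(assigned, y_observed):
    """List the response types consistent with one observed outcome.

    Assumes consistency: the observed Y equals Y(t) if assigned == "t"
    and Y(c) if assigned == "c". The other potential outcome is unobserved.
    """
    index = 0 if assigned == "t" else 1
    return [
        name for pair, name in RESPONSE_TYPES.items() if pair[index] == y_observed
    ]


for assigned in ("t", "c"):
    for y_observed in (1, 0):
        print(assigned, y_observed, compatible_types(assigned, y_observed))
```

Every combination of assigned group and observed outcome remains compatible with exactly two response types, which is why only proportions of types, not individual memberships, enter the measures.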

We define the causal measure for prognosis as follows:

{ [ E { Y ( t ) | Z = p } − E { Y ( t ) | Z = n } ] + [ E { Y ( c ) | Z = p } − E { Y ( c ) | Z = n } ] } / 2 , (6)

where E { Y ( t ) | Z = p } is the expected outcome if all biomarker-positive subjects are assigned to the treatment group, and E { Y ( t ) | Z = n } is the same in the case of all biomarker-negative subjects. Likewise, E { Y ( c ) | Z = p } and E { Y ( c ) | Z = n } indicate the expected outcomes if these subjects are assigned to the control group. Thus, E { Y ( x ) | Z = p } − E { Y ( x ) | Z = n } is the difference between the expected outcomes of biomarker-positive and -negative subjects under the same treatment condition. Indeed, as (6) informs us about likely outcomes independent of the treatment received, it is a measure for the prognostic biomarker according to Ballman [

We define the causal measure for prediction as follows:

E { Y ( t ) − Y ( c ) | Z = p } − E { Y ( t ) − Y ( c ) | Z = n } . (7)

This represents the difference between the causal effects of the treatment for biomarker-positive and -negative subjects. Therefore, (7) indicates whether the effect of the treatment depends on whether the biomarker is positive or negative. Hence, this corresponds to a measure for the predictive biomarker discussed in Ballman [
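To illustrate how (6) and (7) separate the two roles of a biomarker, the following sketch evaluates both measures for two hypothetical sets of expected potential outcomes. The function names and all numbers are our own illustrative choices, not values from the article. The code uses E { Y ( t ) − Y ( c ) | Z = z } = E { Y ( t ) | Z = z } − E { Y ( c ) | Z = z }, which holds by linearity of expectation.

```python
from fractions import Fraction as F


def prognosis_measure(ey):
    """Measure (6): average of the positive-vs-negative differences in both arms."""
    return ((ey["t", "p"] - ey["t", "n"]) + (ey["c", "p"] - ey["c", "n"])) / 2


def prediction_measure(ey):
    """Measure (7): treatment effect in positives minus treatment effect in negatives."""
    return (ey["t", "p"] - ey["c", "p"]) - (ey["t", "n"] - ey["c", "n"])


# Hypothetical expected potential outcomes E{Y(x) | Z = z}, keyed by (x, z).
purely_prognostic = {
    ("t", "p"): F(6, 10), ("t", "n"): F(4, 10),
    ("c", "p"): F(4, 10), ("c", "n"): F(2, 10),
}
purely_predictive = {
    ("t", "p"): F(5, 10), ("t", "n"): F(3, 10),
    ("c", "p"): F(2, 10), ("c", "n"): F(4, 10),
}

for label, ey in [("prognostic only", purely_prognostic),
                  ("predictive only", purely_predictive)]:
    print(label, "(6) =", prognosis_measure(ey), "(7) =", prediction_measure(ey))
```

In the first scenario the treatment effect is the same in both biomarker groups, so only (6) is non-zero; in the second the biomarker differences in the two arms cancel, so only (7) is non-zero.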

Under the three assumptions (SUTVA, consistency, and exchangeability), E { Y ( x ) | Z = z } can be expressed as follows:

E { Y ( x ) | Z = z } = E { Y ( x ) | X = x , Z = z } = E ( Y | X = x , Z = z ) . (8)

By substituting this into (1) and setting the dummy variables to t = p = 1 / 2 and c = n = − 1 / 2 , we obtain the following equations:

E { Y ( t ) | Z = p } = α + ( 1 / 2 ) β + ( 1 / 2 ) γ + ( 1 / 4 ) δ , (9)

E { Y ( t ) | Z = n } = α + ( 1 / 2 ) β − ( 1 / 2 ) γ − ( 1 / 4 ) δ , (10)

E { Y ( c ) | Z = p } = α − ( 1 / 2 ) β + ( 1 / 2 ) γ − ( 1 / 4 ) δ , (11)

E { Y ( c ) | Z = n } = α − ( 1 / 2 ) β − ( 1 / 2 ) γ + ( 1 / 4 ) δ . (12)

From these four equations, it follows that (6) = γ and (7) = δ. This implies that (6) can be expressed by a single regression parameter, regardless of whether the interaction term is 0, when we use a regression model with dummy variables set to t = p = 1 / 2 and c = n = − 1 / 2 . Hence, γ can be used as a measure for prognosis. We note that, when we use the ordinary dummy variables t = p = 1 and c = n = 0 , we again obtain (7) = δ, but (6) = γ + δ / 2 , which differs from γ when δ ≠ 0 . This implies that, under 0/1 coding, (6) can only be expressed by a single regression parameter when δ = 0 .
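The effect of the coding can be checked numerically. The sketch below solves the saturated model (1) exactly for a set of hypothetical cell means under both codings; the helper names (`solve`, `fit_saturated`) and the numbers are ours and serve only as an illustration under the assumption that the four cell means are known.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve the square linear system a x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[-1] for row in m]


def fit_saturated(cell_mean, hi, lo):
    """Return (alpha, beta, gamma, delta) of model (1) with t = p = hi and c = n = lo."""
    code = {"t": hi, "p": hi, "c": lo, "n": lo}
    cells = [("t", "p"), ("t", "n"), ("c", "p"), ("c", "n")]
    design = [[F(1), code[x], code[z], code[x] * code[z]] for x, z in cells]
    return solve(design, [cell_mean[cell] for cell in cells])


# Hypothetical cell means E(Y | X = x, Z = z); illustrative numbers only.
cell_mean = {("t", "p"): F(3, 5), ("t", "n"): F(3, 10),
             ("c", "p"): F(1, 4), ("c", "n"): F(1, 5)}

measure_6 = ((cell_mean["t", "p"] - cell_mean["t", "n"])
             + (cell_mean["c", "p"] - cell_mean["c", "n"])) / 2
measure_7 = ((cell_mean["t", "p"] - cell_mean["c", "p"])
             - (cell_mean["t", "n"] - cell_mean["c", "n"]))

_, _, gamma_half, delta_half = fit_saturated(cell_mean, F(1, 2), F(-1, 2))
_, _, gamma_01, delta_01 = fit_saturated(cell_mean, F(1), F(0))

print("(6) =", measure_6, " (7) =", measure_7)
print("+/-1/2 coding: gamma =", gamma_half, " delta =", delta_half)
print("0/1 coding:    gamma =", gamma_01, " gamma + delta/2 =", gamma_01 + delta_01 / 2)
```

Under ±1/2 coding γ coincides with (6), whereas under 0/1 coding the same quantity is recovered only as γ + δ / 2.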

As mentioned above, we can replace E { Y ( x ) | Z = z } by Pr { Y ( x ) = 1 | Z = z } , log Pr { Y ( x ) = 1 | Z = z } , or log(hazard). However, in this section, we only discuss the case where E { Y ( x ) | Z = z } = Pr { Y ( x ) = 1 | Z = z } .

When we have a binary outcome, E { Y ( t ) | Z = z } + E { Y ( c ) | Z = z } can be expressed as follows:

E { Y ( t ) | Z = z } + E { Y ( c ) | Z = z } = Pr { Y ( t ) = 1 | Z = z } + Pr { Y ( c ) = 1 | Z = z } = Pr { Y ( t ) = 1 , Y ( c ) = 1 | Z = z } + Pr { Y ( t ) = 1 , Y ( c ) = 0 | Z = z } + Pr { Y ( t ) = 1 , Y ( c ) = 1 | Z = z } + Pr { Y ( t ) = 0 , Y ( c ) = 1 | Z = z } = Pr { Y ( t ) = 1 , Y ( c ) = 1 | Z = z } + 1 − Pr { Y ( t ) = 0 , Y ( c ) = 0 | Z = z } , (13)

where z = p , n . Substituting this into (6) gives

( 6 ) = 0.5 × ( [ Pr { Y ( t ) = Y ( c ) = 1 | Z = p } − Pr { Y ( t ) = Y ( c ) = 1 | Z = n } ] + [ Pr { Y ( t ) = Y ( c ) = 0 | Z = n } − Pr { Y ( t ) = Y ( c ) = 0 | Z = p } ] ) . (14)

This is the average of two differences between biomarker-positive and -negative subjects: the difference in the proportions of always-responders and the difference in the proportions of never-responders. In the first difference, we subtract the proportion of biomarker-negative subjects who are always-responders. However, in the second difference, we subtract the proportion of biomarker-positive subjects who are never-responders. This is because, if the proportion of biomarker-positive subjects who are always-responders is higher than that of biomarker-negative subjects, the proportion of biomarker-negative subjects who are never-responders would be higher than that of biomarker-positive subjects. Again, always- and never-responders are subjects who would have the same outcome regardless of which group they are assigned to. Therefore, (6) is a plausible causal measure for prognosis. We note that the causal measure version of (4), E { Y ( c ) | Z = p } − E { Y ( c ) | Z = n } , cannot be expressed as a function of only always- and never-responders, because it also involves the proportions of non-compliers.
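The following sketch checks (14) against definition (6) for hypothetical response-type proportions, and shows why the control-only difference behaves differently. All proportions and names (`arm_means`, `positive_shifted`, and so on) are illustrative assumptions of ours. In `positive_shifted`, some compliers among biomarker-positive subjects are relabelled as non-compliers while the always- and never-responder proportions are held fixed.

```python
from fractions import Fraction as F


def arm_means(shares):
    """Return (E{Y(t)|Z=z}, E{Y(c)|Z=z}) from response-type proportions in one group."""
    e_t = shares["always"] + shares["complier"]       # responders if treated
    e_c = shares["always"] + shares["non_complier"]   # responders if control
    return e_t, e_c


def measure_6(pos, neg):
    """Definition (6): average of the treatment-arm and control-arm differences."""
    (pos_t, pos_c), (neg_t, neg_c) = arm_means(pos), arm_means(neg)
    return ((pos_t - neg_t) + (pos_c - neg_c)) / 2


def measure_14(pos, neg):
    """Expression (14): uses always- and never-responder proportions only."""
    return F(1, 2) * ((pos["always"] - neg["always"]) + (neg["never"] - pos["never"]))


def control_only_gap(pos, neg):
    """Causal version of (4): difference under the control condition only."""
    return arm_means(pos)[1] - arm_means(neg)[1]


# Hypothetical proportions of the four response types within each biomarker group.
positive = {"always": F(3, 10), "complier": F(4, 10),
            "non_complier": F(1, 10), "never": F(2, 10)}
negative = {"always": F(1, 10), "complier": F(2, 10),
            "non_complier": F(2, 10), "never": F(5, 10)}
positive_shifted = {"always": F(3, 10), "complier": F(2, 10),
                    "non_complier": F(3, 10), "never": F(2, 10)}

for label, pos in [("original", positive), ("shifted", positive_shifted)]:
    print(label, "(6) =", measure_6(pos, negative),
          "(14) =", measure_14(pos, negative),
          "control-only =", control_only_gap(pos, negative))
```

In this example (6) and (14) agree and are unaffected by the shift, while the control-only difference changes with the non-complier proportion.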

In the case of a binary outcome, E { Y ( t ) | Z = z } − E { Y ( c ) | Z = z } can be expressed in a similar way to the above calculation:

E { Y ( t ) | Z = z } − E { Y ( c ) | Z = z } = Pr { Y ( t ) = 1 | Z = z } − Pr { Y ( c ) = 1 | Z = z } = [ Pr { Y ( t ) = 1 , Y ( c ) = 1 | Z = z } + Pr { Y ( t ) = 1 , Y ( c ) = 0 | Z = z } ] − [ Pr { Y ( t ) = 1 , Y ( c ) = 1 | Z = z } + Pr { Y ( t ) = 0 , Y ( c ) = 1 | Z = z } ] = Pr { Y ( t ) = 1 , Y ( c ) = 0 | Z = z } − Pr { Y ( t ) = 0 , Y ( c ) = 1 | Z = z } , (15)

where z = p , n . Substituting this into (7) gives

( 7 ) = [ Pr { Y ( t ) = 1 , Y ( c ) = 0 | Z = p } − Pr { Y ( t ) = 0 , Y ( c ) = 1 | Z = p } ] − [ Pr { Y ( t ) = 1 , Y ( c ) = 0 | Z = n } − Pr { Y ( t ) = 0 , Y ( c ) = 1 | Z = n } ] . (16)

This is the difference between two differences: for biomarker-positive and for biomarker-negative subjects, the difference between the proportions of compliers and non-compliers. Again, the outcomes for compliers and non-compliers depend on which group they are assigned to. Hence, (7) is a plausible causal measure for prediction.
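Expression (16) can also be read at the level of individuals. The sketch below uses a small hypothetical population in which both potential outcomes of every subject are known (which is never the case in practice) and compares the mean individual effect with the complier-minus-non-complier proportion. The counts and function names are our own illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical counts of subjects by (Y(t), Y(c)) within each biomarker group.
population = {
    "p": {(1, 1): 3, (1, 0): 4, (0, 1): 1, (0, 0): 2},
    "n": {(1, 1): 1, (1, 0): 2, (0, 1): 2, (0, 0): 5},
}


def mean_individual_effect(counts):
    """E{Y(t) - Y(c) | Z = z}, averaging Y(t) - Y(c) over all subjects in the group."""
    total = sum(counts.values())
    effect_sum = sum((y_t - y_c) * k for (y_t, y_c), k in counts.items())
    return Fraction(effect_sum, total)


def complier_minus_noncomplier(counts):
    """Proportion of compliers minus proportion of non-compliers, as in (15)."""
    total = sum(counts.values())
    return Fraction(counts[(1, 0)] - counts[(0, 1)], total)


by_definition_7 = (mean_individual_effect(population["p"])
                   - mean_individual_effect(population["n"]))
by_expression_16 = (complier_minus_noncomplier(population["p"])
                    - complier_minus_noncomplier(population["n"]))

print("(7) from individual effects:", by_definition_7)
print("(16) from compliers and non-compliers:", by_expression_16)
```

Always- and never-responders contribute an individual effect of zero, so only compliers (+1) and non-compliers (−1) remain in the average.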

In this article, we defined causal measures that indicate whether a biomarker is prognostic and/or predictive. Our measure for prediction is the causal measure version of the interaction term used in ordinary regression analysis. However, we do not use the term for the biomarker in an ordinary regression model, which uses dummy variables set to 0 or 1, as our measure for prognosis. That is, our measure is not calculated by comparing the outcomes of biomarker-positive and -negative subjects in the control group alone. Instead, it takes both the treatment and control conditions into account. Specifically, on the difference scale for a binary outcome, this measure can be expressed as a function of only always- and never-responders. The outcomes for these subjects do not depend on which treatment group they are assigned to. Hence, our causal measure is a plausible measure for prognosis.

Researchers should use the causal measures defined in this article to determine whether a biomarker is prognostic and/or predictive. When trials are analyzed using a regression model, it is convenient to represent the explanatory variables by dummy variables with values −1/2 or 1/2. When we make this choice of values, we can express the causal measures in terms of the regression parameters.

Recently, some authors [

The author thanks the reviewers for helpful comments. This work was supported partially by Grant-in-Aid for Scientific Research (No. 15K00057) from Japan Society for the Promotion of Science.

Chiba, Y. (2018) Causal Measures for Prognostic and Predictive Biomarkers. Open Journal of Statistics, 8, 241-248. https://doi.org/10.4236/ojs.2018.82014