Confounding of Three Binary-Variable Counterfactual Model with DAG

This paper discusses confounding in three binary-variable counterfactual models with directed acyclic graphs (DAGs). According to the effect between the control variable and the covariate variable, we investigate three causal counterfactual models: the control variable is independent of the covariate variable, the control variable affects the covariate variable, and the covariate variable affects the control variable. Using ancillary information based on conditional independence hypotheses and ignorability, we obtain sufficient conditions for determining whether the covariate variable is an irrelevant factor or whether there is no confounding in each counterfactual model.


Introduction
Causal inference has become an important research field in statistics, data mining, epidemiology, machine learning and related areas in recent decades [1][2][3][4][5][6][7], and directed acyclic graphs (DAGs) are used to describe causal relationships [4]. Confounding and confounder are two basic concepts in epidemiological causal inference [1,3]. Several models have been presented for causal inference, two of which are the causal diagram model and the counterfactual model [6,8,9].
To assess confounding and confounders, two main approaches, "collapsibility-based" and "comparability-based", are discussed in [10]. They regard confounding bias as arising, respectively, from differences between stratified measures of association and the corresponding original measure, or from exposed and unexposed populations that are not comparable. The comparability-based approach determines a factor to be a confounder if adjusting for it can reduce confounding bias [3,10]. Geng et al. (2002) [11] point out that the effect of exposure on the rate of a disease cannot be assessed correctly in the presence of confounding bias. They propose probability criteria for confounding and discuss confounding with multi-valued covariate variables. However, their work does not clearly analyze the general causal DAG even with three binary variables, since the simple case of their definition of a covariate can only be expressed by Figure 1 (see Figure 6.2 on p. 61 of [12]).
As to three binary-variable DAGs, [5,13] discuss the identifiability of the causal effect in the other two kinds of counterfactual models (Figures 2 and 3) using independence hypotheses. Yet confounding and confounders in these two simple causal DAGs are not discussed explicitly. For Figure 2 (see Figure 6.5 on p. 64 of [12]), the covariate C is an intermediate variable in the causal chain. [6,12] (p. 30) discuss the intermediate-variable causal chain; however, more variables are involved in fitting the "back-door" formula and the "front-door" formula.
Traditionally, a confounding variable (the precise definition of a confounder) is a variable which is a common cause of both the control variable and the response variable [14] (see Figure 1). Is a covariate variable that is not a common cause of both the control variable and the response variable in three binary-variable counterfactual models a confounder? [1,10] develop a qualitative definition of a confounder: if controlling a variable can reduce confounding, then the variable is called a confounder. Hence a covariate variable that is not a common cause of both the control variable and the response variable, but affects the response variable, may be a confounder. Recently, confounder and confounding detection has attracted more attention in gene network discovery [15]. The question arises of how to investigate the causal DAG in a pure gene network, and how to analyze the role of a covariate from statistical data if we assume that there is a causal structure in the gene network. It is therefore necessary to discuss confounders and confounding in the general causal DAG. These motivations drive us to investigate confounding and confounders in the general causal DAG with the definitions in [11].
In this paper, according to the precise definition, one model, shown in Figure 1, is discussed: the covariate variable affects the control variable and the response variable at the same time, and the control variable affects the response variable. By the qualitative definition, we investigate two other models: in one, shown in Figure 2, the control variable affects the covariate variable and the covariate variable affects the response variable; in the other, shown in Figure 3, the control variable is independent of the covariate variable and the covariate variable affects the response variable. Obviously, the third model is the special case of the other two models in which the control variable and the covariate variable are independent. We then use the formal definitions of a confounder and an irrelevant factor in [11] and the ancillary information based on conditional independence hypotheses [5,13] to discuss the confounding of the above-mentioned counterfactual models.
The rest of the paper is organized as follows. In Section 2, we introduce the main notation and definitions, and discuss the relationship between confounder and irrelevant factor. In Section 3, confounding and irrelevant factors in three kinds of three binary-variable counterfactual models with DAGs are discussed in turn. The conclusion is given in Section 4.
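The three structures described above can be written down as edge sets, which makes the structural role of the covariate in each model explicit. The following is only an illustrative sketch: the names `MODELS` and `role_of_covariate` are hypothetical, and the edge E -> D in Figures 2 and 3 is an assumption of the sketch, since the description above states it explicitly only for Figure 1.

```python
# Directed edges (cause, effect) for the three models described in the text.
# E = control variable, C = covariate variable, D = response variable.
# Assumption: E -> D is included in Figures 2 and 3 for illustration only.
MODELS = {
    "Figure 1": {("C", "E"), ("C", "D"), ("E", "D")},
    "Figure 2": {("E", "C"), ("C", "D"), ("E", "D")},
    "Figure 3": {("C", "D"), ("E", "D")},
}


def role_of_covariate(edges):
    """Return a structural label for C relative to E and D."""
    if ("C", "E") in edges and ("C", "D") in edges:
        return "common cause of E and D"
    if ("E", "C") in edges and ("C", "D") in edges:
        return "intermediate variable between E and D"
    if ("C", "D") in edges:
        return "cause of D with no edge to or from E"
    return "unrelated to D"


roles = {name: role_of_covariate(edges) for name, edges in MODELS.items()}
for name, role in roles.items():
    print(name, "->", role)
```

Only Figure 1 matches the precise (common-cause) definition of a confounder; the other two structures are the ones for which the qualitative definition is needed.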

Notation and Definitions
Let $E$, $D$, $C$ be binary variables. Let the control variable $E$ be an exposure with the values $e$ and $\bar{e}$ representing "exposed" and "unexposed" respectively. Let the response variable $D$ be an outcome with the values 1 and 0 denoting the presence and absence of a disease respectively, where $D_e$ is the corresponding response when $E = e$ [8,9].
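In order to identify the causal effect of exposure on response, confounding bias is defined as the difference between the hypothetical proportion of diseased individuals in the exposed population, had they been unexposed, and the actual proportion of diseased individuals in the unexposed population [16,17], that is

$$B = P(D_{\bar{e}} = 1 \mid E = e) - P(D = 1 \mid E = \bar{e}).$$

Definition 1 [11]. A covariate $C$ is a confounder if

$$\left| P(D_{\bar{e}} = 1 \mid E = e) - P_{\Delta}(D = 1 \mid E = \bar{e}) \right| < \left| P(D_{\bar{e}} = 1 \mid E = e) - P(D = 1 \mid E = \bar{e}) \right|,$$

where $P_{\Delta}(D = 1 \mid E = \bar{e}) = \sum_{k} P(D = 1 \mid E = \bar{e}, C = k) P(C = k \mid E = e)$ is the standardized proportion.

From the definition, we find that the standardized proportion obtained by adjusting for a confounder is closer to the hypothetical proportion than the crude proportion is.

Definition 2 [11]. A covariate $C$ is an irrelevant factor if

$$P_{\Delta}(D = 1 \mid E = \bar{e}) = P(D = 1 \mid E = \bar{e}).$$

Since the estimate of the hypothetical proportion is unchanged after being adjusted for an irrelevant factor, we do not need to adjust for it to reduce confounding bias. The relationship between irrelevant factor and confounder is given in Lemma 1.

Lemma 1. If a covariate $C$ is an irrelevant factor, it is not a confounder. Conversely, if $C$ is a confounder, it is not an irrelevant factor.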

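Proof.
Write $P_{\Delta}(D = 1 \mid E = \bar{e})$ for the standardized proportion. According to the condition that $C$ is an irrelevant factor, we can obtain that

$$\left| P(D_{\bar{e}} = 1 \mid E = e) - P_{\Delta}(D = 1 \mid E = \bar{e}) \right| = \left| P(D_{\bar{e}} = 1 \mid E = e) - P(D = 1 \mid E = \bar{e}) \right| = |B|,$$

so the strict inequality required of a confounder fails and $C$ is not a confounder. Conversely, if $C$ is a confounder, then

$$\left| P(D_{\bar{e}} = 1 \mid E = e) - P_{\Delta}(D = 1 \mid E = \bar{e}) \right| < |B| = \left| P(D_{\bar{e}} = 1 \mid E = e) - P(D = 1 \mid E = \bar{e}) \right|.$$

If $C$ were also an irrelevant factor, we would have

$$\left| P(D_{\bar{e}} = 1 \mid E = e) - P_{\Delta}(D = 1 \mid E = \bar{e}) \right| = \left| P(D_{\bar{e}} = 1 \mid E = e) - P(D = 1 \mid E = \bar{e}) \right| = |B|.$$

This is a contradiction! Hence, $C$ is not an irrelevant factor. □

[11] (pp. 7-8) gives an example and illustrates the two cases of irrelevant factor and confounder respectively. To illuminate the concepts of confounding and irrelevant factor and Lemma 1, we continue to discuss the relationship based on their original example and give two examples as follows.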
Example 1. Consider the example in [11]. Let a factor $C$ express groups categorized by every 10 years of age, with its values 1, 2, 3 and 4 denoting the original age groups 20-29, 30-39, 40-49 and 50-59 years respectively. Suppose that there is no exposure effect, i.e. there are only individuals of type 1 (individual "doomed") and type 4 (individual immune to disease), and that the joint distribution of disease, exposure and the factor is as given there. When the individuals are regrouped by "younger than 50", which means we adjust the distribution of $C$, a coarse subpopulation is obtained as given in Table 1. Then the regrouped factor is not a confounder, and it is not an irrelevant factor.

Example 2. To continue the discussion of the above example in [11], when the individuals are regrouped by "younger than 40 but older than 30", we obtain the coarse subpopulation given in Table 2. To sum up, this regrouped factor is not an irrelevant factor, but is a confounder.
As stated in [11], judging a regrouped factor to be a confounder or an irrelevant factor relies on the "right" adjustment of its distribution. More importantly, the non-uniqueness of the confounder makes causal analysis more complex.
If we transform the adjustment of the covariate variable in Figure 1 into the intervention distribution in the counterfactual models of Figures 2 and 3, Definitions 1 and 2 can readily be employed to discuss confounding and irrelevant factors in the other general causal DAGs.
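The dependence of the verdict on the grouping can be sketched numerically. The code below is a minimal illustration, not the data of [11]: the counts are hypothetical, and the helper names (`crude`, `standardized`, `classify`, `regroup`) are introduced only for this sketch. It assumes the standardized proportion is formed by weighting the stratum-specific disease rates of the unexposed by the covariate distribution of the exposed, and that there is no exposure effect, so the hypothetical proportion equals the observed proportion of diseased individuals among the exposed.

```python
from fractions import Fraction

# Hypothetical counts: stratum -> arm -> (diseased, total).
# Arms: "e" = exposed, "ebar" = unexposed. Strata 1-4 are 10-year age groups.
fine = {
    1: {"e": (10, 100), "ebar": (20, 200)},
    2: {"e": (40, 200), "ebar": (20, 100)},
    3: {"e": (90, 300), "ebar": (120, 400)},
    4: {"e": (160, 400), "ebar": (120, 300)},
}


def crude(strata, arm):
    """P(D=1 | E=arm), pooled over all strata."""
    diseased = sum(s[arm][0] for s in strata.values())
    total = sum(s[arm][1] for s in strata.values())
    return Fraction(diseased, total)


def standardized(strata):
    """Sum over k of P(D=1 | E=ebar, C=k) * P(C=k | E=e)."""
    n_exposed = sum(s["e"][1] for s in strata.values())
    return sum(
        Fraction(s["ebar"][0], s["ebar"][1]) * Fraction(s["e"][1], n_exposed)
        for s in strata.values()
    )


def classify(strata, hypothetical):
    """Apply the irrelevant-factor and confounder criteria to one grouping."""
    crude_p, std_p = crude(strata, "ebar"), standardized(strata)
    if std_p == crude_p:
        return "irrelevant factor"
    if abs(hypothetical - std_p) < abs(hypothetical - crude_p):
        return "confounder"
    return "neither"


def regroup(strata, mapping):
    """Merge strata according to mapping: old stratum -> new label."""
    merged = {}
    for k, s in strata.items():
        target = merged.setdefault(mapping[k], {"e": (0, 0), "ebar": (0, 0)})
        for arm in ("e", "ebar"):
            target[arm] = (target[arm][0] + s[arm][0], target[arm][1] + s[arm][1])
    return merged


# No exposure effect: the hypothetical proportion is the exposed crude proportion.
hypothetical = crude(fine, "e")
coarse = regroup(fine, {1: "<40", 2: "<40", 3: ">=40", 4: ">=40"})

print(classify(fine, hypothetical))    # verdict for the four-level factor
print(classify(coarse, hypothetical))  # verdict after regrouping at age 40
```

With these hypothetical counts the four-level factor satisfies the confounder criterion, while the factor regrouped at age 40 is an irrelevant factor, because the regrouped factor has the same distribution among the exposed and the unexposed. Exact rational arithmetic is used so that the equality in the irrelevant-factor criterion is tested without rounding error.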

Confounding of Counterfactual Model
Considering counterfactual models, there are three counterfactual models of causal DAGs, as follows (Figures 1-3). To discuss whether there is confounding in the models under consideration, we use the following conditional independence hypotheses as the ancillary information (H): 1)

As shown in
C j b j a t a t

B P D E e P D E e b a t b a t b a t b a t a t a t a t a t
Using the above formulae, we translate each condition of (H) into parameter form:

a t b a t b a t b a t E D i e a t a t a t a t
The covariate is an irrelevant factor.
Proof. In order to prove that the covariate is an irrelevant factor, we only need We obtain, From the other condition, we obtain Furthermore, according to the next condition, we have

The Second Model
As shown in Figure 2, $E$ and $C$ have an effect on $D$ at the same time, and $E$ affects $C$. Some of the proportions involved can be observed from the original data, but others cannot be observed because they are hypothetical proportions; they can also be treated as a counterfactual model by intervention [13].
Then, we obtain the following formulae. Using the above formulae, we translate each condition of (H) into parameter form: 1) If one of the following conditions holds, where Hence, the covariate is an irrelevant factor but not a confounder, and it cannot reduce confounding.
Using the above formulae, we translate each condition of (H) into parameter form, where is naturally true.

Conclusion
Using the formal definitions of confounder, non-confounding and irrelevant factor, we discuss the confounding of three kinds of three binary-variable counterfactual models in epidemiology studies and statistics, where the general causal DAG is involved in the discussion. Sufficient conditions for determining non-confounding and an irrelevant factor in a three binary-variable causal DAG are discussed. Our work focuses on the general three binary-variable causal DAG with no other variable involved. Confounder and confounding are two different concepts, and our discussion is based on probability criteria [5,11,13]. In addition, the non-uniqueness of irrelevant factor and confounder in theory makes it more difficult to detect them and to discuss sufficient and necessary conditions. The ancillary information (H) involved in our discussion is only a part of that in [5,13]; hence we obtain only some sufficient conditions. Another reason for this design is that we want to discuss causation along the causal path. The discussion of sufficient and necessary conditions would be more complex, as shown in our results. Future work will extend the three-variable counterfactual model to a multi-variable counterfactual model, and we will apply the theoretical results to confounder detection in gene networks.
for the irrelevant factor is closer to the hypothetical pro-

Theorem 2.
If one of the following conditions holds,

In the causal DAG of Figure 3 with $E \perp C$, the covariate $C$ is naturally an irrelevant factor. To keep the same expression as other DAGs, we have the following theorem.
Theorem 5. In the causal DAG of Figure 3 with $E \perp C$, the covariate is an irrelevant factor.
Theorem 5 shows that in the causal DAG of Figure 3, the covariate is always an irrelevant factor regardless of any adjustment or intervention on it.
Theorem 6. If one of the following conditions holds,
The proof is similar to c). □

Table 2. Example of a factor C which is not an irrelevant factor, but is a confounder.
. That is, even for a fixed factor C in a specific experiment, for example, age, h