
The main purpose of many randomized trials is to make an inference about the average causal effect of a treatment. Therefore, for a binary outcome, the null hypothesis of the hypothesis test should be that the causal risks are equal in the two groups. This null hypothesis is referred to as the weak causal null hypothesis. Nevertheless, the hypothesis tests currently applied in randomized trials do not target this null hypothesis; Fisher’s exact test is a test of the sharp causal null hypothesis that the treatment has no causal effect on any subject. In general, the rejection of the sharp causal null hypothesis does not mean that the weak causal null hypothesis is rejected. Recently, Chiba developed new exact tests for the weak causal null hypothesis: a conditional exact test, which requires that a marginal total is fixed, and an unconditional exact test, which does not require that a marginal total is fixed and depends instead on the ratio of random assignment. To apply these exact tests in actual randomized trials, the sample size must be calculated at the study design stage. In this paper, we present a sample size calculation procedure for these exact tests. Given the sample size, the procedure derives the exact power of the test, because it examines all the patterns that can be obtained as observed data under the alternative hypothesis, without relying on large-sample theory or any additional assumptions.

In superiority randomized trials in which subjects are assigned to one of two treatment groups and the outcome is binary, data can be summarized in a two-by-two contingency table. Investigators are often interested in testing the equality of the causal risks of the two groups, using a hypothesis test. A popular method for the hypothesis test is Fisher’s exact test [

Recently, two exact tests for the weak causal null hypothesis were developed [

To conduct statistical hypothesis testing in an actual randomized trial, the sample size needed for the trial must be calculated during the study design. Moher et al. [

The paper is organized as follows. In Section 2, we describe the notation used throughout this paper. In Section 3, we review the unconditional and conditional exact tests. In Section 4, we present a sample size calculation procedure for these exact tests. The procedure is examined through a numerical example in Section 5. Finally, we discuss the results in Section 6 and state the conclusion in Section 7.

Throughout this paper, we denote X as the assigned treatment; X = 1 if a subject was assigned to the treatment group, and X = 0 if assigned to the control group. Y denotes the binary outcome; Y = 1 if the event occurred, and Y = 0 if it did not. The results are summarized in

For each subject, it is also possible to consider the potential outcomes [_{i}(x) denotes the potential outcome for ith subject i (

i) Type 11: individuals who would experience the event regardless of the assigned treatment group; i.e., (Y_{i}(1), Y_{i}(0)) = (1, 1).

ii) Type 10: individuals who would experience the event if assigned to the treatment group but would not experience the event if assigned to the control group; i.e., (Y_{i}(1), Y_{i}(0)) = (1, 0).

iii) Type 01: individuals who would not experience the event if assigned to the treatment group but would experience the event if assigned to the control group; i.e., (Y_{i}(1), Y_{i}(0)) = (0, 1).

iv) Type 00: individuals who would not experience the event regardless of the assigned treatment group; i.e., (Y_{i}(1), Y_{i}(0)) = (0, 0).

Note that all subjects belong to one of these four types.

Let n_{st} denote the number of subjects with (Y_{i}(1), Y_{i}(0)) = (s, t), where s, t = 0, 1. The causal risk if all subjects were assigned to the treatment group (X = 1) can be expressed as

$$\frac{1}{n}\sum_{i=1}^{n} Y_{i}(1) = \frac{n_{11} + n_{10}}{n}. \tag{1}$$

This is because only subjects with type 11 or type 10 would experience the event. Likewise, the causal risk if all subjects were assigned to the control group (X = 0) can be expressed as

$$\frac{1}{n}\sum_{i=1}^{n} Y_{i}(0) = \frac{n_{11} + n_{01}}{n}, \tag{2}$$

because only subjects with type 11 or type 01 would experience the event. Therefore, the sample average treatment effect (the difference between (1) and (2)) can be expressed

| Group | Event occurred (Y = 1) | Event not occurred (Y = 0) | Total |
|---|---|---|---|
| Treatment (X = 1) | a | b | a + b |
| Control (X = 0) | c | d | c + d |
| Total | a + c | b + d | n |

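as

$$\frac{n_{11} + n_{10}}{n} - \frac{n_{11} + n_{01}}{n} = \frac{n_{10} - n_{01}}{n}, \tag{3}$$

and thus the null hypothesis can be expressed as

$$H_{0}: n_{10} = n_{01}, \tag{4}$$

which corresponds to the weak causal null hypothesis of

$$H_{0}: \frac{1}{n}\sum_{i=1}^{n} Y_{i}(1) = \frac{1}{n}\sum_{i=1}^{n} Y_{i}(0). \tag{5}$$

This null hypothesis is the main interest in many clinical trials.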
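The link between the four subject types and the causal risks can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the example counts are hypothetical, and exact fractions are used so that the equality of the two causal risks can be tested without rounding error.

```python
from fractions import Fraction


def causal_risks(n11, n10, n01, n00):
    """Return (risk if all treated, risk if all control, their difference).

    The arguments are the numbers of subjects of types 11, 10, 01 and 00.
    """
    n = n11 + n10 + n01 + n00
    if n <= 0:
        raise ValueError("at least one subject is required")
    # Types 11 and 10 experience the event when assigned to treatment.
    risk_treated = Fraction(n11 + n10, n)
    # Types 11 and 01 experience the event when assigned to control.
    risk_control = Fraction(n11 + n01, n)
    return risk_treated, risk_control, risk_treated - risk_control


def weak_null_holds(n11, n10, n01, n00):
    """The causal risks are equal exactly when n10 equals n01."""
    return n10 == n01


# Hypothetical composition: 20 subjects, with n10 = n01 = 2.
print(causal_risks(3, 2, 2, 13))     # both risks are 1/4, difference 0
print(weak_null_holds(3, 2, 2, 13))  # True
```

In this hypothetical composition the two causal risks coincide even though four subjects (two of type 10 and two of type 01) are individually affected by the treatment.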

Here, we consider the null hypothesis of

$$H_{0}: n_{10} = n_{01} = 0, \tag{6}$$

which is a special case of the weak causal null hypothesis (4). The null hypothesis (6) implies that the combination of (Y_{i}(1), Y_{i}(0)) is limited to (Y_{i}(1), Y_{i}(0)) = (1, 1) or (0, 0), and thus subjects with (Y_{i}(1), Y_{i}(0)) = (1, 0) or (0, 1) do not exist. Therefore, this null hypothesis corresponds to the following sharp causal null hypothesis:

$$H_{0}: Y_{i}(1) = Y_{i}(0) \text{ for all } i, \tag{7}$$

which is also a special case of the weak causal null hypothesis (5). It is obvious that the weak causal null hypothesis holds whenever the sharp causal null hypothesis holds. However, in general, the rejection of the sharp causal null hypothesis does not mean that the weak causal null hypothesis is rejected (i.e.,

When the random assignment is conducted with the ratio 1:r, we assume that subjects are assigned as in the table below. Of the n_{st} subjects, n_{st,1} subjects are assigned to the treatment group (X = 1) and the remaining n_{st,0} = n_{st} − n_{st,1} subjects are assigned to the control group (X = 0), each with the assignment probability determined by the ratio 1:r. The probability that n_{st,1} of the n_{st} subjects are assigned to the treatment group can be expressed as follows:

where

Set of conditions 1:

| Group | Event occurred (Y = 1) | Event not occurred (Y = 0) | Total |
|---|---|---|---|
| Treatment (X = 1) | n_{11,1} + n_{10,1} | n_{00,1} + n_{01,1} | n_{11,1} + n_{10,1} + n_{01,1} + n_{00,1} |
| Control (X = 0) | n_{11,0} + n_{01,0} | n_{00,0} + n_{10,0} | n_{11,0} + n_{10,0} + n_{01,0} + n_{00,0} |
| Total | n_{11} + n_{10,1} + n_{01,0} | n_{00} + n_{01,1} + n_{10,0} | n |
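The mapping from an assignment of the four subject types to the cells of the table can be sketched in code. This is an illustrative sketch only, under a loudly stated assumption: each subject is assigned independently to the treatment group with probability 1/(1 + r), which is one natural reading of the ratio 1:r but is not confirmed by the surviving text. The function names and the example counts are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

TYPES = ("11", "10", "01", "00")


def assignment_probability(counts, treated, r=1):
    """Probability that treated[t] of the counts[t] subjects of each type t
    are assigned to X = 1.

    ASSUMPTION: independent assignment with P(X = 1) = 1 / (1 + r).
    """
    p = 1 / (1 + Fraction(r))
    prob = Fraction(1)
    for t in TYPES:
        k, m = treated[t], counts[t]
        # Binomial probability of k treated subjects out of m of this type.
        prob *= comb(m, k) * p**k * (1 - p) ** (m - k)
    return prob


def observed_table(counts, treated):
    """Cells (a, b, c, d) of the two-by-two table implied by an assignment."""
    control = {t: counts[t] - treated[t] for t in TYPES}
    a = treated["11"] + treated["10"]  # treated subjects with the event
    b = treated["01"] + treated["00"]  # treated subjects without the event
    c = control["11"] + control["01"]  # control subjects with the event
    d = control["10"] + control["00"]  # control subjects without the event
    return a, b, c, d


def all_assignments(counts):
    """Yield every possible split of each type between the two groups."""
    for ks in product(*(range(counts[t] + 1) for t in TYPES)):
        yield dict(zip(TYPES, ks))


# Hypothetical composition of 7 subjects and one particular assignment.
counts = {"11": 2, "10": 1, "01": 1, "00": 3}
treated = {"11": 1, "10": 1, "01": 0, "00": 2}
print(observed_table(counts, treated))          # (2, 2, 2, 1)
print(assignment_probability(counts, treated))  # 3/64
```

Summing `assignment_probability` over `all_assignments(counts)` gives 1, which is a quick sanity check that the enumeration covers every possible observed table for a given composition.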

The first condition is the null hypothesis (4), and the second is the total number of subjects. The last two conditions are derived on the basis of

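The risk difference estimated from the observed data, RD_{O}, is

$$RD_{O} = \frac{a}{a + b} - \frac{c}{c + d} \tag{10}$$

from the two-by-two table of observed data, and the risk difference calculated from the assignment of the four subject types, RD_{N}, is

$$RD_{N} = \frac{n_{11,1} + n_{10,1}}{n_{11,1} + n_{10,1} + n_{01,1} + n_{00,1}} - \frac{n_{11,0} + n_{01,0}}{n_{11,0} + n_{10,0} + n_{01,0} + n_{00,0}} \tag{11}$$

from the table of assigned subject types. In what follows, we consider the case of RD_{O} ≤ 0, but the following methods can easily be applied to the case of RD_{O} ≥ 0. For RD_{O} ≤ 0, the unconditional exact test yields the one-sided p-value, p, using the following formula: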

with

where I(z) = 1 if z ≤ 0 and I(z) = 0 if z > 0 with z = RD_{N} − RD_{O} (the difference between (11) and (10)). This is the unconditional exact test introduced by Chiba [

For the conditional exact test, the numbers of subjects assigned to the two groups are fixed. Thus, instead of the probability (8), the following probability is used:

where the following conditions are required:

Set of conditions 2: Set of conditions 1 (9) plus

Consequently, the conditional exact test yields the one-sided p-value, p, using the following formula:

with

We note that, under the following monotonicity assumption [

Assumption 1 (monotonicity):

$$Y_{i}(1) \le Y_{i}(0) \text{ for all } i, \tag{18}$$

the weak causal null hypothesis (4) is equivalent to the sharp causal null hypothesis (6). This is because there is no subject with type 10, i.e., n_{10} = 0, under this assumption. We further note that the conditional exact test degenerates to Fisher’s exact test under the monotonicity assumption (18) [

In this paper, we define a two-sided p-value as twice the one-sided p-value.
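Because the conditional exact test is stated to reduce to Fisher’s exact test under the monotonicity assumption (18), that special case can be sketched with the hypergeometric distribution. The code below is a hedged illustration for the case RD_{O} ≤ 0, not the general conditional exact test; the function names are hypothetical, and the two-sided value simply doubles the one-sided value as defined in the text.

```python
from fractions import Fraction
from math import comb


def fisher_one_sided_lower(a, b, c, d):
    """One-sided Fisher p-value P(A <= a), where A is the number of events
    in the treatment group and all margins of the table are fixed."""
    n1, n0, events = a + b, c + d, a + c
    total = comb(n1 + n0, events)
    lowest = max(0, events - n0)  # smallest feasible value of A
    tail = sum(comb(n1, k) * comb(n0, events - k)
               for k in range(lowest, a + 1))
    return Fraction(tail, total)


def two_sided(one_sided_p):
    """Two-sided p-value defined as twice the one-sided p-value."""
    return 2 * one_sided_p


# Small hypothetical table: a = 1, b = 3, c = 3, d = 1.
p = fisher_one_sided_lower(1, 3, 3, 1)
print(p, two_sided(p))  # 17/70 17/35
```

The same function can be applied to the surgical site infection data of the numerical example by calling `fisher_one_sided_lower(4, 120, 12, 110)`.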

Hypothesis tests of non-inferiority focus on the null hypothesis of H_{0}: (n_{10} − n_{01})/n = δ, where δ is the non-inferiority margin, i.e., H_{0}: n_{10} − n_{01} = δn. To allow for the case in which δn is not an integer, we set the null value to the largest integer not exceeding δn. Consequently, for non-inferiority trials, the one-sided p-value can be calculated by replacing n_{10} = n_{01} in the set of conditions 1 (9) with n_{10} − n_{01} = m, where m is the largest integer satisfying m ≤ δn.

We note that the 100(1 − α)% confidence interval (CI), which corresponds to a two-sided significance level of α, can also be obtained by finding the range of null values of n_{10} − n_{01} that are not rejected at a significance level of α/2 by either of the two one-sided hypothesis tests. Chiba [

In the situation in which a randomized clinical trial with the assignment ratio 1:r is planned, we denote the sample size in the treatment group by N_{1} and that in the control group by N_{0}, so that the total sample size is N = N_{1} + N_{0}. Furthermore, we set the response probabilities under the alternative hypothesis as follows: P_{1} if all subjects are assigned to the treatment group, and P_{0} if all subjects are assigned to the control group.

First, we derive the power function for a given sample size N for the unconditional exact test. When the one-sided significance level is set to α/2, the power function can be derived by the following procedure:

1) Derive combinations of (n_{11}, n_{10}, n_{01}, n_{00}) under the alternative hypothesis, which satisfy n_{10} − n_{01} = M_{A}, where M_{A} is the largest integer satisfying M_{A} ≤ (P_{1} − P_{0})N.

2) For each combination of (n_{11}, n_{10}, n_{01}, n_{00}) in Step 1, derive all combinations of (a, b, c, d), which can be obtained as observed data under the combination of (n_{11}, n_{10}, n_{01}, n_{00}), from a = n_{11,1} + n_{10,1}, b = n_{01,1} + n_{00,1}, c = n_{11,0} + n_{01,0}, d = n_{10,0} + n_{00,0} (see

3) For each combination of (n_{11}, n_{10}, n_{01}, n_{00}), (a, b, c, d) and the probability (8) in Step 2, calculate the one-sided p-value using the other combinations of (n_{11}, n_{10}, n_{01}, n_{00}) under the null hypothesis H_{0}: n_{10} − n_{01} = M_{N}, where M_{N} is the largest integer satisfying M_{N} ≤ δN (δ = 0 for the superiority trial and δ > 0 for the non-inferiority trial).

4) Derive the conditional power given (n_{11}, n_{10}, n_{01}, n_{00}), p^{*}, by summing the probability (8) in Step 2 for cases in which the one-sided p-value in Step 3 is smaller than α/2.

5) Derive the power from inf{p^{*}: (n_{11}, n_{10}, n_{01}, n_{00})}.

In Step 5, we take the infimum because the assumed true values of P_{1} and P_{0} do not tell us which combination of (n_{11}, n_{10}, n_{01}, n_{00}) is the most plausible. Nevertheless, if investigators have plausible information about (n_{11}, n_{10}, n_{01}, n_{00}), such as the monotonicity assumption (18), the power can be calculated using that information. The hypothesis test can also be performed using the information. The required sample size is the smallest integer N for which the power derived in Step 5 is greater than or equal to the prespecified power.
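Step 1 of the procedure, the enumeration of the combinations (n_{11}, n_{10}, n_{01}, n_{00}), can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: it takes M_{A} to be the largest integer not exceeding (P_{1} − P_{0})N, by analogy with the condition M_{N} ≤ δN in Step 3, and it imposes only the two constraints n_{10} − n_{01} = M and n_{11} + n_{10} + n_{01} + n_{00} = N. The function names and the numerical inputs are hypothetical.

```python
from fractions import Fraction
from math import floor


def largest_integer_at_most(rate, N):
    """Largest integer M with M <= rate * N.

    The rate is converted through its string form so that a decimal such as
    0.29 is treated exactly (floor(0.29 * 100) would give 28 with floats).
    """
    return floor(Fraction(str(rate)) * N)


def compositions(N, M, monotone=False):
    """Yield (n11, n10, n01, n00) with total N and n10 - n01 = M.

    With monotone=True, only combinations with n10 = 0 are kept, which is
    the restriction implied by the monotonicity assumption.
    """
    for n10 in range(N + 1):
        n01 = n10 - M
        if n01 < 0 or n10 + n01 > N:
            continue
        if monotone and n10 != 0:
            continue
        rest = N - n10 - n01  # subjects of type 11 or type 00
        for n11 in range(rest + 1):
            yield (n11, n10, n01, rest - n11)


# Hypothetical design values: P1 = 0.02, P0 = 0.10, N = 100.
M_A = largest_integer_at_most(Fraction("0.02") - Fraction("0.10"), 100)
print(M_A)                                              # -8
print(len(list(compositions(100, M_A, monotone=True))))  # 93
```

Without the monotonicity restriction the number of combinations grows quickly with N, and each combination still requires the enumeration of Steps 2 to 4, which is consistent with the computational burden noted in the text.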

For the conditional exact test, the power function is obtained by adding the condition of

The procedure to calculate the power presented here examines all of the patterns that can be obtained as observed data under the alternative hypothesis by applying an exact test, without relying on large-sample theory or any additional assumptions. Therefore, the calculated power is exact. However, the procedure requires a large amount of computer memory and considerable computing time, especially for the unconditional exact test. Unfortunately, it is therefore very difficult to perform the procedure without any assumptions in actual clinical trials. In the next section, we will illustrate the procedure under the monotonicity assumption (18) using an example.

For the illustration, we have used the data from a superiority randomized clinical trial to evaluate the effects of subcutaneous drainage during digestive surgery [

For the sample size calculation, it was assumed that the true SSI incidence proportion would be 0.10 in the ND group and 0.02 in the PD group with a significance level of 0.05 (two-sided) and a power of 0.80 [

(n_{11}, n_{10}, n_{01}, n_{00}) with n_{10} ≠ 0, and these combinations may yield lower power.

In this paper, we proposed a sample size calculation method for the exact tests introduced by Chiba [

| Group | Superficial surgical site infections (SSI): yes | SSI: no | Total |
|---|---|---|---|
| Passive drainage (PD) | 4 | 120 | 124 |
| No drainage (ND) | 12 | 110 | 122 |

to perform the presented methods in actual clinical trials without any assumptions such as the monotonicity assumption (18), due to limitations in computing power. Further work is needed to create an efficient algorithm and to develop an approximation method.

At present, for small to moderate sample sizes, restricted randomization is recommended over simple randomization in order to balance background factors between the two groups. This is natural if the same hypothesis testing method is applied under either randomization method. However, if a hypothesis testing method tailored to the randomization method is applied, the recommendation may change. In Section 5, the illustration showed that the power of the unconditional exact test was higher than that of the conditional exact test. This result could be predicted, because the p-value was smaller for the unconditional exact test than for the conditional exact test. In general, the p-value will be smaller for the unconditional exact test than for the conditional exact test, because the conditional exact test is more discrete than the unconditional exact test by conditioning on

The unconditional exact test also has an advantage in that, for the sample size calculation, it takes into account cases in which the actual ratio of the numbers assigned to the two groups is not exactly 1:r, whereas the conditional exact test assumes that the ratio is exactly 1:r.

Whenever we conduct a statistical hypothesis test for the weak causal null hypothesis, which is the main interest in many clinical trials, we need to apply the corresponding sample size calculation method. Of the two hypothesis tests, the unconditional test may have greater power than the conditional test. The unconditional test and the corresponding sample size calculation method should be discussed further.

The author thanks the reviewers for helpful comments. The author also thanks Dr. Masataka Taguri for introducing the paper [

Chiba, Y. (2016) Sample Size Calculation of Exact Tests for the Weak Causal Null Hypothesis in Randomized Trials with a Binary Outcome. Open Journal of Statistics, 6, 766-776. http://dx.doi.org/10.4236/ojs.2016.65063