Pairwise Comparisons in the Analysis of Carcinogenicity Data

Analysis of carcinogenicity data generally involves a trend test across all dose groups and a pairwise comparison of the high dose group with the control. The most commonly used test for a positive trend is the Cochran-Armitage test. This test is asymptotically normal. For the pairwise comparison of the high dose group with the control group, we propose two modifications: the first modification is to apply the test on the data from high dose and control groups after dropping the data from the low and the medium dose groups; the second modification is to adjust the test conditional on data from all dose groups. We compare the power performance of these two modifications for the pairwise comparisons.


INTRODUCTION
The standard design for a long-term carcinogenicity study in new drug development includes three treatment groups of increasing doses of the study drug (low, medium, and high) and one untreated control. The group sizes are about 50 animals per group. The statistical analyses include a trend test for a positive dose-response relationship in tumor incidence rates across all dose groups and pairwise comparisons of treated groups with the control group, by organ/tumor combination.
The most common test for positive trend is the Cochran-Armitage (CA) test; see, e.g., Cochran [1] and Armitage [2]. There are several extensions of the CA test; see, e.g., Tarone [3,4], Hoel and Yanagawa [5], and Tamura and Young [6], among others. Since differences in mortality among treatment groups are a concern, various mortality-adjusted tests have been suggested by different authors; see, e.g., Peto et al. [7] and Bailer and Portier [8]. Both of these mortality-adjusted tests can be approached from the CA test. The CA test is asymptotically normal. An exact test was proposed by Mehta et al. [9]. For the pairwise comparison of a treated group, e.g., the high dose group, with the control group, both the asymptotic CA test and the exact test can be modified in two different ways. The first way is to drop the data from the low and medium dose groups and apply the trend tests to the remaining data from the high dose group and the control group. The second way is to modify the tests for pairwise comparison of the high dose group and the control group conditional on the data from all dose groups. We shall refer to these tests as the unconditional pairwise tests and the conditional pairwise tests, respectively. The purpose of this work is to compare the power performance of these two modifications of pairwise tests.
It may be noted that a significant trend test does not necessarily imply that any of the pairwise tests is statistically significant (see Table 1), and a non-significant trend test does not rule out a significant pairwise test (see Table 2).
These tables show that it is important to check the pairwise tests whether or not the trend test is significant.
The rest of the paper is organized as follows. In Section 2, we review the CA and exact trend tests and present the modifications for the pairwise comparisons. In Section 3, we illustrate the application of the two modified pairwise tests on a dataset, and in Section 4 we carry out a simulation study to evaluate their power performance. In Section 5, we make some concluding remarks.

THEORETICAL DEVELOPMENT
Consider a carcinogenicity study with $r + 1$ dose groups consisting of one control and $r$ treated groups. Let $n_i$ be the number of animals assigned to the $i$th treatment group, $x_i$ be the number of tumor-bearing animals observed in the $i$th treatment group, and $d_i$ be the dose level for the $i$th treatment group, with $d_0 = 0$ for the control group. Assume that $x_i$ has a binomial distribution, $x_i \sim \text{Binomial}(n_i, p_i)$, where $p_i$ is the probability that an animal in the $i$th dose group develops a tumor. The value of $p_i$ is generally modeled as $p_i = H(a + b d_i)$ with $H(u) = e^{u} / (1 + e^{u})$, the logistic distribution. The value with $d_i = 0$ corresponds to the control group. Here, $a$ is a nuisance parameter and $b$ is the parameter of interest.

Test for Positive Trend
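The positive trend is tested by the hypothesis $H_0: b = 0$ versus the alternative hypothesis $H_1: b > 0$, or equivalently by testing $H_0: p_0 = p_1 = \cdots = p_r$ versus $H_1: p_i \le p_{i+1}$ for all $i$, with strict inequality for at least one $i$.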


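The CA test for testing the null hypothesis that there is no trend, $H_0$, versus the alternative hypothesis, $H_1$, is given by

$$T_{\text{CA}} = \frac{\sum_{i=0}^{r} x_i (d_i - \bar{d})}{\sqrt{\hat{p}\,(1 - \hat{p}) \sum_{i=0}^{r} n_i (d_i - \bar{d})^2}},$$

where $n_+ = \sum_{i=0}^{r} n_i$, $x_+ = \sum_{i=0}^{r} x_i$, $\hat{p} = x_+ / n_+$, and $\bar{d} = \sum_{i=0}^{r} n_i d_i / n_+$. Under $H_0$, $T_{\text{CA}}$ is asymptotically standard normal.

For the exact trend test, let $\Omega = \{x = (x_0, \ldots, x_r) : \sum_{i=0}^{r} x_i = x_+\}$ be the sample space, which is the collection of all tables with the same total $x_+$, the observed total number of tumor-bearing animals. Define the critical region for the trend test:

$$\Omega_{\text{trend}} = \Big\{x \in \Omega : \sum_{i=0}^{r} x_i d_i \ge t_{\text{obs,trend}}\Big\},$$

where $t_{\text{obs,trend}}$ is the value of $\sum_{i=0}^{r} x_i d_i$ at the observed data. Using the hypergeometric distribution, the probability of each realization of $x \in \Omega$ under $H_0$ is

$$P_0(x) = \frac{\prod_{i=0}^{r} \binom{n_i}{x_i}}{\binom{n_+}{x_+}}.$$

This probability is also known as the table probability, signifying the probability of each table among all possible permutations of the observed number of tumor-bearing animals. The exact p-value for testing $H_0$ (right-hand tail) is then

$$p_{\text{trend}} = \sum_{x \in \Omega_{\text{trend}}} P_0(x).$$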
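The asymptotic and exact trend tests described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the textbook form of the Cochran-Armitage statistic and a brute-force enumeration of all tables with the observed tumor total, and the function names (`ca_trend_z`, `upper_tail_normal`, `exact_trend_pvalue`) are hypothetical, not taken from the paper or from SAS.

```python
from itertools import product
from math import comb, erfc, sqrt


def ca_trend_z(x, n, d):
    """Cochran-Armitage Z statistic (textbook form, assumed here)."""
    n_tot, x_tot = sum(n), sum(x)
    p_hat = x_tot / n_tot                                   # pooled tumor rate
    d_bar = sum(ni * di for ni, di in zip(n, d)) / n_tot    # weighted mean dose
    num = sum(xi * (di - d_bar) for xi, di in zip(x, d))
    var = p_hat * (1 - p_hat) * sum(ni * (di - d_bar) ** 2 for ni, di in zip(n, d))
    return num / sqrt(var)


def upper_tail_normal(z):
    """One-sided (right-tail) p-value under the standard normal."""
    return 0.5 * erfc(z / sqrt(2))


def exact_trend_pvalue(x, n, d):
    """Right-tail exact p-value: sum of hypergeometric table probabilities
    over all tables with the observed total and sum(x_i * d_i) >= observed."""
    x_tot = sum(x)
    t_obs = sum(xi * di for xi, di in zip(x, d))
    denom = comb(sum(n), x_tot)
    p_value = 0.0
    # Brute-force enumeration; fine for a handful of groups and few tumors.
    for table in product(*(range(min(ni, x_tot) + 1) for ni in n)):
        if sum(table) != x_tot:
            continue
        if sum(ti * di for ti, di in zip(table, d)) >= t_obs:
            weight = 1
            for ni, ti in zip(n, table):
                weight *= comb(ni, ti)
            p_value += weight / denom
    return p_value


if __name__ == "__main__":
    x, n, d = [0, 2, 3, 5], [50, 50, 50, 50], [0, 1, 2, 3]
    z = ca_trend_z(x, n, d)
    print("CA Z:", z, "asymptotic p:", upper_tail_normal(z))
    print("exact trend p:", exact_trend_pvalue(x, n, d))
```

The enumeration grows quickly with the number of groups and tumors, so a production analysis would rely on a network algorithm or an established package rather than this direct loop.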
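Since the highest dose for a regulatory carcinogenicity study is selected mostly based on the maximum tolerated dose (MTD) criterion, the pairwise comparison of the high dose group with the control group has special regulatory interest. In this paper we present some results related to pairwise comparisons of the high dose group with the control group. The results, however, can be used for the pairwise comparison of any treatment group with the control. If we were interested in testing simultaneous multiple contrasts, such as Williams-type contrasts, the approach described in Hothorn et al. [10] can be used.

For the pairwise comparison of Group $r$ with the control, we test $H_0^*: p_0 = p_r$ versus $H_1^*: p_0 < p_r$. Let $\hat{p}_i = x_i / n_i$. In our first approach, we delete the data from all dose groups except dose groups 0 and $r$, estimate the overall proportion of tumor-bearing animals as $\hat{p}_{0r} = (x_0 + x_r) / (n_0 + n_r)$, and define the test statistic as

$$T_{\text{pairwise1}} = \frac{\hat{p}_r - \hat{p}_0}{\sqrt{\hat{p}_{0r}\,(1 - \hat{p}_{0r})\,(1/n_0 + 1/n_r)}}.$$

In our second approach, we estimate the overall proportion of tumor-bearing animals as $\hat{p} = \sum_{i=0}^{r} x_i / \sum_{i=0}^{r} n_i$ from all dose groups. Using this estimate in the denominator, we define the test statistic as

$$T_{\text{pairwise2}} = \frac{\hat{p}_r - \hat{p}_0}{\sqrt{\hat{p}\,(1 - \hat{p})\,(1/n_0 + 1/n_r)}}.$$

We will refer to this test as the conditional test. It should be noted that, under the linearity assumption of $p_i$ with $d_i$, the variance estimate in the denominator of the above test is the same as that of the Cochran-Armitage trend test.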
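To make the difference between the two asymptotic pairwise statistics concrete, the following sketch computes both from the same counts. It assumes the usual pooled-variance form of a two-sample test for proportions, with the pooled rate taken either from groups 0 and r only (first approach) or from all dose groups (second approach). The function name `pairwise_z` and the second set of counts are hypothetical illustrations, not data from the paper.

```python
from math import sqrt


def pairwise_z(x, n, use_all_groups):
    """Z statistic for the last (high dose) group versus group 0 (control).

    use_all_groups=False: pooled rate from control and high dose only.
    use_all_groups=True:  pooled rate from every dose group.
    """
    r = len(x) - 1
    p0, pr = x[0] / n[0], x[r] / n[r]
    if use_all_groups:
        pooled = sum(x) / sum(n)
    else:
        pooled = (x[0] + x[r]) / (n[0] + n[r])
    var = pooled * (1 - pooled) * (1 / n[0] + 1 / n[r])
    return (pr - p0) / sqrt(var)


if __name__ == "__main__":
    n = [50, 50, 50, 50]
    # Counts 0, 2, 3, 5: both pooled rates equal 0.05, so the statistics agree.
    print(pairwise_z([0, 2, 3, 5], n, False), pairwise_z([0, 2, 3, 5], n, True))
    # Hypothetical counts where the middle groups change the pooled rate.
    print(pairwise_z([1, 6, 8, 7], n, False), pairwise_z([1, 6, 8, 7], n, True))
```

The two statistics differ only through the pooled rate, so they coincide whenever the tumor rate in the middle groups equals the combined rate in the control and high dose groups.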

Asymptotic Relative Efficiency of the Conditional and the Unconditional Pairwise Tests
As mentioned, in the derivation of $T_{\text{pairwise1}}$, the variance of $\hat{p}_r - \hat{p}_0$ is estimated based on the data from Group $r$ and the control only. We will refer to this test as the unconditional test. A comparison of the asymptotic relative efficiency (ARE) of $T_{\text{pairwise2}}$ and $T_{\text{pairwise1}}$ shows that $T_{\text{pairwise2}}$ is asymptotically more efficient than $T_{\text{pairwise1}}$.
Hothorn and Bretz [11] proposed (asymptotic) tests for positive trend based on single and multiple contrasts under the assumption of equally spaced dose levels. For a single contrast, the test statistic is a standardized linear contrast of the estimated group proportions; we denote its pairwise version by $T_{\text{pairwiseHB}}$. If the group sizes are equal (i.e., if $n_0 = n_1 = n_2 = n_3$), then it can be shown that the statistics $T_{\text{pairwise1}}$ and $T_{\text{pairwiseHB}}$ are identical.
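The exact test for our second approach is as follows. As before, let $\Omega = \{x = (x_0, \ldots, x_r) : \sum_{i=0}^{r} x_i = x_+\}$, and define the critical region $\Omega_{\text{pairwise2}} = \{x \in \Omega : x_0 d_0 + x_r d_r \ge t_{\text{obs,pairwise2}}\}$, where $t_{\text{obs,pairwise2}}$ is the observed value of $x_0 d_0 + x_r d_r$. The exact p-value is the sum of the hypergeometric table probabilities $\prod_{i=0}^{r} \binom{n_i}{x_i} / \binom{n_+}{x_+}$ over $\Omega_{\text{pairwise2}}$.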


We will refer to this test as the conditional exact pairwise test.
Proceeding along the lines of Mehta et al. [9], the power of a pairwise test is calculated conditional on $x_+$, the total number of tumor-bearing animals, as the probability under the alternative that the table falls in the critical region. This power can be evaluated for the exact tests using the hypergeometric distribution and the appropriate critical regions under the conditional and unconditional situations.
We compare the relative power of the three asymptotic pairwise tests, $T_{\text{pairwise1}}$, $T_{\text{pairwise2}}$, and $T_{\text{pairwiseHB}}$, as well as that of the two exact tests $T_{\text{exact,pairwise1}}$ and $T_{\text{exact,pairwise2}}$. To evaluate their power functions, we performed simulations and calculated the percentage of times the null hypothesis was rejected when the alternative hypothesis was true. SAS proc Stratify and SAS proc Multtest [12] are very convenient for the calculation of these exact probabilities.

EXAMPLE
Consider a carcinogenicity study with four treatment groups, namely control, low, medium, and high dose groups, each with 50 animals, and dose scores 0, 1, 2, and 3, respectively. Suppose we observe a total of 10 animals that developed a certain tumor type, with 0, 2, 3, and 5 tumor-bearing animals in the control, low, medium, and high dose groups, respectively. We would like to perform a pairwise comparison of the high dose group with the control, testing the null hypothesis $H_0^*: p_3 = p_0$ against the alternative $H_1^*: p_3 > p_0$. For the exact tests we have $t_{\text{obs,pairwise1}} = t_{\text{obs,pairwise2}} = x_0 d_0 + x_3 d_3 = 15$. Table 3 shows all possible values of $T_{\text{pairwise1}}$ along with their table probabilities and right-tail probabilities for the pairwise comparison of high dose with control, calculated from the data after dropping the low and medium dose groups using SAS proc Stratify.
Table 4 shows all possible values of $T_{\text{pairwise2}}$ along with their table probabilities and right-tail probabilities for the pairwise comparison of high dose with control, calculated from all data using the scores 0, 0, 0, and 3 in SAS proc Multtest.
The results from Tables 3 and 4 show that both the table probabilities and the right-tail probabilities for the two pairwise exact tests may go in either direction. For example, for the observed numbers of 0, 2, 3, and 5 tumor-bearing animals, we have $t_{\text{obs,pairwise1}} = t_{\text{obs,pairwise2}} = x_0 d_0 + x_3 d_3 = 15$; the p-value after deleting the low and medium dose groups is $p_{\text{pairwise1}} = 0.0281$, and that using data from all dose groups is $p_{\text{pairwise2}} = 0.0729$, i.e., the p-value after deleting the low and medium dose groups is smaller than the p-value using data from all dose groups. On the other hand, if the observed numbers of tumor-bearing animals were 2, 2, 3, and 3, then $t_{\text{obs,pairwise1}} = t_{\text{obs,pairwise2}} = x_0 d_0 + x_3 d_3 = 9$. The p-value after deleting the low and medium dose groups would be $p_{\text{pairwise1}} = 0.5$, and that using data from all dose groups would be $p_{\text{pairwise2}} = 0.4763$. In this case the p-value for the pairwise exact test after deleting the low and medium dose groups would be larger than the p-value for the pairwise exact test using the data from all dose groups.
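The p-values quoted in this example can be checked with a short calculation. Because $d_0 = 0$, the statistic $x_0 d_0 + x_3 d_3$ depends only on the high dose count, so each exact p-value reduces to a hypergeometric right-tail probability for that count. The sketch below relies on that reduction; the function names `hypergeom_upper_tail` and `exact_pairwise_pvalues` are hypothetical, and it is not a substitute for the SAS procedures used in the paper.

```python
from math import comb


def hypergeom_upper_tail(k_obs, group_size, total_animals, total_tumors):
    """P(X >= k_obs), where X is the tumor count in one group of size
    group_size when total_tumors tumors fall at random among total_animals."""
    denom = comb(total_animals, total_tumors)
    upper = min(group_size, total_tumors)
    tail = sum(
        comb(group_size, k) * comb(total_animals - group_size, total_tumors - k)
        for k in range(k_obs, upper + 1)
    )
    return tail / denom


def exact_pairwise_pvalues(x, n):
    """Return (p_pairwise1, p_pairwise2) for high dose (last group) vs control.

    Assumes a control dose score of 0 and a positive high dose score.
    """
    r = len(x) - 1
    # Approach 1: condition on the tumor total in control + high dose only.
    p1 = hypergeom_upper_tail(x[r], n[r], n[0] + n[r], x[0] + x[r])
    # Approach 2: condition on the tumor total across all dose groups.
    p2 = hypergeom_upper_tail(x[r], n[r], sum(n), sum(x))
    return p1, p2


if __name__ == "__main__":
    n = [50, 50, 50, 50]
    for counts in ([0, 2, 3, 5], [2, 2, 3, 3]):
        p1, p2 = exact_pairwise_pvalues(counts, n)
        print(counts, round(p1, 4), round(p2, 4))
```

For the counts 0, 2, 3, 5 the first approach gives the smaller p-value, while for 2, 2, 3, 3 the ordering reverses, matching the values reported above.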

SIMULATION STUDY OF POWER CALCULATION
Consider a carcinogenicity study with four treatment groups, namely control, low, medium, and high dose groups, each with 50 animals, and dose scores 0, 1, 2, and 3, respectively. The power was calculated for different choices of the background incidence rate in the control group ($p_0$). The incidence rate for the high dose group ($p_3$) was then chosen by a certain increment ($\delta$) over $p_0$. The incidence rates for the low dose group ($p_1$) and the medium dose group ($p_2$) were calculated using the logistic model $p_i = e^{a + b d_i} / (1 + e^{a + b d_i})$ with $d_i = i$ and $i = 0, 1, 2, 3$. The power was estimated as the percentage of times the null hypothesis was rejected when the alternative was true in a simulation with 1000 loops. Table 5 shows the calculated power using the asymptotic normal approximation, and Figure 1 gives a graphical representation of the results. Table 6 shows the calculated power using the exact test, and Figure 2 gives a graphical representation of the results.
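A simulation of this kind can be sketched as follows for the two asymptotic pairwise tests. Several details are assumptions made for illustration: the logistic parameters are solved from $p_0$ and $p_3 = p_0 + \delta$ so that the curve passes through both rates, the one-sided critical value is 1.645 (α = 0.05), a table with zero pooled variance counts as a non-rejection, and the names `dose_probs`, `pairwise_z`, and `simulate_power` are hypothetical.

```python
import random
from math import exp, log, sqrt


def dose_probs(p0, delta, doses=(0, 1, 2, 3)):
    """Tumor rates on a logistic curve through p0 (control) and p0 + delta (high)."""
    def logit(p):
        return log(p / (1 - p))
    a = logit(p0)
    b = (logit(p0 + delta) - a) / doses[-1]
    return [1 / (1 + exp(-(a + b * d))) for d in doses]


def pairwise_z(x, n, use_all_groups):
    """High dose vs control Z statistic; returns 0.0 if the variance is zero."""
    r = len(x) - 1
    pooled = sum(x) / sum(n) if use_all_groups else (x[0] + x[r]) / (n[0] + n[r])
    var = pooled * (1 - pooled) * (1 / n[0] + 1 / n[r])
    if var == 0:
        return 0.0
    return (x[r] / n[r] - x[0] / n[0]) / sqrt(var)


def simulate_power(p0, delta, n_per_group=50, loops=1000, z_crit=1.645, seed=1):
    """Fraction of simulated studies in which each test rejects (one-sided)."""
    rng = random.Random(seed)
    probs = dose_probs(p0, delta)
    n = [n_per_group] * len(probs)
    rejections = {"pairwise1": 0, "pairwise2": 0}
    for _ in range(loops):
        # Binomial draw per group as a sum of Bernoulli trials.
        x = [sum(rng.random() < p for _ in range(n_per_group)) for p in probs]
        if pairwise_z(x, n, use_all_groups=False) >= z_crit:
            rejections["pairwise1"] += 1
        if pairwise_z(x, n, use_all_groups=True) >= z_crit:
            rejections["pairwise2"] += 1
    return {name: count / loops for name, count in rejections.items()}


if __name__ == "__main__":
    print(dose_probs(0.05, 0.10))
    print(simulate_power(0.05, 0.10))
```

With a fixed seed the run is reproducible, and setting `delta` to 0 gives a rough check of the empirical type I error rate of each test.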
The simulation results show that the asymptotic normal test $T_{\text{pairwise2}}$ is always more powerful than $T_{\text{pairwise1}}$ or $T_{\text{pairwiseHB}}$. The two tests $T_{\text{pairwise1}}$ and $T_{\text{pairwiseHB}}$ have similar power (as the sample sizes are taken to be the same). The pairwise exact test using data from all dose groups has more power than the test based on data after deleting the two middle dose groups for small values of $p_0$ and $\delta$.

CONCLUDING REMARKS
In this paper, we discussed the pairwise comparison of the high dose group with the control in a typical carcinogenicity study. We proposed two test procedures, one based on data only from the two dose groups to be compared and one based on data from all dose groups. We elaborated both the exact and the normal approximation versions of the proposed tests. Through a simulation, we compared the power performance of these tests. For the comparison of the high dose group with the control group in a typical four-group carcinogenicity study, the asymptotic normal test using data from all dose groups is asymptotically more efficient, and in the simulation it was always more powerful than the test using data from the high dose and control groups only. For the exact tests, neither of the two showed uniformly better power than the other. The pairwise exact test using data from all dose groups showed more power than the test based on data after deleting the two middle dose groups for tumor types with a low background rate and/or drugs with a small carcinogenic effect, and less power for tumor types with a high background rate and/or drugs with a large carcinogenic effect. However, since a test that drops part of the data is asymptotically less efficient, we recommend using tests based on the data from all dose groups for the pairwise comparison.

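Exact Pairwise Test
We now consider the derivation of the exact pairwise tests. Following Mehta et al. [9], the exact p-value for the unconditional exact test $T_{\text{exact,pairwise1}}$, based on the data from Group $r$ and the control, for testing $H_0^*$, is calculated as

$$p_{\text{pairwise1}} = \sum \frac{\binom{n_0}{x_0}\binom{n_r}{x_r}}{\binom{n_0 + n_r}{x_0 + x_r}},$$

where the sum is over all pairs $(x_0, x_r)$ whose total $x_0 + x_r$ equals the observed total in the two groups and for which $x_0 d_0 + x_r d_r \ge t_{\text{obs,pairwise1}}$.

For the pairwise comparison of Group $r$ and the control ($r \neq 0$), the Hothorn-Bretz single-contrast statistic uses the coefficients $c_0 = -1$ and $c_r = 1$.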

Figure 1. Graphical representation of power vs. delta for given $p_0$ using the normal approximation for pairwise comparison of control and high dose group.

Figure 2. Graphical representation of power vs. delta for given $p_0$ using the exact test for pairwise comparison of control and high dose group.

Table 1. Asymptotic and exact p-values showing significant trend with non-significant pairwise comparisons at α = 0.05.

Table 2. Asymptotic and exact p-values showing non-significant trend with significant pairwise comparison at α = 0.05.
For the trend test, the ordered alternative is $H_1: p_i \le p_{i+1}$ for all $i$, with strict inequality for at least one $i$.
The methods of Hothorn et al. [10] for simultaneous multiple contrasts are based on quantiles of the multivariate normal distribution, which take the correlation into account, as implemented in the package mvtnorm.

Table 3. Pairwise comparison of control with high dose group after deleting the low and medium dose groups.

Table 4. Pairwise comparison of high dose with control using all data.

Table 5. Power calculated using the normal approximation for the unconditional, conditional, and Hothorn-Bretz tests.

Table 6. Power calculated using the exact test for the unconditional and conditional tests.