CIMAvax® EGF vaccine therapy for non-small cell lung cancer: A weighted log-rank tests-based evaluation

Time-to-event has become one of the primary endpoints of many clinical trials. Comparing treatments and therapies using time-to-event (or "survival") data requires some care, since survival differences may occur either early or late in the follow-up period, depending on factors such as the initial potency or the duration of efficacy of the drugs. In this work, we investigate the effect of the CIMAvax® EGF vaccine therapy on the survival of patients with non-small cell lung cancer, using stratified and unstratified weighted log-rank tests. Weighted log-rank tests are designed to identify early and late survival differences between treatments. Using these tests, we conclude that the vaccine combined with standard therapy is more effective than standard therapy alone among patients less than 60 years of age.


INTRODUCTION
The Center of Molecular Immunology (CIM) is one of the centers of the Scientific Pole in Cuba devoted to the research, development, and manufacturing of human biotechnological products. The CIMAvax® EGF vaccine was developed at CIM. The effect of the CIMAvax® EGF vaccine on patients with non-small cell lung cancer (NSCLC) can be investigated by comparing the survival functions under CIMAvax® EGF and under a control therapy. The log-rank test is the classical tool for such an analysis. However, this test is not well suited to detecting a delayed separation of the survival curves, which may occur when one of the treatments has a late effect. Previous studies suggest that such an effect exists for the CIMAvax® EGF vaccine. Moreover, the log-rank test is useful when each treatment group is homogeneous, in the sense that the survival distribution is the same for every patient in the group. Again, previous studies suggest that an evaluation of CIMAvax® EGF efficacy should be stratified by age, since homogeneity only holds within the two subpopulations of patients under and over 60 years of age. Stratified weighted log-rank tests, such as the stratified Fleming-Harrington family of tests, can deal simultaneously with late effects and stratification.
In this work, we apply these tests to survival data from two clinical trials that were conducted to evaluate the CIMAvax® EGF vaccine in patients with NSCLC. The first study is a finished phase II trial that included 80 patients; the second is an ongoing phase III trial including 356 patients. Both trials were randomized and controlled with two treatment arms, one arm receiving the CIMAvax® EGF vaccine and a standard therapy, the other (control group) receiving only the standard therapy. In both trials, the primary endpoint of interest was the overall survival, measured as the duration between inclusion in the trial and death of the patient.

PURPOSE
The purpose of this work is to compare the survival of patients with NSCLC who received standard therapy alone with that of patients who were also vaccinated with CIMAvax® EGF.

Study Design and Treatment
A phase II clinical trial including 80 patients (under a balanced design) and a phase III trial including 356 patients (under an unbalanced 1:2 design, still ongoing) are analyzed, first separately and then by combining the data from both trials. Both trials are controlled, with two treatment arms: one group received the CIMAvax® EGF vaccine plus standard therapy and the other received the standard therapy only. Based on previous studies, the statistical analyses were stratified according to the age of the patients: the patients under 60 years were assigned to one stratum and the patients over 60 years to another. In the sequel, these strata are referred to as "younger" and "older", respectively. Table 1 provides a brief description of the data. The overall survival, defined as the duration between inclusion in the trial and death, was the primary endpoint of interest. Some other variables were also assessed, but their analysis falls beyond the scope of the present work. The ethics boards of all the participating institutions approved the protocols, and all the patients provided written informed consent. The data were collected, managed, and analyzed at CIM and CENCEC.

Eligibility Criteria
Included patients had histologic or cytologic evidence of NSCLC (adenocarcinoma or non-adenocarcinoma), ECOG performance status 0, 1, or 2, stage IIIb or IV disease, and adequate hematologic, renal, and hepatic functions.

Weighted Log-Rank Tests for Two or More Samples
We consider the problem of comparing the hazard rates of K (K ≥ 2) treatment groups, that is, we consider the testing problem:

$$H_0: h_1(t) = h_2(t) = \cdots = h_K(t) \quad \text{for all } t \le \tau \tag{1}$$

versus $H_A$: at least one of the $h_j(t)$ is different from the others for some $t \le \tau$, where $h_j(t)$ is the hazard rate in the j-th group and $\tau$ denotes the largest time at which some patients are still at risk in each group. The alternative hypothesis is global in the sense that one rejects the null hypothesis if at least one of the populations differs from the others. The available data for solving this testing problem consist of independent durations, possibly right-censored, obtained from the K treatment groups. In the sequel, we let $t_1 < t_2 < \cdots < t_D$ denote the distinct death times in the K pooled groups, $d_{ij}$ be the number of deaths at time $t_i$ in the j-th group, and $Y_{ij}$ be the number of patients at risk at $t_i$ in the j-th group ($j = 1, \ldots, K$; $i = 1, \ldots, D$). We also let $d_i = \sum_{j=1}^{K} d_{ij}$ and $Y_i = \sum_{j=1}^{K} Y_{ij}$ be the numbers of deaths and patients at risk in the combined K groups at time $t_i$.
Weighted log-rank tests of $H_0$ are based on weighted differences between the Nelson-Aalen estimators of the cumulative hazard rates in the K groups and the Nelson-Aalen estimator obtained in the pooled groups, that is, under $H_0$ (see [1,2], for example). Using data from the j-th group, the hazard function can be estimated by $d_{ij}/Y_{ij}$. If the null hypothesis $H_0$ holds, an estimator of the common hazard rate is the pooled groups estimator $d_i/Y_i$. Now, let $W_j(t)$ be a positive weight function for the j-th group. This weight function is chosen so as to detect early or late differences between the treatment groups. Finally, the weighted log-rank statistic for testing $H_0$ against $H_A$ is based on the quantities:

$$Z_j(\tau) = \sum_{i=1}^{D} W_j(t_i) \left\{ \frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i} \right\}, \quad j = 1, \ldots, K. \tag{2}$$

If all the $Z_j(\tau)$ ($j = 1, \ldots, K$) are close to zero, then there is little evidence to believe that the null hypothesis in (1) is false, whereas if one of the $Z_j(\tau)$ is far from zero, then there is evidence that the j-th treatment group has a hazard rate differing from that expected under the null hypothesis. Although the mathematical theory allows for general weight functions in (2), in practice, all the commonly used test statistics have weight $W_j(t_i) = Y_{ij} W(t_i)$, where $W(t_i)$ is a common weight shared by the K groups. $Z_j(\tau)$ then becomes:

$$Z_j(\tau) = \sum_{i=1}^{D} W(t_i) \left\{ d_{ij} - Y_{ij} \frac{d_i}{Y_i} \right\}, \quad j = 1, \ldots, K. \tag{3}$$

In this case, $Z_j(\tau)$ can be interpreted as the sum of the weighted differences between the observed numbers of deaths and the expected numbers of deaths under $H_0$ in the j-th sample. The variance of $Z_j(\tau)$ in (3) is given by:

$$\hat\sigma_{jj} = \sum_{i=1}^{D} W(t_i)^2 \, \frac{Y_{ij}}{Y_i} \left( 1 - \frac{Y_{ij}}{Y_i} \right) \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i, \quad j = 1, \ldots, K,$$

and the covariance of $Z_j(\tau)$ and $Z_g(\tau)$ is:

$$\hat\sigma_{jg} = - \sum_{i=1}^{D} W(t_i)^2 \, \frac{Y_{ij}}{Y_i} \, \frac{Y_{ig}}{Y_i} \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i, \quad g \ne j.$$

The quantities $Z_j(\tau)$ are linearly dependent since $\sum_{j=1}^{K} Z_j(\tau)$ is zero. Therefore, the test statistic is constructed by selecting any K − 1 of the $Z_j(\tau)$'s (the first K − 1, say). The estimated variance-covariance matrix of the resulting vector is given by the (K − 1) x (K − 1) matrix $\hat\Sigma$ formed by the appropriate $\hat\sigma_{jg}$. Finally, the test statistic is given by the quadratic form:

$$X^2 = \left( Z_1(\tau), \ldots, Z_{K-1}(\tau) \right) \hat\Sigma^{-1} \left( Z_1(\tau), \ldots, Z_{K-1}(\tau) \right)^{\mathsf T}.$$

If the null hypothesis $H_0$ is true and the sample size is large, $X^2$ is approximately distributed as a chi-square with K − 1 degrees of freedom. An α-level test of $H_0$ thus rejects the null hypothesis when $X^2$ is greater than the upper α-quantile of this chi-square distribution. In particular, when K = 2, as is the case in our data set, $X^2$ should be distributed as a chi-square with 1 degree of freedom under $H_0$.
A variety of weight functions have been proposed in the literature (see [4-8], and [3] for a review). The most common and widely used test has W(t) = 1 for all t. This test is referred to as the Mantel-Haenszel or log-rank test, and is available in any modern statistical software. It has optimum power to detect alternatives where the hazard rates in the K treatment groups are proportional to each other.
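For two groups (K = 2), the construction above reduces to a single weighted sum of observed-minus-expected deaths, squared and divided by its estimated variance. The following is a minimal Python sketch of that computation, assuming a common weight W(t); the function name `weighted_logrank`, its arguments, and the toy data are illustrative only and are not taken from the trial analysis. With the default weight W(t) = 1 it corresponds to the log-rank test.

```python
import math


def weighted_logrank(times, events, groups, weight=lambda t: 1.0):
    """Two-sample (K = 2) weighted log-rank test.

    times  : follow-up durations
    events : 1 if the death was observed, 0 if right-censored
    groups : 1 for the group whose Z is computed, 0 for the other group
    weight : common weight W(t); the default W(t) = 1 gives the log-rank test
    Returns (Z_1, variance of Z_1, chi-square statistic, p value).
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    z = 0.0
    var = 0.0
    for t in death_times:
        # Patients at risk at t, pooled (y) and in group 1 (y1).
        y = sum(1 for u in times if u >= t)
        y1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        # Deaths at t, pooled (d) and in group 1 (d1).
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        w = weight(t)
        z += w * (d1 - y1 * d / y)  # observed minus expected deaths
        if y > 1:  # the variance term vanishes when one patient is at risk
            var += w ** 2 * (y1 / y) * (1 - y1 / y) * ((y - d) / (y - 1)) * d
    x2 = z * z / var
    # Upper tail probability of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(x2 / 2))
    return z, var, x2, p_value


# Toy data (hypothetical): four patients, all deaths observed.
toy_times = [1, 3, 2, 4]
toy_events = [1, 1, 1, 1]
toy_groups = [1, 1, 0, 0]
z, var, x2, p_value = weighted_logrank(toy_times, toy_events, toy_groups)
print(f"Z = {z:.4f}, variance = {var:.4f}, X2 = {x2:.4f}, p = {p_value:.4f}")
```

For the toy data, Z = 2/3, the variance is 13/18, and the statistic is 8/13. Exchanging the group labels changes the sign of Z but leaves the statistic unchanged, which is why any one of the two groups can be used when K = 2. The sketch does not guard against a zero variance (for example, when all patients belong to the same group).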
Fleming and Harrington proposed (see [3]) a very general class of tests that includes the Mantel-Haenszel test as a special case. Let $\hat S(t)$ be the Kaplan-Meier estimator of the common survival function under $H_0$, based on the combined treatment groups. The weight function in the Fleming-Harrington test is, at time $t_i$:

$$W_{p,q}(t_i) = \hat S(t_{i-1})^{p} \left[ 1 - \hat S(t_{i-1}) \right]^{q}, \quad p \ge 0, \; q \ge 0. \tag{4}$$

Here, the survival function at the previous death time is used as a weight for mathematical reasons (this ensures that these weights are known just prior to the time at which the comparison is to be made). Letting p = q = 0 in (4) results in the Mantel-Haenszel test. Letting p = 1 and q = 0 results in a version of the Mann-Whitney-Wilcoxon test. When p > 0 and q = 0, the weights $W_{p,q}$ give the most weight to early departures between the hazard rates in the K groups, whereas when p = 0 and q > 0, the corresponding tests give the most weight to departures that occur late in time. By an appropriate choice of p and q, one can construct tests that have the greatest power against alternatives where the K hazard rates differ over any desired region.
We applied this methodology to our data sets. The Fleming-Harrington test (with p = 0.5 and q = 0.5) is more sensitive to differences when the separation of the survival curves is delayed in time, which is why its results are sometimes significant. The Mantel-Haenszel test is appropriate when the separation of the curves is proportional.
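To make the role of p and q concrete, the following minimal Python sketch computes Fleming-Harrington weights from the pooled Kaplan-Meier estimate at the previous death time. The function name `fleming_harrington_weights` and the toy data are hypothetical and serve only as an illustration.

```python
def fleming_harrington_weights(times, events, p=0.5, q=0.5):
    """Return [(t_i, W_pq(t_i))] for each distinct death time t_i.

    times  : follow-up durations of the pooled groups
    events : 1 if the death was observed, 0 if right-censored
    The weight at t_i uses the Kaplan-Meier estimate at the previous
    death time, which equals 1 before the first death.
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    s_prev = 1.0
    weights = []
    for t in death_times:
        y = sum(1 for u in times if u >= t)                      # at risk
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        weights.append((t, s_prev ** p * (1 - s_prev) ** q))
        s_prev *= 1 - d / y                                      # KM update
    return weights


# Toy data (hypothetical): four patients, all deaths observed.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 1, 1]
for p, q in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]:
    rounded = [round(w, 3) for _, w in
               fleming_harrington_weights(toy_times, toy_events, p, q)]
    print(f"p = {p}, q = {q}: {rounded}")
```

With p = q = 0 all weights equal 1; with p = 1 and q = 0 they decrease over time; with p = 0 and q = 1 they increase over time. With p = q = 0.5 the weight is zero at the first death time and is largest where the pooled survival estimate is close to 0.5, so the earliest death times contribute least to the statistic.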

Stratified Test
As mentioned above, the log-rank test is useful when each treatment group is homogeneous, that is, when the survival distribution is the same for every patient within a group. A violation of this homogeneity usually indicates that one needs to adjust the analysis for some covariate other than the treatment group. For example, previous studies suggest that an evaluation of CIMAvax® EGF efficacy should be stratified by age, since homogeneity only holds within the two subpopulations of patients under and over 60 years of age. One possible approach to this issue is to base the decision on a stratified version of one of the tests discussed above. This approach is feasible when the covariate we adjust for is categorical and its number of levels is not too large, or when it is continuous but can be discretized into a workable number of levels. In the sequel, we discuss how such stratified tests are constructed, and how they can be used to analyze our data.
Suppose that the covariate we need to adjust for is discrete (or continuous and discretized), with M levels. Then, we wish to test the hypothesis

$$H_{0,\mathrm{strat}}: h_{1s}(t) = h_{2s}(t) = \cdots = h_{Ks}(t), \quad \text{for } s = 1, \ldots, M \text{ and } t \le \tau, \tag{5}$$

against the alternative that at least one of the $h_{js}$ is different from the others for some s and some $t \le \tau$. A stratified test is constructed similarly as in (2) and (3) (for the weighted version of the test), except that all quantities are calculated by using only the data from the s-th stratum, yielding $Z_{js}(\tau)$ and $\hat\Sigma_s$. The same weight functions as in the previous section can be used for the stratified tests.
A global test of $H_{0,\mathrm{strat}}$ in (5) is obtained by summing all the within-stratum quantities, such as:

$$Z_{j\cdot}(\tau) = \sum_{s=1}^{M} Z_{js}(\tau), \qquad \hat\sigma_{jg\cdot} = \sum_{s=1}^{M} \hat\sigma_{jgs}.$$

Finally, the stratified test statistic is defined as

$$X^2_{\mathrm{strat}} = \left( Z_{1\cdot}(\tau), \ldots, Z_{K-1\cdot}(\tau) \right) \hat\Sigma_{\cdot}^{-1} \left( Z_{1\cdot}(\tau), \ldots, Z_{K-1\cdot}(\tau) \right)^{\mathsf T},$$

where $\hat\Sigma_{\cdot}$ is the (K − 1) x (K − 1) matrix obtained from the $\hat\sigma_{jg\cdot}$'s. If the null hypothesis $H_{0,\mathrm{strat}}$ in (5) is true and the sample size is large, $X^2_{\mathrm{strat}}$ is approximately distributed as a chi-square with K − 1 degrees of freedom. An α-level test of $H_{0,\mathrm{strat}}$ thus rejects the null hypothesis when $X^2_{\mathrm{strat}}$ is greater than the upper α-quantile of this chi-square distribution. In particular, when K = 2, as is the case in our data set, $X^2_{\mathrm{strat}}$ should be distributed as a chi-square with 1 degree of freedom under $H_{0,\mathrm{strat}}$.
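The stratified construction can be illustrated for K = 2 with a short Python sketch: the weighted observed-minus-expected sum and its variance are computed separately within each stratum, then summed over strata before forming the chi-square statistic. The function names `stratum_terms` and `stratified_test`, the use of within-stratum Fleming-Harrington weights, and the toy data are assumptions made for illustration; they are not the software used for the trial analysis.

```python
import math


def stratum_terms(times, events, groups, p=0.0, q=0.0):
    """Return (Z, variance) for one stratum with K = 2 groups.

    Weights are Fleming-Harrington weights built from the within-stratum
    pooled Kaplan-Meier estimate; p = q = 0 gives the log-rank test.
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    z = 0.0
    var = 0.0
    s_prev = 1.0  # Kaplan-Meier estimate at the previous death time
    for t in death_times:
        y = sum(1 for u in times if u >= t)
        y1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        w = s_prev ** p * (1 - s_prev) ** q
        z += w * (d1 - y1 * d / y)
        if y > 1:
            var += w ** 2 * (y1 / y) * (1 - y1 / y) * ((y - d) / (y - 1)) * d
        s_prev *= 1 - d / y
    return z, var


def stratified_test(strata, p=0.0, q=0.0):
    """strata: list of (times, events, groups) tuples, one per stratum.

    Returns the stratified chi-square statistic (1 degree of freedom)
    and its p value.
    """
    z_total = 0.0
    var_total = 0.0
    for times, events, groups in strata:
        z, var = stratum_terms(times, events, groups, p, q)
        z_total += z
        var_total += var
    x2 = z_total ** 2 / var_total
    return x2, math.erfc(math.sqrt(x2 / 2))


# Toy data (hypothetical): two age strata, group 1 = vaccine, 0 = control.
younger = ([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0])
older = ([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0])
x2, p_value = stratified_test([younger, older], p=0.5, q=0.5)
print(f"X2_strat = {x2:.4f}, p = {p_value:.4f}")
```

Calling `stratified_test` with a single stratum gives the within-stratum test, which is how a difference between treatments can be examined separately in the younger and older patients.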

RESULTS
We analyzed the data obtained from the phase II and phase III trials described above, using the methodology described in the previous section.
We first analyzed both trials separately, and then performed a single analysis by combining both data sets (such a combination is appropriate here, since both studies had similar characteristics: inclusion and exclusion criteria, schedule of treatment, etc.).
We performed the Mantel-Haenszel test and the Fleming-Harrington test with p = 0.5 and q = 0.5. We used the stratified versions of both tests, and refined the results by testing the hypothesis of no difference between the CIMAvax® EGF vaccine and standard therapy within each stratum.
The results are summarized in Table 2 (for the phase II trial), Table 3 (for the phase III trial), and Table 4 (for the combined data).
Table 2 reports the median survival in both groups of the phase II study (one patient had missing data). The younger patients who received the vaccine had the highest median survival (10.47 months), whereas the median did not exceed 7 months in the other groups.
When age is not taken into account in the stratified model, the p values are non-significant, but in the stratum analysis for the younger patients receiving CIMAvax® EGF, the p values are less than 0.05 for both methods. There is thus a survival advantage for younger patients with this vaccine. P values for the older stratum were non-significant for both approaches.
Table 3 shows the results of the phase III trial. With the Mantel-Haenszel test, the p value was significant, and again the younger patients have an advantage (approximately 4 months) if they receive the vaccine, with an overall survival greater than that of the patients receiving the standard therapy. For the analysis under non-proportional hazard rates, the p values were non-significant.
Table 4 shows the results for the combined data of the phase II and phase III studies. For both methods, when age is not taken into account, the p values are less than 0.05; when age is taken into account, the younger stratum benefits from vaccination with CIMAvax® EGF. There is an advantage in median survival for the patients under 60 years (5 months, which is both clinically and statistically significant). In Figure 1, the survival curves in the CIMAvax® EGF vaccine group and the standard therapy group diverge (at least in the younger stratum) after some time has elapsed, which suggests that a Fleming-Harrington test with q > 0 (that is, a test for detecting a delayed difference) is appropriate.
The survival curves in the younger stratum are clearly separated, so the vaccinated group has an advantage over the group that only received the standard therapy, whereas for the older patients the benefit is the same in both treatment groups. In all the survival curves for the younger vaccinated patients (last column of Figure 1), the separation of the curves occurs early in time, whereas for patients over 60 years the effect of the vaccine is seen later (central column of Figure 1).

CONCLUSION
According to the results of the finished phase II trial, we conclude that the group that received the CIMAvax® EGF vaccine has a better response in the younger stratum. The analysis of the phase III trial data corroborates these results, which contributes to obtaining the sanitary registration of this vaccine. When the phase II and phase III studies are combined, we also infer that vaccination with CIMAvax® EGF is more effective in younger subjects, since their median survival was eleven months, which is a remarkable figure for patients with NSCLC.

Table 1. Disposition of patients for Phase II and Phase III clinical trials.

Table 2. Comparison of the results using two different approaches for Phase II study.

Table 3. Comparison of the results using two different approaches for Phase III study.

Table 4. Comparison of the results using two different approaches for combined trials.
* p < 0.05; ** p < 0.005; O: Older, Y: Younger, V: Vaccine, C: Control.

Figure 1. Kaplan-Meier survival curves for phase II, III and combined trials in each of the strata.