A Comparison of Within-Subjects and Between-Subjects Designs in Studies with Discrete-Time Survival Outcomes

Crossover designs are well known to have major advantages when comparing the effects of various non-curative treatments. We compare the efficiencies of several crossover designs and of Balaam's design with that of a parallel group design in longitudinal studies where event time can only be measured in discrete time intervals. With equally sized sequences, the parallel group design is the more efficient design if the number of time periods is small. However, the crossover and Balaam's designs tend to be more efficient as the study duration increases. The degree to which these designs add efficiency depends on the baseline hazard function and the effect size. Additionally, we incorporate different cost considerations at the subject level when comparing the designs to determine the most cost-efficient design. Researchers might consider the crossover or Balaam's design more efficient if the duration of the study is long enough, especially if the costs of applying the baseline treatment are higher.


Introduction
A well-known type of outcome in longitudinal studies is the survival endpoint with the research interest focused on the occurrence and timing of some events.
The timing of events can be measured continuously using fine, precise time units (e.g. minutes or days). Under some circumstances, however, it is far less feasible to measure the timing of events this precisely. Instead, it is measured discretely using a set of intervals such as years, months or weeks. Here, an event might occur at any point in time during an interval, but its occurrence is only recorded at the end of each interval, until event occurrence, drop-out or the end of the trial.
Since the time of event occurrence is rounded upward to the nearest measurement time point, a loss of information will occur if the exact time is unknown.
Data with this type of survival endpoints are called discrete-time survival data as opposed to data recorded on a continuous scale, i.e. continuous-time survival data.
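As a rough illustration of the recording scheme described above, the following Python sketch maps a continuous event time to the interval in which it would be recorded. The function name `discretize`, the unit interval width and the way dropout is handled are assumptions made for this illustration only; they are not taken from the study.

```python
import math

def discretize(event_time, last_contact, width=1.0):
    """Map a continuous event time to a discrete-time record.

    Returns (j, event): j is the last interval with a measurement and
    event is 1 if the event was recorded at the end of that interval.
    Assumption: measurements happen only at t_j = j * width.
    """
    j = math.ceil(event_time / width)      # event time rounded upward
    if j * width <= last_contact:          # the measurement at t_j took place
        return j, 1
    # Event not observed: censored at the last completed measurement.
    return math.floor(last_contact / width), 0

print(discretize(2.3, 12.0))   # event during the third interval of a 12-period study
print(discretize(20.0, 12.0))  # no event before the end of the study
```

The exact event time 2.3 is lost in the first record; only the interval index is kept, which is the loss of information mentioned above.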
It is useful to measure time discretely in retrospective studies where subjects can only supply event times in ranges or round numbers due to memory failure.
In a suicide ideation study, for instance, subjects may not remember the exact day of their first suicidal thought, but may remember how old they were at the time. Discrete-time survival data are also encountered in prospective studies where it may not be feasible or practical to follow subjects continuously. In a smoking initiation study, for example, researchers are not able to contact subjects every day to record the onset of smoking, but may do so on a regular basis, say once a month. Another reason for measuring event occurrence in discrete time is that events can only occur at a few points in time, e.g. a student can graduate from college on only a few occasions in the academic year.
Optimal designs and statistical power analysis have been important tools in designing longitudinal studies with a variety of types of outcomes. For trials with discrete-time survival outcomes, [1] [2] have recently studied the optimal combination of the number of subjects and the number of measurements per subject to achieve sufficient power at a minimal cost or to maximize the power level for a fixed budget. These papers focus solely on randomized controlled trials with a parallel group design, in which subjects receive only a single treatment in the course of the trial. However, in studies evaluating a new and promising treatment, it may be considered unethical not to offer the treatment to some of the subjects, as is done in parallel group trials. In addition, if subjects do not receive the treatment during the study, they are more likely to withdraw from the study [3]. Here, crossover designs are a more efficient option than parallel group designs for comparing the effect of treatments.
Crossover designs are powerful designs in bioequivalence, clinical and pharmaceutical trials if the disease is chronic and treatments have a reversible or noncurative effect. The major advantage of crossover over parallel group designs is that crossover designs eliminate part of the inter-subject variability from the treatment comparisons, and thus might require fewer subjects to provide the same level of power. For a discussion on the analysis of crossover designs with continuous and binary outcomes, see [4] [5] [6]. Moreover, the work of [7] discusses sample size determinations in crossover designs with binary outcomes in subjects measured until the end of the study even if they experience the event in an earlier period.
Crossover designs have rarely been applied with right-censored survival data [8]. Nevertheless, survival analysis can be used in crossover trials with survival outcomes. An example is a crossover study to compare disease-free survival among postmenopausal women with receptor-positive early breast cancer [9].
Another example is a two-period crossover design comparing atenolol with a combination of atenolol and nifedipine to treat angina pectoris [10]. In these studies, the time-to-event endpoint was measured continuously and a continuous-time survival model was used to analyze the data [11]. On rare occasions, parallel group designs have been compared with crossover designs in studies with continuous time-to-event outcomes [12]. However, to the authors' knowledge, no studies have been conducted on this subject with discrete-time survival data.
The aim here is to determine whether and, if so, to what extent a crossover design is more efficient than a parallel group design with discrete-time survival outcomes if subjects are not further observed after experiencing the event. An example of a crossover design with discrete-time survival endpoints is a fertility study conducted to examine the effect of ovarian stimulation on the chance of conception [13]. The outcome was the timing of pregnancy, which was recorded discretely after each treatment cycle. Here, a proportional odds model was used to analyse the repeated crossover outcomes. We compare the designs' efficiencies for different numbers of time periods, allocation proportions to treatment sequences, baseline hazard probabilities and treatment effect sizes. We assume the main objective of the trials is to compare the treatments, and the best design is the one that provides an efficient estimate of the treatment differences.
We consider the most common AB/BA crossover design, where subjects switch to the other treatment after one time period. For practical purposes, other variations of this design are also considered, such as the AABB/BBAA design, where subjects alternate the treatments after multiple time periods or applications of a given treatment. The efficiency of these designs is also compared with that of Balaam's design [14], which is a combination of the crossover and parallel group designs. Such studies are naturally affected by dropout if subjects leave the study permanently for unforeseen reasons rather than because of event occurrence. We thus compare the designs with and without attrition.
The organization of this paper is as follows. In the next section, an overview of the logistic regression model for analysing discrete-time survival data is presented; see [15] for an extensive discussion of this model. This section is followed by an introduction of the various designs and the optimality criterion. Section 4 reports on the results. The comparison between different designs is illustrated with an example in Section 5. The final section presents the conclusions and discussion and gives suggestions for future work.

The Statistical Model
We consider designs with two treatments, A and B, and $s$ sequences of treatments. The baseline measure is taken at time $t_0 = 0$, just before randomization, and the follow-up of a study with $p$ periods ends at time $t_p = p$.
Note that $t_0 = 0$ is the "beginning of time", when no one has experienced the event yet but everyone is eligible to do so. The first measurement of event occurrence is taken at time point $t_1$. Event occurrence is measured once at the end of each time interval and is defined according to whether the subject experiences the event of interest or not. Let $h_{ijk}$ denote the discrete-time hazard probability for subject $i$ in time interval $j$ under sequence $k$. The model is

$$\operatorname{logit}(h_{ijk}) = \log\frac{h_{ijk}}{1 - h_{ijk}} = \alpha_1 D_1 + \alpha_2 D_2 + \dots + \alpha_p D_p + \beta Z_{ijk}, \qquad (1)$$

with the time-dependent explanatory variable $Z_{ijk}$ denoting the treatment condition: $Z_{ijk} = 1$ if the subject receives treatment B, and $Z_{ijk} = 0$ otherwise. For a given time period, the parameter $\beta$ denotes the effect of treatment B relative to treatment A on the probability of event occurrence on the logit scale, so $\beta = \operatorname{logit}(h_{ijk} \mid Z_{ijk} = 1) - \operatorname{logit}(h_{ijk} \mid Z_{ijk} = 0)$. As can be seen, the parameter $\beta$ is constant across time. We assume that model (1) is a proportional odds model. The dummy variable $D_j$ is set to 1 in time interval $j$ and 0 elsewhere. The corresponding intercept parameter $\alpha_j$ is the value of the logit hazard probability corresponding to treatment A in that particular time period, so $\alpha_j = \operatorname{logit}(h_{ijk} \mid Z_{ijk} = 0)$.

Model (1) can be formulated in matrix form as

$$\operatorname{logit}(h(t)) = X\theta,$$

where the vector $h(t)$ contains the discrete-time hazard probabilities of event occurrence for all $p$ time periods and all $N$ subjects until they experience the event, leave the study before event occurrence, or the study concludes (i.e. $j = p$). The parameter vector $\theta = (\alpha_1, \alpha_2, \dots, \alpha_p, \beta)'$ contains $p + 1$ unknown parameters. The design matrix $X$ is of order $\left(\sum_{j=1}^{p}\sum_{k=1}^{s} n_{jk}\right) \times (p + 1)$, with $n_{jk}$ representing the number of subjects in the $k$th sequence who enter the $j$th period, that is, who have left the study neither due to event occurrence nor for unforeseen reasons prior to time period $j$. The total number of subjects at the beginning of the study in sequence $k$ is $n_{1k} = n_k$. The number of subjects entering a later period is obtained from the estimate $\hat{h}$ of the discrete-time hazard probability and the estimate of the probability that a subject will experience the event after time $t_j$ (the survival probability), where the hazard in the preceding period, i.e. the $(j-1)$th time period, is evaluated under the treatment given in that period. The risk of event occurrence in period $j$ thus depends on the survival probability in that period and in the previous period. Let $r$ denote the proportion of subjects in sequence $k$ who leave the study during a time period for reasons other than event occurrence. In this study, we assume a constant attrition rate across all the time periods and treatment sequences. We assume non-informative attrition (i.e. missing at random); that is, the non-censored subjects do not differ systematically from the censored subjects. This means that those who remain in the study are representative of everyone who would have remained in the study had there been no censoring.
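The quantities in this section can be sketched numerically. The Python fragment below computes hazard and survival probabilities for one treatment sequence under the logit hazard model, together with the expected number of subjects entering each period. The function names and the numerical values are hypothetical, and the product form used for the expected numbers (event-free and not dropped out in all earlier periods) is an assumption of this sketch, not a formula quoted from the paper.

```python
import numpy as np

def hazards(alpha, beta, z):
    """h_j = expit(alpha_j + beta * z_j) for one treatment sequence."""
    eta = np.asarray(alpha, dtype=float) + beta * np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))

def survival(h):
    """Probability of not having experienced the event by the end of period j."""
    return np.cumprod(1.0 - np.asarray(h, dtype=float))

def expected_entering(n_k, h, r=0.0):
    """Expected number of subjects entering periods 1..p.

    ASSUMPTION (not from the paper): a subject enters period j if it has
    neither had the event nor dropped out in periods 1..j-1, i.e.
    n_jk = n_k * prod_{l<j} (1 - h_l) * (1 - r).
    """
    stay = (1.0 - np.asarray(h, dtype=float)) * (1.0 - r)
    return n_k * np.concatenate(([1.0], np.cumprod(stay[:-1])))

# Hypothetical values: constant baseline hazard 0.1, beta = 0.5, sequence ABAB.
alpha = np.full(4, np.log(0.1 / 0.9))
h = hazards(alpha, beta=0.5, z=[0, 1, 0, 1])
print(h)
print(survival(h))
print(expected_entering(100, h, r=0.05))
```

Because the treatment indicator changes over time within a sequence, the hazard alternates between two values in a crossover sequence while staying constant in a parallel group sequence with a constant baseline hazard.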
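The common method for estimating the vector of unknown parameters $\theta$ is iteratively re-weighted least squares [16]. The asymptotic variance-covariance matrix of the estimator $\hat{\theta}$ has the form

$$\widehat{\operatorname{Cov}}(\hat{\theta}) = \left(\sum_{j=1}^{p}\sum_{k=1}^{s} n_{jk}\, \hat{w}_{jk}\, X_{jk} X_{jk}'\right)^{-1}. \qquad (2)$$

The vector $X_{jk}$ corresponds to subjects in the $j$th time interval in the $k$th sequence. It has $(p + 1)$ elements, with value 1 on the $j$th element, value 0 or 1 on the $(p + 1)$th element, and 0 elsewhere; the $(p + 1)$th element is the treatment indicator. The quantity $\hat{w}_{jk}$ is the least squares weight for subjects in period $j$ under sequence $k$. For a logit link function, it is given as $\hat{w}_{jk} = \hat{h}_{jk}(1 - \hat{h}_{jk})$. It should be noted that the $(p + 1, p + 1)$th element of $\widehat{\operatorname{Cov}}(\hat{\theta})$ corresponds to the variance of the estimator of the treatment difference (i.e. $\hat{\beta}$), and it will be used for the definition of the optimal designs.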
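A minimal numerical sketch of this variance calculation follows, assuming the usual logistic-regression information matrix summed over periods and sequences and the expected at-risk numbers used as weights. The function name `var_beta`, the input values and the way attrition reduces the at-risk numbers are assumptions of this illustration.

```python
import numpy as np

def var_beta(sequences, n, alpha, beta, r=0.0):
    """Asymptotic variance of beta-hat for a design.

    sequences : list of 0/1 lists of length p (0 = treatment A, 1 = B)
    n         : baseline number of subjects in each sequence
    alpha     : p logit baseline hazards; beta : treatment effect (logit)
    r         : constant attrition rate per period (assumed form of n_jk)
    """
    p = len(alpha)
    info = np.zeros((p + 1, p + 1))
    for z, n_k in zip(sequences, n):
        at_risk = float(n_k)                        # n_1k = n_k
        for j in range(p):
            h = 1.0 / (1.0 + np.exp(-(alpha[j] + beta * z[j])))
            x = np.zeros(p + 1)
            x[j] = 1.0                              # period dummy D_j
            x[p] = z[j]                             # treatment indicator Z
            info += at_risk * h * (1.0 - h) * np.outer(x, x)
            at_risk *= (1.0 - h) * (1.0 - r)        # expected number entering j + 1
    return np.linalg.inv(info)[p, p]                # element belonging to beta

alpha = [np.log(0.1 / 0.9)] * 2                     # hypothetical h_A = 0.1
v_pg = var_beta([[0, 0], [1, 1]], [50, 50], alpha, beta=0.5)
print(v_pg)
```

With a single period the result reduces to the familiar variance of a log odds ratio from two independent groups, which gives a quick sanity check on the sketch.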

Crossover Designs and Efficiencies
We consider trials with a maximum duration of $p_{\max} = 12$ time periods, where subjects may be observed over one or more periods while using a given treatment. In the CO1 design, subjects use the two treatments sequentially for fixed periods of time, and switch to the other treatment after one time period. In the CO3 design, subjects alternate the use of the two treatments after three applications of a given treatment, so the switching time point is three. The PG and CO designs are based on two treatment sequences, with a proportion of the subjects randomly assigned to the first sequence and the remaining subjects to the second sequence. The last design is Balaam's (BM) design, a four-sequence design that assigns a proportion of the subjects to the (AB/BA) sequences and the remainder to the (AA/BB) sequences. This design may be considered a combination of the PG and CO1 designs. It should be noted that we compare designs of equal total duration or follow-up time ($p$) and sample size at baseline ($N$). We study the efficiency of the PG design compared with that of an alternative design to determine which design estimates the parameter $\beta$ more efficiently.
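To do so, we consider the PG design as the reference design and compare the performance of the other designs using the relative efficiency (RE):

$$RE = \frac{\operatorname{Var}_{PG}(\hat{\beta})}{\operatorname{Var}_{alt}(\hat{\beta})},$$

so that an RE below unity indicates that the alternative design is less efficient than the PG design, and an RE above unity that it is more efficient.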
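The designs and the relative efficiency can be put together in a short Python sketch. It assumes that the RE is the ratio of the variance of the treatment-effect estimator under the PG design to that under the alternative design, which matches the way values below unity are read in the Results. The helper names, the string labels for the designs and the numerical inputs are hypothetical.

```python
import numpy as np

def design_sequences(design, p):
    """Treatment sequences (0 = A, 1 = B) over p periods."""
    if design == "BM":                              # AB/BA plus AA/BB
        ab = [j % 2 for j in range(p)]
        return [ab, [1 - z for z in ab], [0] * p, [1] * p]
    m = p if design == "PG" else int(design[2:])    # "CO3" -> switch every 3 periods
    first = [(j // m) % 2 for j in range(p)]
    return [first, [1 - z for z in first]]

def var_beta(sequences, n, alpha, beta, r=0.0):
    """Variance of beta-hat from the inverse information matrix (assumed form)."""
    p = len(alpha)
    info = np.zeros((p + 1, p + 1))
    for z, n_k in zip(sequences, n):
        at_risk = float(n_k)
        for j in range(p):
            h = 1.0 / (1.0 + np.exp(-(alpha[j] + beta * z[j])))
            x = np.zeros(p + 1)
            x[j], x[p] = 1.0, z[j]
            info += at_risk * h * (1.0 - h) * np.outer(x, x)
            at_risk *= (1.0 - h) * (1.0 - r)
    return np.linalg.inv(info)[p, p]

def relative_efficiency(design, p, n_total, h_a, beta, r=0.0):
    """RE = Var_PG / Var_design with equally sized sequences."""
    alpha = [np.log(h_a / (1.0 - h_a))] * p         # constant baseline hazard
    def v(name):
        seqs = design_sequences(name, p)
        return var_beta(seqs, [n_total / len(seqs)] * len(seqs), alpha, beta, r)
    return v("PG") / v(design)

for name in ("CO1", "CO3", "CO6", "BM"):
    print(name, relative_efficiency(name, p=12, n_total=200, h_a=0.1, beta=0.5))
```

By construction, a CO design whose switching point is not reached within the study coincides with the PG design, so its RE equals one.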

Results
We assume the probability of event occurrence for treatment A does not vary across the time intervals, so the baseline hazard probability equals a constant $h_A$ in every period. Since finding a closed-form formula for the variance-covariance matrix in Equation (2) is complicated, the results are presented for selected choices of $h_A$ and of the difference $\delta$ between the event probabilities of treatments A and B. We study the efficiency and cost efficiency of a PG design in comparison with three alternative CO designs, namely CO1, CO3 and CO6, along with the BM design.
Figure 1 presents the REs on the vertical axis as a function of the number of time periods $p$ on the horizontal axis for various values of $h_A$ (rows in the matrix of graphs) and $\delta$ (columns in the matrix of graphs). These selected values of $h_A$ and $\delta$ result in a maximum difference of 50% in the survival probabilities between two treatment sequences by the end of a study with the maximum duration if a PG design is conducted. Here, the total number of subjects in each design ($N$) is divided equally over the treatment sequences. As can be seen in Figure 1, all the designs are equally efficient if $p = 1$, since all the designs are the same in this case (see Table 1). In addition, the CO3 design is as efficient as the PG design if $p \le 3$ and the CO6 design is as efficient as the PG design if $p \le 6$. We also observe that the REs of the CO and BM designs generally decrease from unity as $p$ increases from 1, and the size of the decrease depends to some extent on the values of $h_A$ and $\delta$. The decrease is smaller if $h_A$ is larger for a given $\delta$, and for a given $h_A$ it is larger with a larger $\delta$. However, at some value of $p$, the REs start to increase and approach unity as $p$ increases further. They may exceed unity if $p$ becomes even larger. The CO and BM designs are thus less efficient than the PG design if $p$ is small, though they may become more efficient than the PG design if the duration of the trial is long enough. Of all the designs, the BM design most often becomes more efficient than the PG design for larger $p$. Figure 1 also shows that the results are more extreme if $\delta$ becomes larger for a given $h_A$ or $h_A$ becomes smaller for a given $\delta$. We observe very similar results if a constant attrition rate $r$ is taken into account. The only difference is that the REs approach unity more gradually as $p$ increases if 5% or 10% of the subjects are lost to follow-up in each period within each sequence, implying that the CO and BM designs require more time periods to become as efficient as or more efficient than the PG design (results not shown).
We now look for the time point at which the CO1 and BM designs become as efficient as the PG design. Let $p^*$ denote the smallest number of time periods at which the alternative design is at least as efficient as the PG design. For computational and practical reasons, we limit our search to sixty periods. Table 2 presents the value of $p^*$ for various combinations of $h_A$, $\delta$ and $r$. For the BM design, we observe that for a given $\delta$, $p^*$ decreases as $h_A$ increases, and similarly it decreases as $\delta$ increases for a given $h_A$. In addition, the decrease in $p^*$ accompanying an increase in $h_A$ is larger with a smaller $\delta$. Likewise, the decrease in $p^*$ with an increase in $\delta$ is larger if $h_A$ is smaller. We note that the same effect of $\delta$ on $p^*$ is not observed for the CO1 design if $h_A = 0.1$. For each combination of $h_A$ and $\delta$, the CO1 design requires a longer study duration than the BM design to become more efficient than the PG design. The table also shows that if 5% of the subjects drop out of the study in each period, only a few, if any, additional periods are required for the CO1 and BM designs in comparison with the case of no attrition; if $r$ increases further, the designs need to be extended to include even more periods (results not shown). Lastly, we emphasize that the value of $p^*$ for the CO3 design is very similar to that of the CO1 design, but the study duration for the CO6 design needs to be extended by one to three more time periods (results not shown).
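The search for the smallest number of periods at which a design catches up with the PG design can be sketched as a scan over precomputed RE values. Since all designs coincide for a single period, this sketch only reports a period after the RE has first fallen below unity; that convention, the function name `first_recovery` and the RE values are assumptions made for illustration and are not results from the paper.

```python
def first_recovery(re_by_period, tol=1e-9):
    """Smallest p at which RE >= 1 again after having dropped below 1.

    re_by_period[p - 1] is the RE of a p-period study.
    Returns None if the RE never recovers within the values supplied.
    """
    dipped = False
    for p, re in enumerate(re_by_period, start=1):
        if re < 1.0 - tol:
            dipped = True            # design currently less efficient than PG
        elif dipped:
            return p                 # first period with RE >= 1 after the dip
    return None

# Made-up RE values for p = 1..6, for illustration only.
res = [1.0, 0.97, 0.95, 0.98, 1.01, 1.03]
print(first_recovery(res))
```

Limiting the input list to sixty values mirrors the search limit mentioned above; a return value of `None` then means that no such period was found within that limit.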

A Cost-Efficiency Comparison between Designs
The previous section shows a pairwise comparison of the efficiency of the designs for studies with 1 to 12 time periods. In that comparison, we made no distinction between the costs of sampling subjects and the costs of treating and measuring them. However, if recruiting a subject costs a different amount than taking measurements from that subject, and the cost of treating a subject with treatment A differs from that with treatment B, we should account for the cost differential when we compare the five types of design. To this end, we consider two cost functions, each a function of the number of time periods, for each type of design. Let $c_0$ represent the initial cost of setting up a study, $c_1$ the cost of including a subject in the study, $c_2$ the cost of taking one measurement, and $c_A$ and $c_B$ the costs of treating a subject with treatment A and treatment B, respectively. Cost function I (Equation (3)) is computed from these components for a study with $p$ time periods, $s$ treatment sequences and $N$ subjects at baseline. With this cost function, we assume that subjects leave the study once they have experienced the event and that no measurements are taken after event occurrence. The number of measurements for each subject in sequence $k$, including one baseline measurement, is therefore determined by how long the subject remains event-free.
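One way to read the cost components above is sketched below in Python. It is an assumed form, not the paper's Equation (3): the expected cost adds the set-up cost, the recruitment cost, one baseline measurement per subject, and, for every period a subject is expected to enter, one measurement plus one application of the treatment scheduled in that period. The function names and all numbers are hypothetical. The cost-adjusted RE multiplies each variance by the cost of its design before taking the ratio, which is likewise an assumption about how the penalty is applied.

```python
def expected_cost(sequences, n, h_a, h_b, c0, c1, c2, c_a, c_b, r=0.0):
    """Expected total cost of a design (assumed form, see lead-in).

    sequences : list of 0/1 lists (0 = treatment A, 1 = treatment B)
    n         : baseline number of subjects per sequence
    h_a, h_b  : constant hazard probabilities under A and B
    r         : constant attrition rate per period
    """
    total = c0 + c1 * sum(n)
    for z, n_k in zip(sequences, n):
        total += c2 * n_k                         # one baseline measurement each
        at_risk = float(n_k)
        for z_j in z:
            treat = c_b if z_j else c_a
            total += at_risk * (treat + c2)       # treat and measure those still at risk
            at_risk *= (1.0 - (h_b if z_j else h_a)) * (1.0 - r)
    return total

def cost_adjusted_re(var_pg, cost_pg, var_alt, cost_alt):
    """RE after penalising each variance by the cost of its design."""
    return (var_pg * cost_pg) / (var_alt * cost_alt)

# Hypothetical costs: c1 = 10, c2 = 1, c_A = 2, c_B = 4, no set-up cost.
pg = expected_cost([[0, 0], [1, 1]], [50, 50], 0.1, 0.2, 0, 10, 1, 2, 4)
co = expected_cost([[0, 1], [1, 0]], [50, 50], 0.1, 0.2, 0, 10, 1, 2, 4)
print(pg, co)
```

Under this reading, a crossover sequence shifts some applications of the more expensive treatment to periods in which fewer subjects remain at risk, which is one way the treatment costs can affect the comparison.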
To determine the most cost-efficient design for a given number of time periods, we normalize the optimality criterion (i.e. the variance of $\hat{\beta}$) by the cost of the design. Figure 2 presents the RE plots as a function of $p$ for designs with an equal allocation proportion for three combinations of $h_A$ and $\delta$ (columns in the matrix of graphs). We consider three different combinations of the costs $c_1$, $c_A$ and $c_B$ (rows in the matrix of graphs). In all three cases, the costs at the subject level are higher than the cost at the measurement level. Figure 2 shows that, when adjusting for design cost, the PG design is often the more efficient choice when treating subjects costs more than recruiting them and treatment A is less costly than treatment B (i.e. $c_1 < c_A < c_B$). When all the subject-level costs are equal (i.e. $c_1 = c_A = c_B$), the efficiencies of the CO designs and the BM design are very close to that of the PG design, and the REs of these designs tend more often to exceed unity as $h_A$ and $\delta$ become larger. In the last scenario, when sampling subjects is less costly than treating them and the cost of treatment B is lower than that of treatment A (i.e. $c_1 < c_B < c_A$), the CO and BM designs are most often more efficient than the PG design. In this case, the CO and BM designs are generally preferable.
Overall, we observed very similar results to those of Figure 2 when comparing the design efficiencies using cost function II ($r > 0$). However, the increase in the efficiencies with increasing $p$ becomes smaller in cases where $r \neq 0$ (results not shown).
Up until now we have focused on equal allocation proportions for each of the treatment sequences of the designs. From a clinical or ethical point of view, there might be reasons for an unequal assignment of subjects. For the BM design, for example, it might be considered unethical to give subjects the same treatment multiple times if its efficacy is unknown [18]. In this paper, we define $\pi \in [0, 1]$ as the design allocation proportion and assume for a given $\pi$ that the CO and PG designs randomly allocate $N\pi$ subjects to the first treatment sequence and the BM design randomly allocates $N\pi/2$ subjects to each of the AB and BA sequences. We now compare the designs' efficiency as a function of $\pi$ for a given $p$. We limit our search to $\pi \in [0.25, 0.75]$, so that each pair of treatment sequences contains at least a quarter of the subjects. Figure 3 depicts efficiency comparisons across the designs as a function of $\pi$ under the same conditions as Figure 2 for $p = 12$. We first focus on the results for $c_1 < c_A < c_B$, which implies that treatment B is more costly than treatment A. When $h_A$ and $\delta$ equal 0.05, the REs of the CO designs are larger than unity if $\pi$ is small, and the PG design is more efficient than the CO designs as $\pi$ increases further. If $h_A$ and $\delta$ increase to 0.2, the REs of the CO designs move closer to unity, which implies that the effect of $\pi$ becomes smaller and that only a negligible gain in efficiency is obtained from choosing any CO design over the PG design or vice versa. As can be seen, the RE line of the BM design has an almost similar ∪-shape. This makes sense, since the BM design is a compromise between the CO1 and PG designs when $\pi = 0.5$. For a small $\pi$, the BM design allocates more subjects to the sequences of the PG design and therefore its efficiency is closer to that of the PG design. Similarly, the BM design allocates more subjects to the sequences of the CO1 design for a large $\pi$ and therefore its efficiency is closer to that of the CO1 design in this case. So the BM design tends to be more efficient than the PG design as the sequence sizes become more unequal.
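The allocation rule can be written out as a small Python sketch. It assumes that in the BM design the proportion $\pi$ is split evenly over the AB and BA sequences and the remainder evenly over the AA and BB sequences, which reproduces four equal groups at $\pi = 0.5$. The function name and the sample size are hypothetical.

```python
def sequence_sizes(design, n_total, pi):
    """Baseline number of subjects per sequence for allocation proportion pi.

    PG / CO designs: [first sequence, second sequence].
    BM design:       [AB, BA, AA, BB] (assumed even split within each pair).
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    if design == "BM":
        co_part = n_total * pi / 2.0              # crossover sequences AB and BA
        pg_part = n_total * (1.0 - pi) / 2.0      # parallel sequences AA and BB
        return [co_part, co_part, pg_part, pg_part]
    return [n_total * pi, n_total * (1.0 - pi)]

print(sequence_sizes("BM", 200, 0.25))   # more subjects in the AA/BB sequences
print(sequence_sizes("CO1", 200, 0.25))
```

Non-integer group sizes would need rounding in an actual trial; the sketch keeps them as they are because the efficiency calculations work with expected numbers.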
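When treatment B is as expensive as or less expensive than treatment A (i.e. $c_1 = c_A = c_B$ or $c_1 < c_B < c_A$), the CO designs most often maintain a higher efficiency than the PG design as $\pi$ becomes larger. In this case, the BM design becomes the most efficient of all the designs as $\pi$ increases.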

Example
The example of a CO design with right-censored survival outcomes mentioned in the introduction is a study investigating the effect of controlled ovarian hyper-stimulation on the probability of conception via intrauterine insemination (IUI) [13]. A total of 74 couples with male sub-fertility are randomized to IUI in a natural cycle or to IUI in a cycle with ovarian stimulation. Each couple is given a total of six treatment cycles, three with IUI in natural cycles and three with IUI in cycles with ovarian stimulation. The couples alternate the treatments according to a CO1 design. The primary outcome measure is the pregnancy rate over the cycles. The study reports the pregnancy rates per completed cycle after IUI in either treatment. We use the rates in the ovarian stimulation cycles as the probability of conception in each cycle for the current treatment. We presume there is a newly developed treatment expected to further improve the efficacy of IUI in increasing the probability of conception, and that the difference between the two treatments on the logit scale is $\beta = 0.5$. Figure 4 presents the efficiency comparison for this example. However, the RE of the design depends on the design cost when $\pi \neq 0.5$.

Discussion
The present study is designed to compare the efficiency and cost efficiency of the crossover (CO) design and Balaam's (BM) design with that of a parallel group (PG) design in trials with discrete-time survival endpoints.We consider designs with two treatments A and B and focus on how efficient a design is for estimating differences between the treatment conditions.We consider CO designs that differ in the number of time periods after which subjects switch to the other treatment.All the calculations are performed in R and our R syntax is available upon request from the first author.
Using this efficiency comparison, our study shows that the efficiency of estimating treatment differences can be increased by a proper choice of the design. Whether the CO and BM designs are more efficient than the PG design depends on the size of the true treatment difference ($\delta$), the baseline hazard probability ($h_A$) and the study duration ($p$). It also depends on whether or not the efficiency comparison is penalized by the cost of a design and whether or not attrition is taken into account. In general, we find that if the treatment sequences are equally sized, the CO and BM designs are less efficient than the PG design if $p$ is small, and a gain in efficiency may be obtained using the CO or BM designs instead of the PG design if $p$ is larger. The effect of a prolonged study duration on the efficiency of the CO and BM designs is larger if $\delta$ and $h_A$ are larger. We also observe that the BM design requires fewer time periods than the CO designs to become as efficient as the PG design. The CO and BM designs are either as efficient as or more efficient than the PG design when treatment B costs less than or the same as treatment A. In cases where treatment B is more expensive than the baseline treatment, the PG design is most often more efficient. In addition, all the designs perform almost equally well if the treatment sequences are of almost equal sizes for a given number of time periods. In studies with unequal allocation proportions, the BM design is preferable.
A similar comparison between a CO design and a PG design can be seen in the work of [12], where the outcome is a continuous-time survival endpoint.In general, they conclude that using the CO design might result in an efficiency gain.They focus on designs with two treatments and only two periods with subjects switching from one treatment to the other halfway through the study.
They also study the optimal switch point, which in their case is at one-fifth of the total study length. In our study, we assume the total study length is fixed beforehand and subjects switch from one treatment to the other at a change point of one, three or six periods. Our results seem to show that the total study length and the total design cost play an important role in determining when a CO or BM design is more efficient. The BM design generally results in a smaller loss in efficiency or provides a greater efficiency if it is used instead of the PG design. The CO designs become preferable as the study duration becomes longer, or if the more effective treatment is as expensive as or less expensive than the baseline treatment.
In the current study, we confine our focus to a model where the subject effects are fixed.However, if a design is efficient under the fixed effect model, it will also perform well under the random subject model [6].We also limit the use of the designs to situations where assumptions of no sequence, period or carryover effects are valid.Since we study a random assignment of subjects to the sequences, the assumption of no sequence effect is not unrealistic.Moreover, the plausibility of the assumption of no carryover effect can be heightened by including an effective washout period between any two consecutive time periods.Nevertheless, the extent to which our findings are true if these assumptions are in doubt deserves to be explored further.What is more, our findings are based on a constant attrition rate across the time periods and treatment sequences.However, our R syntax is also suitable for unequal attrition rates across time periods and sequences.Another subject of future research might be the effect of baseline covariates on the optimal designs of within subject designs, as was studied in [19] with a parallel group design.The degree to which their results apply to trials where subjects receive different treatments over time deserves further study.

Conclusion
In conclusion, the possible advantages of a CO design compared to those of a PG design have been previously addressed in longitudinal studies with a variety of outcomes, including the survival outcome. A similar investigation for discrete-time survival data, where the event time can only be measured on a discrete scale instead of a continuous scale, had yet to be conducted. Our study provides additional findings on the usefulness of the CO and BM designs over the PG design if the treatment effect, baseline hazard function and number of time periods are varied.
the total number of subjects in the design, with $n_k$ the number of subjects randomly assigned to sequence $k$ at baseline. The underlying continuous event times are recorded in discrete time intervals indexed by $j = 1, 2, \dots, p$. These intervals represent a series of consecutive periods in continuous time with equidistant cut points
and 0 elsewhere. So the first $p$ elements represent the values on the dummies $D_1, D_2, \dots, D_p$
is equally divided over the treatment sequence groups. Each group in the PG and CO designs contains $N/2$ subjects and in the BM design, each group contains $N/4$ subjects.

Figure 1. Efficiency of selected designs with two treatments A and B in comparison with the PG design as a function of the number of time periods $p$ for various $h_A$ and $\delta$, given equal sample sizes, when treatment sequences are equally sized.
under sequence $k$ by the end of time period $j$ and $t_0 = 0$ is the baseline.

other words, we compare the designs based on: under each design is penalised by the amount of costs of that design, which accounts for the number of time periods and for the different costs of treatments A and B. We compare the designs for different combinations of the costs at the subject level (i.e. $c_1$, $c_A$ and $c_B$). The cost at the measurement level is fixed to $c_2 = 1$.
is more expensive than sampling subjects and the costs to treat a subject with treatment B are high in relation to the costs to treat the subject with treatment A. In the second combination, the three costs at the subject level are equal. Finally, the last combination, $c_1 < c_B < c_A$, represents a reverse scenario compared to the first combination, where application of treatment B is less expensive than treatment A.

Figure 2. Efficiency of selected designs with two treatments A and B in comparison with the PG design as a function of the number of time periods $p$ for various $h_A$ and $\delta$ and the cost ratios $c_1$, $c_A$ and $c_B$ using cost function I in Equation (3) ($r = 0$) when treatment sequences are equally sized.
. The efficiency of the other alternative designs, however, tends to increase as $p$ becomes larger so that the BM design becomes more efficient than the PG design. The CO designs, on the other hand, need to be conducted for a longer time to become more efficient than the PG design; the CO1 and CO3 designs become more efficient for $p \ge 9$ when $h_A = \delta = 0.2$. When all the subject-level costs are equal (i.e.
, a higher efficiency is most often maintained by the CO designs compared to the PG design as π becomes larger, especially if

Figure 3. Efficiency of selected designs with two treatments A and B in comparison with the PG design as a function of the design allocation proportion $\pi$ for various $h_A$ and $\delta$ and the cost ratios $c_1$, $c_A$ and $c_B$ using cost function I in Equation (3) ($r = 0$) when $p = 12$.
that an event occurs in interval j under sequence k for subject i given that the event has not yet occurred before period j .
$p = 1, 2, \dots$ or $p_{\max}$ time periods. So $p_{\max}$ is the maximum number of time periods a trial can be conducted in and $p$ is the number of time periods at hand. Note that the larger $p$ is, the longer the duration of the follow-up. For easy comparison of the hazard probabilities in each period, we assume the time points are equally spaced and the distance between any pair of adjacent time points is fixed in advance. Under this assumption,

Table 2. The number of time periods
REs of the CO designs are larger than unity if $\pi$ is small; the designs are almost equally efficient if