Introduction: This work investigates whether to conduct a medical study from the point of view of the expected net benefit, taking into account statistical power, time and cost. The null hypothesis of this paper is that the expected net benefit is equal to zero. Methods: Information was obtained from a pilot medical study that investigates the effects of two diagnostic modalities, magnetic resonance imaging (MRI) and computerized axial tomography (CT), on patients with acute stroke. A statistical procedure was applied for planning and testing equivalence, non-inferiority and inequality hypotheses of the study for effectiveness, health benefits and costs. A statistical simulation model was applied to test the hypothesis of whether or not conducting the study would result in overall net benefits. If the null hypothesis is not rejected, no benefits would occur and therefore the two diagnostic and treatment arms are of equal net benefit. If the null hypothesis is rejected, net benefits would occur if patients are diagnosed with the more favourable diagnostic modality. Results: For any hypothesis design, the expected net benefits of conducting the study are in the range of 366 to 1796 per patient at 80% statistical power. The power depends on the monetary value available for a unit of health improvement. Conclusion: The statistical simulations suggest that diagnosing patients with CT will provide more favourable health outcomes, showing statistically significant expected net benefits in comparison with MRI.

As a general rule, private and public researchers in medicine and health care, such as medical or pharmaceutical companies and centers for research and development, can be assumed to decide whether or not to carry out a study on the basis of the anticipated net benefits of the health improvements obtained from a product or new medical indication. Health system organizations link (or should link) their decisions to the expected future benefits in terms of patients’ health improvements and cost savings. All parties, however, are likely to consider the cost of reaching the expected benefits, such as the cost of the study, and they are usually interested in shortening the study duration as much as possible, since this means that an effective diagnosis or treatment will be available sooner, thus contributing to health improvement and increasing the time during which they market their product exclusively and, hence, the product’s lifecycle benefits. Obtaining statistically significant evidence of the superiority, equivalence or non-inferiority of a given modality of diagnosis and treatment in relation to another modality increases the chances of its application in clinical practice.

Optimal clinical study design avoids unnecessary use of resources and increases benefits. Knowledge and information gathered from the execution of studies can contribute positively toward improving the results of such trials. A clinical trial can be viewed as an uncertain experiment whose design depends on the problem being addressed. Some studies fail to answer the questions that need to be addressed [1-3] due to design issues, which means that the resources used are wasted. Conventional statistical methods for designing clinical studies are widely used to make decisions, basically on the sample size or power requirements used to test the hypothesis that there is a difference between two options. These models do not take into account the future costs and benefits of the decisions that might, for example, follow the study findings; instead they rely on arbitrary decision criteria. Other methods have been used with similar objectives [4-8], but the authors applied them assuming deterministic relationships, so only point results could be assessed, which do not take into account the variability and uncertainty involved in the decisions.

Patel and Ankolekar [

A variety of models have been published focusing on the cost and benefit but with different objectives. Baker and Heidenberger [

In Section 2, we first present a brief description of the results of a conducted pilot clinical study, and then a brief background on existing statistical procedures for testing hypotheses and, at the same time, re-estimating the resulting power for a given design and sample size. Then, given the information obtained from the pilot trial, we construct the expected net benefits model that helps in making decisions about the clinical study at the design stage. The variability and uncertainty of the expected net benefits were quantified by estimating their probability distributions. The model is executed many times to simulate simultaneously all the hypothesis tests of the trial and of the expected net benefits. In Section 3, we present the results of the simulation model. In Section 4, we discuss the methods and the results of this application.

Information was provided by a pilot randomized medical trial that compares the overall consequences, in terms of effectiveness and health benefit, of two diagnostic options, CT and MRI, as the initial diagnostic test for patients with suspected acute stroke [

In order to quantify the effectiveness at the end of the study (three months after patients had been diagnosed and treated), the categorical variable resulting from the Rankin scale was converted into 3 health states for each patient as follows: 1) levels 0 - 2 were considered as an independent health state in which patients are assumed to have a normal life, 2) levels 3 - 5 were considered as a dependent health state in which patients are assumed to need health care, and 3) level 6 corresponds to the death health state. The results for the 130 patients showed that 87 of them had been diagnosed with CT, of whom a proportion of 0.506 (p_{2}) were in an independent health state; the remaining 43 had been diagnosed with MRI, of whom a proportion of 0.429 (p_{1}) achieved an independent state.

The results of the effectiveness of the two diagnostic options are shown in

The difference in effectiveness, 0.077, illustrates that using CT provided a more favourable outcome. The hypothesis is tested by applying (1): the estimated value (0.82) is lower than the critical value z_{1−α/2} (for α = 0.05, z_{1−α/2} = 1.96), and therefore there was no evidence to reject the null hypothesis that there is no difference in effectiveness between the two groups.
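As a rough check of the 0.82 reported above, the pilot figures can be run through a pooled two-proportion z statistic. The pooled-variance form and the function names below are assumptions, since formula (1) is not reproduced in this text; the sketch only illustrates the kind of calculation involved.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled z statistic for H0: p1 = p2 (assumed form of the test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p2 - p1) / se


# Pilot data from the text: 43 MRI patients (p1), 87 CT patients (p2).
z_pilot = two_proportion_z(p1=0.429, n1=43, p2=0.506, n2=87)
p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z_pilot)))

print(f"z = {z_pilot:.2f}, two-sided p-value = {p_two_sided:.2f}")
```

With the pilot proportions this gives a statistic of roughly 0.83, below 1.96, in line with the non-rejection stated above.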

The health benefits were quantified by converting the Rankin scale into an indicator of quality of life that ranges from −0.02 (the patient is in a state worse than death) to 1 (the patient is alive and in a very good state), measured at two moments: at the time of admission to hospital (baseline) and after three months of follow-up (end of study). The conversion of the Rankin scale levels into a quality of life index was obtained according to the results of previous studies by Fagan et al. [

The costs of the trial were: 1) the treatment costs that could be avoided if all patients were moved to the less costly diagnostic option, that is, the difference between the cost of treatment under CT and MRI (C_{CT} − C_{MRI}); these costs were estimated from the data collected in the original study (see

The pilot trial’s results were analyzed before the planned sample had been recruited and the outcomes evaluated. This happened because the initial results indicated that the effectiveness associated with CT was higher than that of MRI, and the timing results obtained from the patients recruited so far suggested that the duration of further investigation needed to reach statistical significance in effectiveness would exceed the time originally planned for the whole research. Based on these initial results, there was no change in actual practice (patients continued to be diagnosed with MRI or CT) and no additional research expenditures were needed, but there will be no benefits from adopting the best option in terms of effectiveness, health benefits and cost savings. However, the money invested in the study was in effect wasted, as no useful information was produced to provide evidence on the best option. The decision made may not be fully accurate, since failing to reject the null hypothesis of no difference is not evidence of no difference. Moreover, the non-significant result might indicate that the inequality design of the trial was not appropriate, and that the trial might better have been designed as an equivalence or non-inferiority trial with effectiveness as the primary outcome, while gathering information on health benefits.

Suppose that a protocol for a double-blind randomized clinical trial is designed in order to compare the effects of two diagnostic products (two diagnostic imaging modalities), MRI and CT, on patients with a given disease. In order to test whether or not the effectiveness of the two products is equal, in the case of comparing two proportions, two-sided hypotheses (a null hypothesis of equal proportions against an alternative of unequal proportions) are planned assuming α and β. A sample of patients, N, recruited from the population is assigned randomly either to MRI or to CT, in which the probabilities that an individual has a successful outcome are designated as p_{1} and p_{2}, respectively. As soon as the data from the trial are available for evaluation, a test statistic can be applied to compare the two proportions of success. As in the inequality design, using the test statistic with the information of the pilot trial we would obtain the observed value of the statistic as follows:

where,

If the resulting value of the statistic is higher than z_{1−α/2} (= 1.96 for α = 0.05), or the p-value is smaller than α, the null hypothesis is rejected with the conclusion that CT provides a more favourable outcome. If the resulting value is lower than −z_{1−α/2} (= −1.96 for α = 0.05), or the p-value is smaller than α, the null hypothesis is rejected with the conclusion that MRI provides a more favourable outcome. If the resulting value is higher than −z_{1−α/2} and lower than z_{1−α/2} (1.96 for α = 0.05), or the p-value is higher than α, the null hypothesis is not rejected, with the conclusion that there is not enough evidence that one of the diagnostic modalities provides a more favourable outcome.

After providing the success probabilities obtained from the pilot trial, the sample sizes n_{1} and n_{2} necessary for the new trial can be calculated using formula (2), taking into account α and β (or power).

Suppose that n_{1} = 650, n_{2} = 650 (assuming n_{1} = n_{2}), p_{1} = 0.429 and p_{2} = 0.506 (obtained from the pilot trial). Applying (1) gives a statistic of 2.8 (p-value = Prob(Z > 2.8) = 0.0026), which indicates that there is enough evidence to reject the null hypothesis at a significance level of 0.0026. The power of the trial, 1 − β, is expected to be as initially planned (e.g. 80%):

Now suppose that n_{1} = 490 and n_{2} = 980 (n_{1} ≠ n_{2}); applying (2), the expected power of 0.8 would be reached assuming unequal allocation.
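The power figures quoted for the inequality design can be approximated with a normal-approximation calculation. The form below (observed difference over the unpooled standard error, minus the critical value) is an assumed stand-in for formula (2), which is not reproduced in this text; the function and variable names are illustrative.

```python
from math import erf, sqrt

Z_975 = 1.959964  # z_{1-alpha/2} for alpha = 0.05


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def power_inequality(p1: float, n1: int, p2: float, n2: int,
                     z_crit: float = Z_975) -> float:
    """Approximate power of a two-sided test of p1 = p2 (assumed form).

    Uses the unpooled standard error and ignores the negligible
    probability of rejecting in the wrong direction.
    """
    se = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return normal_cdf(abs(p2 - p1) / se - z_crit)


# Sample sizes and proportions quoted in the text.
power_equal = power_inequality(0.429, 650, 0.506, 650)
power_unequal = power_inequality(0.429, 490, 0.506, 980)

print(f"equal allocation (650/650):   {power_equal:.3f}")
print(f"unequal allocation (490/980): {power_unequal:.3f}")
```

Under these assumptions both allocations come out close to the planned power of 0.8.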

Suppose that a protocol for a double-blind randomized clinical trial is designed in order to compare the effects of two medical products (two diagnostic imaging modalities), MRI and CT, on patients with a given disease. In order to test whether or not the effectiveness of the two products is equivalent, in the case of comparing two proportions, the equivalence hypotheses (a null hypothesis of inequivalence against an alternative of equivalence within a margin Δ) are planned. A sample of patients, N, recruited from the population is assigned randomly either to MRI or to CT, in which the probabilities that an individual has a successful outcome are designated as p_{1} and p_{2}, respectively. As soon as the data from the trial are available for evaluation, a statistical test can be applied to compare the two proportions of success. Thus, using the test statistic with the information of the pilot trial we would obtain the observed value of the statistic as follows:

Since the observed statistic is higher than z_{1−α/2}, or the p-value is smaller than α, the hypothesis of inequivalence is rejected. Applying the two-sided confidence interval for the observed difference with an acceptable margin of difference Δ, if the two limits of the confidence interval lie within the range [−Δ, +Δ], the hypothesis of inequivalence is rejected; otherwise, it is accepted.

Suppose that n_{1} = 730, n_{2} = 730 (n_{1} = n_{2}), p_{1} = 0.429, p_{2} = 0.506, Δ = 0.15, applying (2) the power is expected to be:

Now suppose that n_{1} = 550 and n_{2} = 1100 (n_{1} ≠ n_{2}); applying (2), the expected power of 0.8 would be reached assuming unequal allocation.

Suppose that a protocol for a double-blind randomized clinical trial is designed in order to compare the effects of two medical products (two diagnostic imaging modalities), MRI and CT, on patients with a given disease. In order to test whether or not CT would provide a worse outcome than MRI, in the case of comparing two proportions, the one-sided non-inferiority hypotheses (a null hypothesis of inferiority against an alternative of non-inferiority within a margin Δ) are planned. A sample of patients, N, recruited from the population is assigned randomly either to MRI or to CT, in which the probabilities that an individual has a successful outcome are designated as p_{1} and p_{2}, respectively. As soon as the data from the trial are available for evaluation, a test statistic can be applied to compare the two proportions of success. Using formula (2), we will be able to calculate the sample size or power for equal and unequal allocation.

Suppose that n_{1} = 578, n_{2} = 578 (n_{1} = n_{2}), p_{1} = 0.429, p_{2} = 0.506 and Δ = 0.15. Applying (1) under the non-inferiority hypothesis, the hypothesis of inferiority is rejected. Using the two-sided confidence interval for the observed difference with an acceptable margin of difference Δ, if the lower limit of the confidence interval is higher than the negative margin −Δ, the hypothesis of inferiority is rejected; otherwise, it is accepted.

Applying (2) the power is expected to be

Now suppose that n_{1} = 420 and n_{2} = 860 (n_{1} ≠ n_{2}); applying (2), the expected power of 0.8 would be reached assuming unequal allocation.
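For the equivalence and non-inferiority designs, the quoted sample sizes are consistent with a single-term normal approximation in which the distance between the margin Δ and the observed difference replaces the raw difference. This form is an assumption made for illustration (formula (2) is not reproduced in this text, and other textbook formulas exist); the critical values of 1.96 for equivalence and 1.645 for non-inferiority are likewise assumed, and all names are hypothetical.

```python
from math import erf, sqrt

Z_TWO_SIDED = 1.959964  # z_{1-alpha/2}, alpha = 0.05 (assumed for equivalence)
Z_ONE_SIDED = 1.644854  # z_{1-alpha}, alpha = 0.05 (assumed for non-inferiority)


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def power_with_margin(p1: float, n1: int, p2: float, n2: int,
                      margin: float, z_crit: float) -> float:
    """Assumed approximation: Phi((margin - |p2 - p1|) / SE - z_crit)."""
    se = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return normal_cdf((margin - abs(p2 - p1)) / se - z_crit)


# Proportions, margin and sample sizes quoted in the text.
equiv_equal = power_with_margin(0.429, 730, 0.506, 730, 0.15, Z_TWO_SIDED)
equiv_unequal = power_with_margin(0.429, 550, 0.506, 1100, 0.15, Z_TWO_SIDED)
noninf_equal = power_with_margin(0.429, 578, 0.506, 578, 0.15, Z_ONE_SIDED)
noninf_unequal = power_with_margin(0.429, 420, 0.506, 860, 0.15, Z_ONE_SIDED)

for label, value in [
    ("equivalence, 730/730", equiv_equal),
    ("equivalence, 550/1100", equiv_unequal),
    ("non-inferiority, 578/578", noninf_equal),
    ("non-inferiority, 420/860", noninf_unequal),
]:
    print(f"{label}: {value:.3f}")
```

Under these assumptions all four configurations land near 0.8, which is why this form was chosen for the sketch; it should not be read as the formula actually used in the study.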

The analysis addresses whether, from a health system’s perspective, it would make sense to conduct a clinical study, considering the expected net benefit (ENB) for the health system. We assume that the cost for the health system consists of the cost of completing the study, while the benefits are defined as the health benefits and cost savings that would accrue, assuming a statistically significant result is reached. The analysis, however, does not consider the potential administrative costs of research or any cost related to implementing the decision to change the diagnostic patterns in line with the study recommendations. Moreover, CT is a dominant option from a cost-effectiveness perspective: it is less costly (see

The ENB hypothesis also requires an assessment in monetary terms of the future health benefits likely to be obtained by using the superior diagnostic modality on all patients. In the context of this work, the health benefits were quantified by estimating the gain in utility as the difference between the two measures of utility corresponding to CT and MRI. The resulting difference was multiplied by a monetary value, which is the willingness to pay per unit of quality of life. This amount is approximately 30,000 monetary units, which has often been assumed to be a reasonable cost-effectiveness threshold for accepting a new medical device in the health system [...]. A sample of patients, N (= n_{1} + n_{2}), recruited from the population is assigned randomly either to MRI or to CT, in which the probabilities that an individual has a successful outcome are designated as p_{1} and p_{2}, respectively. As soon as the data from the trial are available for evaluation, to test this hypothesis we estimate the ENB that would result from integrating the estimated power with time, costs and health benefits. If the ENB test statistic is lower than z_{1−α/2} and higher than −z_{1−α/2} (z_{1−α/2} = 1.96, α = 0.05), or the p-value is higher than α, the null hypothesis is not rejected, with the conclusion that there is not enough evidence of net benefits; otherwise, the hypothesis is rejected and we conclude that there is statistical evidence of net benefits using the MRI or CT diagnostic modality. If the ENB is positive then CT provides net benefits; otherwise, MRI does. The ENB test is based on the following model:

where w is the monetary value of a unit improvement in utility, U_{CT} and U_{MRI} are the utility values of the CT and MRI options respectively, C_{CT} and C_{MRI} are the costs of diagnosing a patient with the CT or MRI option respectively, and lambda is the enrolment rate of patients. The total time of the trial is a function of the number of patients, the number of centers and the rate of enrolment. Thus, the total cost of research is calculated as the total time multiplied by the research cost per patient.

To give a simple example, suppose that the estimated power = 0.12, U_{CT} − U_{MRI} = 0.05, C_{CT} − C_{MRI} = 100, for simplicity let

Per patient, w = 30,000, then ENB = −32. The ENB is negative, which means that there will be a loss if CT is applied. However, if a power of 0.8 is reached, the ENB is positive (926 monetary units), which means that net benefits are expected if patients are diagnosed with CT, for any hypothesis design.

In order to test the planned hypothesis of whether or not a net benefit would be derived from the study, the probability distribution of ENB is needed. Simulation allows carrying out the trial as if it came from real-world experiments, given the statistical design, drawing the costs and health benefits from their respective distributions, and then estimating the probability curve of positive expected net benefits.

Following the construction of the model as shown in

The states of the simulation model are:

1) Population: the population from which a sample of patients will be selected.

2) Arrivals: patients will be admitted to a study site such as a lab, hospital or other research unit.

3) Inclusion: patients will be included at random according to inclusion criteria that are modeled probabilistically.

4) Diagnosis test: the patients included in the trial are assigned randomly to one of the two diagnostic tests, CT or MRI.

5) Follow-up states: following the diagnosis, the health state of each patient will be valued and classified three months later into one of three health states: independent health state (indep. life), dependent health state (dep. life) and death state (empty). Patients who died before the end of follow-up are excluded from the analysis since there were no data on the cause of death.

The simulation is run in three steps:

1) Individual simulation (one patient)

The model starts when a patient from the population arrives at the study centre according to an exponentially distributed arrival time. The patient is screened for subsequent inclusion or exclusion. If the patient is included, then s/he is randomly assigned to either MRI or CT. After that, the patient is randomly moved to one of the three health states (indep. life, dep. life, empty). All patient movements are controlled by their respective probabilities and by random numbers generated from a uniform distribution between 0 and 1.

2) Sample of patients simulation

In order to estimate the variability between patients, the previous individual simulation was repeated for a given number of patients selected from the population to participate in the trial. When the required numbers of patients have been included in MRI and in CT (n_{1} and n_{2}), the simulation is stopped and the number of excluded patients, the number of patients assigned to MRI and to CT, and the proportion of patients in each model state for MRI and for CT are obtained. The cost and benefit for each patient were randomly drawn assuming normal distributions of cost and health benefits, so that the resulting differences in effectiveness, utilities and cost of treatment could be calculated.

3) Replications

In order to estimate the variability between samples, the second step was repeated 10,000 times, applying 10,000 different sequences of random numbers. Consequently, 10,000 different results were generated, so that ten thousand differences in effectiveness were tested according to the corresponding hypothesis test. Differences in the mean cost of treatment and in the mean health benefits between MRI and CT were also obtained for each execution.

The hypothesis contrast, power and expected net benefits are processed as follows:

a) For each replication, the test statistic for the inequality hypothesis design was estimated by dividing the observed mean difference in effectiveness by the expected standard error. If the absolute value of the calculated statistic is higher than z_{1−α/2}, or the p-value is smaller than α, we reject the null hypothesis; otherwise, we accept it.

b) For each replication, the test statistic for the equivalence hypothesis design was estimated assuming that Δ = 0.15 is an acceptable margin of difference for the equivalence design. If the calculated statistic lies outside the interval [−z_{1−α/2}, +z_{1−α/2}], we reject the null hypothesis of non-equivalence; otherwise, we accept it.

c) For each replication, the test statistic for the non-inferiority design was estimated, also assuming that Δ = 0.15 is an acceptable margin of difference. If the calculated statistic lies outside the interval [−z_{1−α/2}, +z_{1−α/2}], we reject the null hypothesis of inferiority; otherwise, we accept it.

d) In all these hypothesis designs, the power is calculated as the number of rejections of the null hypothesis divided by the total number of replications (10,000 in this work). Once the expected power of the trial is obtained, the expected net benefits are calculated for each replication. The 10,000 ENBs obtained allow constructing the empirical distribution of the mean and testing the hypothesis of this work; subsequently, the power of ENB is calculated as the number of rejections of the null hypothesis that ENB = 0 divided by 10,000.
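The replication logic in steps a) to d) can be sketched for the inequality design as a small Monte Carlo loop. This is a simplified illustration, not the authors' model: it draws only the binary effectiveness outcome for each patient, skips arrivals, screening, utilities and costs, uses a pooled standard error (an assumption), and runs 2,000 replications rather than 10,000 to keep the run short. All names and the seed are hypothetical.

```python
import random
from math import sqrt


def simulate_power(p1: float, n1: int, p2: float, n2: int,
                   replications: int = 2000,
                   z_crit: float = 1.959964,
                   seed: int = 12345) -> float:
    """Empirical power: share of replications rejecting H0: p1 = p2."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replications):
        # Each patient reaches the independent state when a uniform(0, 1)
        # draw falls below the arm's success probability.
        successes_1 = sum(rng.random() < p1 for _ in range(n1))
        successes_2 = sum(rng.random() < p2 for _ in range(n2))
        diff = successes_2 / n2 - successes_1 / n1
        pooled = (successes_1 + successes_2) / (n1 + n2)
        se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
        # Two-sided rejection rule of step a).
        if se > 0 and abs(diff) / se > z_crit:
            rejections += 1
    return rejections / replications


# Proportions and equal allocation quoted in the text (MRI = arm 1, CT = arm 2).
estimated_power = simulate_power(0.429, 650, 0.506, 650)
print(f"estimated power: {estimated_power:.3f}")
```

The same loop could be extended by drawing utilities and costs for each replication and recording an ENB value, which would give the empirical ENB distribution described in step d); that extension is left out here because the ENB model itself is not reproduced in this text.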

... patients with MRI or CT (p > 0.05). The null hypothesis of the expected net benefits is rejected since, for any design, the statistical significance level lies below 5% at a power higher than 80%. Moreover, the probability of rejecting the null hypothesis increases as the monetary value of utilities increases, in which case diagnosing patients with CT would be beneficial for the patient and for the hospital that applies it (

Simulation results are similar to the analytical ones, confirming the accuracy of the power and the expected net benefits estimated by the statistical procedure. Moreover, the parameters estimated by simulation can also be obtained with conventional statistical methods; with simulation, however, the standard error of ENB and the standard deviation (within- and between-group variability) are also estimated, and thus the cumulative probability distribution of ENB could be obtained. In the case of hypothesis testing, there is significant evidence that the expected net benefits are higher than zero, since the resulting value of the observed test statistic is higher than the reference value t_{1−α/2} (for α = 0.05, t_{0.975}) at an estimated power of more than 80%. Hence, the probability of benefits from moving patients to the CT or MRI diagnostic modality could be taken into account for different trial designs.

Medicine and health care are among our primary concerns. The overall objective of our group is to improve the health of patients. More specifically, with the help of scientists, technologists, developers and industry, our aim is to help private and public institutions improve disease diagnosis and predict the optimal treatment pattern for patients. Conducting a research study requires, in the first instance, collecting a sample of patients, who are the main subjects of research. An adequate design and sample size for clinical or medical studies would yield sufficient evidence from the expected results.

The overall expected net benefit of a study is an issue. In this work, statistical and simulation processes were applied to plan a new trial, taking into account data from a conducted pilot trial. The simulation model addresses the issue of whether any benefit would be derived from carrying out a larger medical trial based on the pilot data in order to test the hypothesis. We tested several possible hypothesis designs: inequality, equivalence and non-inferiority. In addition, simulations provided us with information on the variability of a complex expected net benefits model, allowing us to construct its probability distributions. These results were obtained in two steps. First, we modeled the results of the pilot study and extended the study to larger sample sizes. Second, we repeatedly tested the hypothesis of the simulated study for a given sample size, to assess whether or not to reject the null hypothesis of each design (inequality, equivalence and non-inferiority), estimating the empirical probability distributions of the expected net benefits.

From a statistical perspective, with the simulation we were able to provide information on the variability of the expected net benefits. Concretely, testing the hypothesis of the expected net benefits required that variability to be estimated, combining variability from a range of different probability distributions, assuming independence. These variables are: the enrollment time per patient, which follows the inverse of an exponential probability distribution, and the mean time of inclusion, which follows a normal distribution; the number of patients that reach the primary outcome of the trial, which follows a binomial distribution, and its mean, which follows a normal distribution; the differences in utility within and between samples, which were drawn from two normal distributions; the overall cost of treatment per patient, which was assumed to follow a normal distribution; and the overall differences in cost of treatment between the two arms of the study, which were drawn from a normal distribution. Combining all possible values of each combination and feeding them into the expected net benefits function generates the empirical distribution of ENB. Estimating its mean and standard error then led to the construction of the statistic used to test our hypothesis.

From a probabilistic perspective, with simulation we are able to apply this procedure to a range of monetary amounts that a hospital would receive from the health system for a unit of patient health improvement, or that the health system would pay for a unit of health benefit. Assuming that the trial has shown statistical significance for given α and β, a probability curve can be generated such as the one produced in this study. Although the approach in this paper was applied to a single-site trial, it can be applied to a multisite trial assuming robustness or sensitivity on one or more variables. In the robustness case, the underlying assumption is that the number of sites does not change the effect on health, such as the effectiveness or utilities, but might increase or decrease the cost per patient. Under the sensitivity assumption, including more sites will produce different health effects and might change other variables; therefore, the procedure will combine them within the overall health effects.

From a generality perspective, the simulation model can also be applied with any hypothesis test. Examples include trials that study continuous variables and employ a t-test for two means, trials that study dependency between several variables applying the chi-square test, studies that compare two variances to confirm homogeneity or heterogeneity, analyses that study correlation or dependency between two variables, and trials that test the statistical association between a given continuous response and a given factor.

In conclusion, the clinical trial was designed with the hypothesis that MRI is more costly than CT but would show better health outcomes. According to the results of the pilot trial, CT is a dominant option from a cost-effectiveness perspective: it is less costly and has a better health outcome than MRI. However, the pilot study did not provide statistically significant evidence in favour of either of the two diagnostic imaging modalities. The simulated trial with a larger sample size shows that, for any hypothesis design, MRI and CT will generate the same cost (p > 0.05) but CT provides better health benefits (p < 0.05). Furthermore, we have shown that the expected net benefits per patient will be statistically significantly higher than zero if CT is used for diagnosing patients with suspected acute stroke. Therefore, we can conclude that CT works better than MRI.

Finally, we would like to communicate that our research is being applied to research studies in order to search for the optimal trial design parameters that will maximize the expected net benefits resulting from testing the hypothesis, assuming the two types of error, α (the probability of rejecting the null hypothesis when it is true) and β (the probability of rejecting the alternative hypothesis when it is true), or the power (the probability of accepting the alternative hypothesis when it is true). In our ongoing research project, we are trying to apply the optimization approach to aneurysm data.

Many thanks to the reviewer for his helpful comments.