Parametric and Non-Parametric Analysis of the Survival Times of Patients with Multiple Myeloma Cancer

Multiple myeloma (MM) is a type of cancer that remains incurable. In the last decade, most research into MM has focused on investigating the improvement in the therapeutic strategy. Our study assesses the survival probability of 48 patients diagnosed with MM based on parametric and non-parametric techniques. We performed parametric survival analysis and found a well-def-ined probability distribution of the survival time to follow three-parameter lognormal. We then estimated the survival probability and compared it with the commonly used non-parametric Kaplan-Meier survival analysis of the survival times. The comparison of the survival probability estimates of the two methods revealed a better survival probability estimate by the parametric method than the Kaplan-Meier. The parametric survival analysis is more robust and efficient because it is based on a well-defined parametric probabilistic distribution, hence preferred over the non-parametric Kaplan-Meier. This study offers therapeutic significance for further enhancement in the treatment strategy of multiple myeloma cancer.


Introduction
Multiple Myeloma (MM), also known as Kahler disease, myelomatosis, and plasma cell myeloma is a type of cancer that starts from malignant plasma cell (Specifically the white blood cell) [1]. As part of the human immune system is antibodies produced by the plasma cell which fight against germs and other sub-chemicals as associated with the cause of MM [7] [8] [9] [10]. There have been some treatment recommendations for multiple myeloma focused on decreasing the clonal plasma cell population and consequently decrease the symptoms of disease [11]. A preferred treatment like high-dose chemotherapy, commonly with bortezomib-based regimens, and lenalidomide-dexamethasone followed by autologous hematopoietic stem-cell transplantation (ASCT), the transplantation of a person's stem cells have been recommended for MM patients under 65 years [12]. In 2017, a meta-analysis performed has shown that post-ASCT maintenance therapy with lenalidomide has improved the progression-free survival and overall survival in persons at standard risk [13]. Whereas in 2012, it was found from a clinical trial that intermediate and high-risk disease patients benefit from a bortezomib-based maintenance regimen [14].
Statistically, approximately 30,000 new patients are diagnosed with MM in the United States (U.S.) every year, making it the second most common hematologic malignancy in the U.S. [15]. In 2019, a report by the Surveillance, Epidemiology and End Results (SEER) Cancer Institute reported that of all new cancer cases in the U.S for MM constitutes 1.8% and ranked among the top 14 list of cancer diseases [3]. A further projection by SEER indicates 32,110 estimated new cases of MM and an estimated 12,960 MM patients are expected to die. Those figures are scary and intriguing and cannot be overlooked. There is a sufficient increase compared with the 24,050 estimated new MM cases reported in 2014 [16]. The identified risk factors of MM are reported to be common among the black race, families with MM history, and being a male [3] [17]. SEER cancer institute reported from 2012-2016 that 63.1% of all races and sexes of MM cases are aged 65 or greater.
Though multiple myeloma cancer disease remains incurable, most researches into MM focused on how to improve the survival times of patients diagnosed with MM. The Kaplan-Meier (KM) method has been popularly used for analyzing cancer survivorship data in recent times due to the simplicity of its usage. It is often used to compare the survival difference of observations/groups based on the log-rank test of the null hypothesis that there is no difference. KM is mostly used for longitudinal studies like a cohort study [18]; an example is the present study (i.e. the survival time of patients diagnosed with multiple myeloma). Brain et al. [19] used Kaplan-Meier to test whether there was a significant difference in the overall survival duration between the categories of risk factors based on the generalized Wilcoxon test and the log-rank test. They found a significant difference in the survival duration between MM patients with LI% < 1% (i.e. low percentage labeling index) and LI% ≥ 1% (i.e. high percentage labeling index). Also, there was a significant difference in the survival duration for MM patients with the number of DNA synthesizing (S) values < 1.0 × 10 11 and S values ≥ 1.0 × 10 11 . Shaji K. Kumar et al. [20], used the Kaplan Meier to test for the significant difference of the overall survival from the time of post-transplantation relapse between MM group treated subsequently with one or more of the newer drugs (thalidomide, bortezomib, or lenalidomide) and those not exposed to the newer drug, and they found a significant difference between the two groups.
In the present study, we developed a parametric and non-parametric survival analysis of the survival time of patients diagnosed with multiple myeloma. We believe that being able to find the unique/correct probability distribution or probabilistic behavior that characterizes the survival time is a great step towards getting a good and accurate prediction of the survival duration. It is well known that parametric analysis is more powerful in decision analysis than its non-parametric counterpart. Feigl and Zelen ([1965] p. 835) and other authors have pointed out that the assuming exponential distribution works well for studying the survival of cancer-related cases [21]. However, almost every data given on any cancer survival problem may have an associated well-defined probability distribution. Hence, assuming an exponential distribution for a given cancer survival case without any further investigation is a serious mistake that will lead to making incorrect decisions. We compare the more powerful parametric analysis of the survival time to the commonly known non-parametric Kaplan-Meier analysis of cancer survivorship.

Data Description
The data used in this research is from West Virginia University Medical Center provided by Harley [22] [23]. Originally, the data constituted survival times of 72 multiple myeloma (MM) patients diagnosed and treated with alkylating agents [22]. 65 out of 72 patients provided complete data for 16 contributing variables (risk factor) whiles the remaining 7 were eliminated due to missing data in at least one of the 16 risk factors. Thus, the survival or death times of the MM patients is driven by the 16 risk factors. Given that a patient is diagnosed with myeloma, the 16 risk factors were recorded, and the time up to which the patient survived the disease was noted (called the survival time from diagnosis to the nearest month). Of the 65 patients, 48 and 17 were dead and alive, respectively.
In the present research, we utilized the complete data of the 48 patients with the survival times for our analysis.
Before we proceeded to perform the parametric analysis of the survival times of patients with multiple myeloma, we wanted to know whether there is a difference in the survival times for gender, i.e. male and female. Given that we have a small data of only 48 patients, we used the log-rank test [24] to compare the difference in survival times of male and female. From Figure 2, the log-rank test resulted in a large p-value = 0.45, indicating a failure to reject the null hypothesis  proceeded with parametric analysis without considering stratification based on gender.

Parametric Analysis of the Survival Times of Multiple Myeloma
Multiple Myeloma (MM) has been, and continues to be, the subject of many research studies. The main goals of these studies are to investigate the means of improving the therapeutic/treatment and survival of MM patients. Nearly 1 to 5 per 100,000 individuals are affected by MM each year worldwide with a higher incidence in the West [25]. However, their approach is statistically less robust or not as powerful compared to parametric analysis if a pdf can be found. On the contrary, if parametric analyses are used prematurely (i.e. assumed a pdf), the result will be misleading.
In the last few decades, scientific transformations have made it possible to employ parametric analysis to data which initially lack a unique probability distribution, particularly for skewed data. For instance, ANOVA is a parametric analysis widely recommended for comparing statistical models and means of different data sets. ANOVA assumes normality, homoscedasticity, and random independent samples. Several suggestions on transformations have been pro- often been applied to skewed data to achieve the normal distribution. Giampaolo Merlini et al. [28] used log transformation to reduce asymmetry variables to a low level and avoid obvious outliers. Performing parametric analysis is an imperative approach to achieve good statistical analysis and inference by knowing pdf of the data.

Descriptive Statistics of the Survival Times of Multiple Myeloma
We plotted the histogram to investigate the distribution of the survival time of multiple myeloma of our data as shown in Figure 3. We can see that the distribution of the survival time of MM is right-skewed. Table 2 displays the descriptive statistics of the survival time of MM. If a skewed value is negative (less than zero) implies the data depict left or negative skewness, if a skewed value is positive implies right or positive skewness [29]. So, the positive skewed value of 1.41 in Table 2 is further evidence to support the right-skewed behavior as shown in Figure 3. Statistics like Kurtosis allow for the assessment of highly extreme values in the data. A positive kurtosis value demonstrates a leptokurtic behavior of the distribution, and a negative value displays a platykurtic behavior of the distribution. A kurtosis value of zero implies the distribution behavior is mesokurtic [30]. Thus, the kurtosis value of 0.78 in Table 1 attests to the leptokurtic in the survival time data of MM.

Three-Parameter (3P) Lognormal Probability Estimation of the Survival Times of Patients with Multiple Myeloma
After the assessment of Figure 3, and the descriptive statistics in Table 1, we find that the probability distribution of the survival time of MM follows the 3-parameter lognormal distribution. Table 2 shows three different goodness-of-fit tests of the 3p lognormal probability distribution of the survival times of the MM patients given by the Kolmogorov-Smirnov, Anderson-Darling and Chi-square test. Each of the tests resulted in a large p-value, indicating that we fail to reject the null hypothesis, 0 H : the probability distribution follows the 3p-lognormal. In this section, we defined the probability density function (pdf) of the 3p-lognormal distribution and estimate its parameters. Given the survival time of MM, t as a random variable, then the pdf of the 3p-lognormal probability distribution is given by, where 0 σ > denotes continuous shape parameter, µ −∞ < < ∞ represents the continuous scale parameter and γ is the continuous location parameter. If 0 γ ≡ yields the 2p-lognormal distribution. To estimate the three parameters σ, µ and γ, we would apply the maximum likelihood estimation (MLE) procedure. The MLE estimates the parameters of the probability distribution by maximizing the likelihood function. Generally, MLE is the most used parameter estimation method for statistical inference due to the robustness compared to other traditional methods like the method of moment estimation; the logic is intuitive and flexible [31]. It has some unique and important statistical properties like consistency, invariant, efficiency, sufficiency and asymptotic normality.
The MLE for three-parameter lognormal PDF may not be asymptotically efficient, it is still regarded as a better parameter estimation technique than the method of moments or quantiles because the variance is much smaller [32]. Therefore, the MLE is considered the best parameter estimation method for the 3p-lognormal probability distribution. Calitz (1973) suggested that the MLE procedure by Cohen (1951) should be adopted because it works better than other procedures. To compute the MLE estimators γ, µ, and 2 σ , we first find the likelihood function. The likelihood function of the random sample of n independent observations of survival time To find the estimates ˆ, γ µ and σ of , γ µ and σ , Cohen (1951) We find μ by equating the partial derivative of  with respect to µ to zero and solving for µ , we have We find σ by equating the partial derivative of  with respect to σ to zero and solving for σ .  [33] found the localized estimate γ to be sufficient in the identification of the 3p-lognormal. The validation of the estimates from such procedure has since been investigated by Calitz (1973), Cohen and Whitten (1980) [34], and Chen (2006) [35] and others. Cohen's (1951) procedure estimated the LMLE for γ by substituting μ and σ from Equations (4) and (5) into Equation (6) to obtain γ . Thus, we have To solve for γ in Equation (7), we can solve iteratively the local maximum likelihood estimate (LMLE) of γ to obtain γ . Here, we take into consideration Open Journal of Applied Sciences admissible roots for which γ is below ( ) 1 t . Cohen and Whitten (1980) found only one of such roots; however, in the event of multiple admissible roots, choose the root that results in the closest agreement between the sample mean t and the expected value of the 3p-lognormal probability distribution, Base on the procedure for the parameter estimation of the three-parameter lognormal probability distribution discussed above, the 3p-lognormal probability distribution of the survival time t of multiple myeloma (ˆˆ, , γ µ σ ), the approximate estimates are given in Table 3 below. We substituted the estimates into Equation (1) to obtain the probability density function (pdf) of the 3p-lognormal distribution of the survival time of multiple myeloma given by, The above findings that the survival times of multiple myeloma patients data follows three-parameter lognormal distribution can ensure efficient and accurate analysis of the survival times of MM patients. Given the pdf in Equation (1), we find the cumulative density function (cdf) by taking the integral of the pdf with respect to the random variable t, given by: Given the tediousness in integrating the 3p-lognormal pdf, we adopted the method of integration by substitution; substituting into Equation (9) the standard normal cdf given by In Equation (9), we substitute    where ( ) represents the standard normal cdf at a given survival time i t . Substituting the estimates of Table 3 into equation (10) denotes the standardized normal cumulative probability distribution.
The cdf can be useful in determining the probability of a given random observation (survival time t) would be less than or equal to some value T; thus, ( ) P t T ≤ . Figure 4 shows the cdf plot of the survival times of multiple myeloma patients data.
The survival function can be used to estimate the probability that a patient diagnosed with multiple myeloma would survive beyond time T; thus, ( ) P t T > . In

Kaplan-Meier Estimation of Survival Probability of the Survival Times of Patients with Multiple Myeloma
Kaplan-Meier estimate (KM) was introduced by Edward L. Kaplan and Paul , where i t is the time when at least one event happened, i τ denotes the number of events (e.g. deaths or alive) at time i t , i n represents the individuals/observations at risk (not yet had an event or censored) up to time i t , and i i n τ denotes the probability of survival. In survival analysis, a nearly universal feature data is the censoring data, the commonest of which is the right-censoring where an individual expires or removed before the end of the study or clinical trial. The other cases of censoring which rarely happen are the left-censoring (the initial time at risk is unknown or individuals removed at the start of the study), and interval-censoring (a case of both right and left censored). A key advantage of the KM curve is that it can take into account censored data, particularly right-censoring, and is easy to estimate. The analysis of Kaplan-Meier survival curve takes into consideration the following three assumptions: 1) it assumes that censored observations have the same prospects of survival as those who continue in the Open Journal of Applied Sciences study, 2) it assumes survival probabilities to be the same for individuals recruited early and late in the study, and 3) it assumes that the event happens at the specified time. We utilized KM to assess the overall survival of MM. In Figure 6, we show the KM curve for the global estimates of the probability of survival times of patients diagnosed with multiple myeloma with a 95% confidence interval. We can see that all the MM patients demised by time 92 (in months). Though the KM estimator is the most commonly used technique of survival analysis, and is useful for examining recovery rates, the probability of death and effectiveness of treatment. However, as we mentioned before the KM method is not as powerful as the parametric survival analysis for decision making. We cannot obtain the hazard function to estimate the rate at which patients die with MM using KM.

Comparison of Three-Parameter Lognormal with the Kaplan-Meier Estimation of the Survival Function
In the parametric analysis, we found the survival times of patients with MM follow the three-parameter lognormal probability distribution. In section 2.5, we performed a non-parametric analysis using the Kaplan-Meier to estimate the survival probability of the MM patients. Now, we compare the survival estimate of the 3p-lognormal with the Kaplan-Meier survival estimate of the survival times of the MM patients. The relevance of the survival function of the two methods is to estimate the survival probability of a patient diagnosed with MM beyond a given time, after a given treatment. The survival times along with the survival probabilities of the two survival functions is given by Table 4. We see that the probability estimates by the 3p-lognormal survival function are mostly higher than that of Kaplan-Meier. Though there are times in which the KM estimates higher probabilities, the 3p-lognormal survival function from the parametric probability distribution estimates the survival proportion more accurately than the Kaplan-Open Journal of Applied Sciences Meier. Thus, because parametric methods are more powerful, robust and efficient than non-parametric methods.

Discussion
Given the danger posed by multiple myeloma (MM) cancer in recent years, it is imperative to investigate the prognosis and how to enhance the therapeutic/ treatment strategy of MM. While MM cancer currently remains incurable, there has been some improvement in the treatment process. The treatment progress has been largely due to the introduction of therapeutic agents such as thalidomide, lenalidomide, and bortezomib, as well as the introduction of high-dose chemotherapy and stem-cell rescue (ASCT). The quest to improve the survival time (or increase the death time) of patients with MM has resulted in adopting several different research approaches and methodologies.
In the present study, 1) we identified a well-defined probability distribution that characterizes the survival times of the 48 patients diagnosed with MM and Firstly, we investigated utilizing the log-rank test to test the difference be-tween the survival times of the males and females diagnosed with MM. We found that the survival times of both males and females diagnosed with MM are not different, so we performed the analysis using the combined data of the males and females. We then found a well-defined probability distribution that characterizes the survival time of the 48 patients diagnosed with MM follows the 3p-lognormal probability distribution. We believe that being able to find the best probability distribution that characterizes the probabilistic behavior for a given cancer patient survival time can lead to estimating the survival probability with much more accuracy and efficiency. The fact that we found a unique probability distribution for our study of the survival times of patients diagnosed with MM refute the suggestion of assuming exponential distribution (as suggested by Feigl and Zelen ([1965] p. 835) and other authors) or using the non-parametric Kaplan-Meier for most cancer survivorship studies.
We found both the 3p-lognormal survival most often estimates higher survival probability than the KM survival function, given by Table 4. We know that KM is the most commonly used technique for analyzing cancer survivorship data.
Statistically, the parametric technique is said to be more robust and efficient than the non-parametric counterpart. Therefore, our finding of the parametric 3p-lognormal probability distribution is better and of a higher degree of accuracy in estimating the survival probability of the patients diagnosed with MM than the Kaplan-Meier. The KM technique is more often used to compare the difference between the survival probability of the survival times of two or more entities or groups usually based on the log-rank test. However, by obtaining the best parametric probability distribution that characterizes the survival times, we can find the survival function and estimate the survival rate and compare the results of two or more entities with a high degree of accuracy. The only situation where non-parametric becomes highly applicable or recommended is when there is no parametric form (i.e. pdf) for the given data. One key demerit of KM is that it cannot be used to estimate the rate at which patients die or survive with MM (i.e. the hazard function). The hazard function can be easily calculated after finding the parametric probability distribution by dividing the probability density function, pdf by the survival function.

Conclusions
We estimated the survival probability of patients diagnosed with multiple myeloma (MM) using two different methods; the parametric three-parameter lognormal and the non-parametric Kaplan-Meier. We found the parametric method to often give a higher estimate of the survival probability than the non-parametric method. Even though there are times when the non-parametric survival probability estimates are higher, all-important arguments favor the parametric method. The difficulty of the parametric survival analysis is the fundamental intrinsic assumption that the survival times of the population under study follow a specific probability distribution. But if such a limitation can be overcome, then we can obtain a more robust or powerful result from the parametric analysis. We can also estimate the hazard function to assess the rate at which patients die with MM after finding the right parametric distribution. Base on the two different methods utilized for estimating the probability of survival of patients diagnosed with MM, we offer the following essential recommendations. 1) If the only information about patients is the survival (death) time, then estimating the survival probability using the parametric technique will yield more accurate, robust, and efficient results than the commonly used Kaplan-Meier. 2) However, if no unique or well-defined parametric probability distribution is found, then we still recommend the use of Kaplan-Meier technique for the estimation of the survival probability. Though the use of non-parametric Kaplan-Meier survival analysis may, in some cases result in a similar or higher estimate of the survival rate (such as in our case), the parametric analysis remains more powerful, robust and efficient, and hence must be considered first in the analysis of any given cancer survivorship data. The study provides an improved method for estimating the survival probability and analysis of cancer survivorship data, and further enhance the therapeutic/treatment process of the incurable multiple myeloma cancer.