The Cox Proportional Hazard Regression Model Vis-à-Vis ITN-Factor Impact on Mortality Due to Malaria

This study provides a starting point for defining and working with Cox models in multivariate modelling. In medical research, several risk factors may potentially affect patient prognosis, yet only one or two may actually predict the patient's outcome. To find out which risk factors contribute most to the survival times of patients, researchers need to adjust for the covariates in order to assess their impact on survival time. Apart from the multivariate nature of the covariates, some covariates may be categorical while others are quantitative. Researchers may also need a model that extends survival analysis methods to assess simultaneously the effect of several risk factors on survival time. This study presents the Cox model as a robust technique that can handle these cases. The Cox model was illustrated with an investigation of the ITN-factor and its contribution to death due to malaria. Data were taken from hospitals in Ghana. We followed hospital in-patients from a reported case of malaria (origin state) until death or censoring (destination state), as a function of predictive factors (exposure to the malaria parasites) and some socioeconomic variables. We used Cox models to quantify the effect of the ITN-factor in the presence of other risk factors, and to obtain measures of effect that describe the relationship between the exposure variable and time until death, adjusting for other variables. The PH assumption held for all three covariates. Sex of patient was not a significant predictor of death due to malaria. Age of patient and ITN user status were both significant. The magnitude of the coefficient (0.384) of ITN user status indicates its high contribution to the variation in the dependent variable.

How to cite this paper: Turkson, A.J., Addor, J.A. and Ayiah-Mensah, F. (2021) The Cox Proportional Hazard Regression Model Vis-à-Vis ITN-Factor Impact on Mortality Due to Malaria. Open Journal of Statistics, 11, 931-962. https://doi.org/10.4236/ojs.2021.116055
Received: September 28, 2021 Accepted: December 3, 2021 Published: December 6, 2021
Copyright © 2021 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/


Introduction
Event history analysis is an umbrella term for the collection of statistical methods that focus on the timing and occurrence of events. Reference [1] posits that survival analysis techniques model the probability of a change in a dependent variable $Y_t$ from an origin state j to a destination state k as a result of predictive factors, and that the duration of time between states is referred to as the event time. Survival analysis models are used to examine the survival and hazard rates for events of interest that are probabilistic in nature. One goal of survival analysis is to obtain a measure of effect that describes the relationship between a predictor variable of interest and time to failure, after adjusting for the other variables identified in the study and included in the model. This measure of effect is the hazard ratio [2]. Reference [3] has argued that survival analysis examines the effect of changes in the covariates on the duration of time preceding the event as well as on the probability that the event will occur.
Standard procedures for survival and event history analysis involve modelling time to death or failure, often as a function of covariates, using either parametric or semi-parametric approaches. Various parametric families of models are used in the analysis of lifetime data, including the exponential and the Weibull, with the latter being popular due to its flexibility. The Cox regression model is a cornerstone of modern survival analysis and is widely used in many other fields as well. The model is used to investigate the impact of various explanatory or predictor variables on the outcome or response variable, with a view to identifying the salient variables that have a telling effect on the outcome [4]. In mathematical terms, the Cox proportional hazards model is used to model survival data as a function of covariates. The purpose of the model is to evaluate simultaneously the effect of several factors on survival. In other words, it allows us to examine how specified factors influence the rate at which a particular event happens at a particular point in time. This rate is commonly referred to as the hazard rate [5]. Predictor variables (or factors) are usually termed covariates in the survival-analysis literature. The Cox model is expressed through the hazard function, denoted by h(t). Briefly, the hazard function can be interpreted as the risk of dying at time t. The response variable is the hazard function h(t), which measures the instantaneous rate at which the event of interest (in this case, death) occurs at time t, given survival up to that time. The model expresses this hazard as the product of an arbitrary baseline hazard $h_0(t)$, which is the hazard when all covariates are zero, and an exponential function of the covariate, $h(t) = h_0(t)\exp(\beta x)$, where $\beta$ is the regression coefficient of the covariate x.
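As a minimal sketch of the hazard just described, the following Python fragment evaluates a baseline hazard scaled by an exponential function of one covariate. The Weibull-type baseline `h0` and its constants are assumptions chosen only for illustration; the coefficient 0.384 is the ITN user status coefficient quoted in the abstract, reused here as an example value.

```python
import math

def cox_hazard(t, x, beta, baseline_hazard):
    """Hazard h(t | x) = h0(t) * exp(beta * x) for a single covariate x."""
    return baseline_hazard(t) * math.exp(beta * x)

def h0(t):
    # Hypothetical baseline hazard (Weibull form); not taken from the paper.
    return 0.1 * 1.5 * t ** 0.5

beta = 0.384  # illustrative value: ITN user status coefficient from the abstract

# Ratio of the hazards for x = 1 versus x = 0 at several time points.
ratios = [cox_hazard(t, 1, beta, h0) / cox_hazard(t, 0, beta, h0)
          for t in (1.0, 5.0, 10.0)]
print([round(r, 3) for r in ratios])
```

Whatever baseline is supplied, the ratio printed is the same at every time point and equals exp(beta), which is the sense in which the hazards are proportional.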
Though ordinary regression analysis (ORA) could be used to achieve the same purpose, statisticians frown on its use because of the problem of incomplete data associated with most prospective studies. ORA cannot take into consideration partial or incomplete information from the entire study group. Cox's proportional hazards regression model can take into consideration partial information from censored data as well as full information from uncensored data, and is therefore more appropriate in such situations. The Cox proportional hazards model is a statistical method that estimates the probability of an event over time while accounting for the impact of covariates on that probability. The model works for both quantitative and categorical predictor variables. Furthermore, the Cox regression model extends survival analysis methods to assess simultaneously the effect of several risk factors on survival time [6].
In the Cox model, we take cognizance of two types of covariates: those that depend on time, and those that are time-independent. In this paper, we limited our discussion to time-independent covariates. We also looked at some aspects of the Cox proportional hazards regression model. Special emphasis was placed on the following areas: how to develop the model; popularity of the model; hypothesis testing for the proportional hazards model; the stratified Cox model; meaning of the proportional hazards (PH) assumption; failure of the PH assumption; testing of the PH assumption; an alternative method for assessing the PH assumption; the hazard ratio; the likelihood ratio test; and the graphical approach based on the log-log plot.
For researchers to apply the model to lives and property, it is expedient to subject covariates to empirical survival-analytic studies. Applying the theory behind Cox models is particularly useful in examining treatment comparisons based on the time to some event while adjusting for the effect of concomitant variables. It is also useful for predictions that support optimal maintenance policies in engineering, and in medical and biomedical studies. The Cox model has the advantage of preserving the variable in its original quantitative form, and of using a maximum of information [7] [8]. The paper is organized as follows; it provides:
1) Theoretical framework which underpins the study.
2) Theories by which the Cox model could be laid out.
3) Simulation study on the Cox model.
4) Real case empirical studies with apt interpretation of the outcome.
This study is intended to support researchers in understanding and interpreting the hazard ratio as a measure of effect that describes the relationship between the predictor variables and time to failure (time to the event of interest).

Theoretical Framework Underpinning the Study
Reference [9] employed Cox proportional hazards regression as a less parametric alternative to the generalized linear model (GLM) and the ordinary least squares (OLS) model, even when there was no need to correct for censoring. They examined how well the alternative estimators behaved econometrically in terms of bias when the data were skewed to the right. Specifically, they provided evidence on the performance of the Cox model under a variety of data generating mechanisms and compared it to the estimators studied recently in [10]. They noted that the gamma regression model with a log link seemed to be more robust to alternative data generating mechanisms than either OLS on ln(y) or Cox proportional hazards regression. They concluded that the proportional hazards assumption was an essential requirement for obtaining a consistent estimate of $E(y \mid x)$ using the Cox model. Reference [11] proposed the use of the Cox proportional hazards model (CPHM) for the analysis of early-failure data associated with power cables. They noted that the CPHM analyses a set of covariates simultaneously and identifies those which have significant effects on cable failures. To demonstrate the appropriateness of the model, they obtained relevant historical failure data on medium voltage (MV, rated at 10 kV) distribution cables and high voltage (HV, 110 kV and 220 kV) transmission cables. The data were collected from a regional electricity company in China. The study revealed that the model was more robust than the Weibull distribution. It also demonstrated that the method could single out a case of poor manufacturing quality in a particular cable joint by using a statistical hypothesis test.
They proposed an approach which could potentially help resolve any legal dispute that may arise between a manufacturer and a network operator. Reference [12]  Reference [16] noted that the Cox proportional-hazards regression model had achieved widespread use in the analysis of time-to-event data with censoring and covariates. They noted that the covariates may change their values over time and therefore discussed the use of such time-dependent covariates. They further noted that the interrelationships between the outcome and variables over time could lead to bias unless the relationships were well understood. They indicated that the form of a time-dependent covariate was much more complex than in Cox models with fixed (non-time-dependent) covariates and that constructing it involves a function of time. In the study [16], child mortality was considered as the dependent variable. Child Mortality was deemed to measure the probability of dying between the age of one and four years (expressed per 1000 live births).
They also considered several important socioeconomic and demographic predictors, which included the following: age of women (15-19, 20-24 and 25-49 years); education of women (illiterate, literate but below primary, primary but below middle, middle but below high school, and high school and above); place of residence (rural and urban); child's gender (female and male); mass media exposure (no exposure and any exposure); wealth quintile (poorest, poorer, middle, richer and richest); religion (Hindu, Muslim and others); caste (Scheduled Caste (SC), Scheduled Tribe (ST), Other Backward Class (OBC) and others); birth order (1, 2-3 and 4 or more); birth interval (less than 2 years and greater than 2 years); parity (1-2, 3-4 and ≥5); working status of women (not working, working at home and working away from home); women empowerment (not empowered, partially empowered and fully empowered); and region.
Reference [17] applied the Cox proportional hazards model in a study of factors influencing women's waiting time to first birth in Bangladesh. In their study, the event of interest was the waiting time to first birth after marriage. This variable could not be obtained directly, so they used the difference between the age of the woman at first birth and her age at first marriage as the waiting time to first birth. Women who were still waiting for their first birth at the end of the study were considered censored. The waiting time was measured in months. The censoring indicator was equal to 1 if the woman had had her first child and 0 if she had not had any child. Some demographic and socio-economic variables were selected as explanatory variables, among them: current working status; age of woman; region of descent; type of residence; religious affiliation; educational level; household head; media influence; ideal number of children; wealth quintile; partner's level of education; and occupation of partner.

Developing the Cox Proportional Hazards Regression
The difficulties one encounters with parametric models can be resolved with proportional hazards models. For two individuals who differ only in group membership (e.g., treatment versus control), their predicted log-hazards differ additively by the relevant parameter estimate, which is to say that their predicted hazard rates differ by a factor of $e^{\beta}$, i.e., multiplicatively by the anti-log of the estimate. Thus, the exponentiated estimate can be considered a hazard ratio, that is, the ratio between the predicted hazard for a member of one group and that for a member of the other group, holding everything else constant. For a continuous explanatory variable, the same interpretation applies to a unit difference. Other hazard rate models have different formulations, and the interpretation of the parameter estimates differs accordingly.
Assuming that the value of the covariate x is fixed and does not change over time, the regression model will be
$$h(t, x, \beta) = e^{-(\beta_0 + \beta_1 x)} \quad (3)$$
Two points worth noting here are that:
1) The hazard function does not depend on time; its value is determined by the covariate x and the unknown parameters $\beta_0$ and $\beta_1$.
2) The hazard function and the systematic component in the regression model are inversely related.
The fact that the hazard does not depend on time means that the risk of failure is the same no matter how long the subject is followed. Models that are used to describe survival times in a comparative sense are often called semi-parametric regression models. Typically, when we want to compare the survival experience of sub-groups, we need to specify the hazard function as a function of time and covariates.
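The hazard function in Equation (4),
$$h(t, x, \beta) = h_0(t)\, r(x, \beta) \quad (4)$$
is the product of two functions: the baseline hazard $h_0(t)$, which describes how the hazard changes with time, and $r(x, \beta)$, which describes how the hazard changes with the covariates. The hazard ratio for two subjects with covariate values $x_1$ and $x_0$ is
$$HR(t, x_1, x_0) = \frac{h(t, x_1, \beta)}{h(t, x_0, \beta)} = \frac{r(x_1, \beta)}{r(x_0, \beta)} \quad (5)$$
The hazard ratio HR depends only on the function $r(x, \beta)$. If the ratio in Equation (5) is easily interpreted, then the baseline function, which is a function of time, is of little importance.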
Reference [18] was the first to propose that, in the model of Equation (4), $r(x, \beta) = e^{x\beta}$. With this parameterization the hazard function is now denoted by $h(t, x, \beta) = h_0(t)\, e^{x\beta}$ and the hazard ratio by $HR(t, x_1, x_0) = e^{\beta (x_1 - x_0)}$. More generally, we can write
$$h(t, X) = h_0(t) \exp\left(\sum_{i=1}^{k} \beta_i X_i\right) \quad (6)$$
where $h(t, X)$ is the hazard at time t for a subject with a set of predictors $X = (X_1, X_2, \ldots, X_k)$. The model has the following features:
• it is a product of a function in t and a function in X;
• X is time independent;
• the baseline hazard is an unspecified function, making it a semi-parametric model.
The hazard ratio in Equation (7) can be interpreted as a "relative risk". The coefficients $\beta_1, \beta_2, \ldots, \beta_k$ are estimated from the data. For example, for a binary covariate coding sex, $HR(t, x_1, x_0) = e^{\beta} = 1.5$ means that males are failing at one and a half times the rate of females. A hazard ratio of one (1) means that there is no effect; one is the null value for the exposure-outcome relationship. The term proportional hazards refers to the fact that the hazard functions are multiplicatively related, that is to say, their ratios are constant over survival time. This assumption is important in assessing the validity of the model. One way to specify the distribution of survival time is through the hazard function. Using the relationship between the survival function and the hazard function, we have $S(t, x, \beta) = e^{-H(t, x, \beta)}$, where $H(t, x, \beta)$ is the cumulative hazard function at time t for a subject with covariate x.
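The hazard ratio interpretation above can be illustrated with a short Python sketch for several covariates. The coefficient values and the coding of sex (1 = male, 0 = female) are assumptions made for illustration; only the value 1.5 for the male-to-female hazard ratio comes from the text.

```python
import math

def hazard_ratio(beta, x_a, x_b):
    """HR comparing covariate vector x_a with x_b: exp(sum_i beta_i * (x_a_i - x_b_i))."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_a, x_b)))

# Hypothetical coefficients: sex (1 = male, 0 = female) and age in years.
beta = [math.log(1.5), 0.02]

# Male versus female of the same age: the age term cancels.
hr_sex = hazard_ratio(beta, [1, 40], [0, 40])

# Identical covariate vectors give the null value of one.
hr_null = hazard_ratio(beta, [1, 40], [1, 40])

print(round(hr_sex, 3), hr_null)
```

The baseline hazard never enters the calculation, which is why the ratio can be interpreted without specifying how the hazard changes with time.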
One important decision in survival analysis is how to properly model the conditional hazard rate of failure given certain predictor variables (covariates), because statisticians are interested in finding out whether the predictor variables are associated with the survival or failure times. The model provides a technique for exploring the association of predictor variables with failure times and survival distributions; it is also used for studying the effect of a primary covariate or predictor of interest while adjusting for other variables. The model assumes that, given an m-dimensional vector of covariates Z, the conditional hazard rate, given by $h(t \mid Z) = h_0(t)\, \Re(Z)$, is a function of the independent predictor variables through the function $\Re$. The observed data are $(X_i, \delta_i, Z_i)$, $i = 1, \ldots, n$, with $X_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$, which form an independent and identically distributed sample from the population $(X, \delta, Z)$. If the random variables T and C are positive and continuous, then $E(\delta \mid Z) = E\{H_0(X) \mid Z\}\, \Re(Z)$, where $H_0(t) = \int_0^t h_0(u)\, du$ is the cumulative baseline hazard function. This relationship allows one to estimate the function $\Re$ using regression techniques if $h_0(\cdot)$ is known. The likelihood function can also be derived.
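When $\delta_i = 0$, all we know is that the survival time satisfies $T_i \ge C_i$, and the probability of this outcome is the survivor function evaluated at the censoring time, $P(T_i \ge C_i \mid Z_i) = S(C_i \mid Z_i)$, which is obtained from the proportional hazards model.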
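To make the role of censored observations concrete, the sketch below evaluates the standard Cox partial log-likelihood for a single covariate and maximises it by a simple ternary search. This is an illustration under stated assumptions, not the authors' derivation: the toy data are invented, event times are assumed untied, and the search interval [-5, 5] is an arbitrary choice. A censored subject never contributes a term of its own, but it stays in the risk set of every event that occurs before its censoring time.

```python
import math

def partial_log_likelihood(beta, times, events, x):
    """Cox partial log-likelihood for one covariate (assumes no tied event times)."""
    total = 0.0
    for i, t_i in enumerate(times):
        if events[i] != 1:
            continue  # censored subjects add no term of their own
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(math.exp(beta * x[j])
                       for j, t_j in enumerate(times) if t_j >= t_i)
        total += beta * x[i] - math.log(risk_sum)
    return total

def fit_beta(times, events, x, lo=-5.0, hi=5.0, iterations=200):
    """Maximise the concave partial log-likelihood by ternary search on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if partial_log_likelihood(m1, times, events, x) < partial_log_likelihood(m2, times, events, x):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# Hypothetical toy data: follow-up time, event indicator (1 = death), binary covariate.
times = [2, 3, 5, 7, 8, 11]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 1, 0, 0]

beta_hat = fit_beta(times, events, x)
print(round(beta_hat, 3), round(math.exp(beta_hat), 3))
```

In practice a statistical package would be used and would also report standard errors; the point here is only that the baseline hazard does not appear anywhere in the function being maximised.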

Hypothesis Testing for Proportional Hazard Models
One way of finding out whether the predictor variables really contribute to the risk or hazard function (after fitting the Cox model) is to conduct a test of hypothesis. Two tests are very useful for this purpose: the Wald test and the likelihood ratio test. For models with multiple parameters, it is convenient to use the Wald test for one parameter at a time. When fitting different nested models, the likelihood ratio test is most convenient.
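For a test of a single parameter being equal to 0, $H_0: \beta = 0$, the Wald test statistic is
$$Z = \frac{\hat{\beta}}{se(\hat{\beta})}$$
where, under the null hypothesis, $Z^2$ has approximately a $\chi^2$ distribution with one degree of freedom. Large values of $Z^2$ support the alternative hypothesis. For multivariate models, a version of the Wald test exists, which comes from a $\chi^2$ distribution with more degrees of freedom, but we will rarely need this. The likelihood ratio test statistic for the hypothesis that a single parameter is equal to zero is
$$LR = -2 \ln L_R - (-2 \ln L_F)$$
where $L_R$ is the maximized likelihood of the reduced model without the parameter and $L_F$ that of the full model. For tests of multiple parameters being equal to zero, the degrees of freedom increase, as explained below.
The hypothesis in Equation (17),
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \quad (17)$$
states that, if it is true, $x_1, x_2, \ldots, x_k$ will be excluded from the model; therefore the hypothesis becomes
$$H_0: h(t, x) = h_0(t) \quad (18)$$
Equation (18) means that none of the predictor variables identified contributed to the hazard or risk of death. The hypothesis in (17) is rejected if the test statistic exceeds the critical value of the $\chi^2$ distribution with k degrees of freedom at the chosen significance level.
The hypothesis in Equation (19), that the log-likelihoods of the models with and without the predictor variables are equal, implies that the model with or without the predictor variables gives the same results. The hypothesis in Equation (19) is rejected if the likelihood ratio statistic exceeds the corresponding $\chi^2$ critical value.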
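As an illustration of the single-parameter Wald test, the sketch below uses the usual form of the statistic, the estimate divided by its standard error, and refers its square to a chi-square distribution with one degree of freedom. The estimate 0.384 is the ITN coefficient quoted in the abstract; the standard error 0.12 is a hypothetical value, since none is given in this part of the paper.

```python
import math

def wald_test(beta_hat, se):
    """Return (Z, Z squared, p-value) for H0: beta = 0 with a chi-square(1) reference."""
    z = beta_hat / se
    chi2 = z * z
    # For one degree of freedom, P(chi2_1 > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return z, chi2, p_value

# beta_hat from the abstract; se is an assumed, illustrative standard error.
z, chi2, p = wald_test(0.384, 0.12)
print(round(z, 2), round(chi2, 2), round(p, 4))
```

The closed form with `math.erfc` holds only for one degree of freedom; a joint test of several parameters would need a chi-square tail probability with more degrees of freedom from a statistical library.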

Stratified Cox (SC) Procedure
The stratified Cox proportional hazards model allows the underlying hazard function to vary across the levels of the stratification variables. The procedure modifies the Cox proportional hazards (PH) model so that a variable that fails to satisfy the PH assumption is controlled by stratification. Variables that satisfy the PH assumption are included in the model as covariates, whereas variables that fail to satisfy the PH assumption are not included as covariates but are used to define the strata. Let us assume that $Z_1, Z_2, \ldots, Z_K$ do not satisfy the PH assumption, and that $X_1, X_2, \ldots, X_P$ do satisfy it. We define a new variable $Z^*$ from the Z's, which is used for the stratification. If race and sex do not satisfy the PH assumption, then we can form combinations of their categories, as shown in Table 1. From Table 1, we note that $Z^*$ has $k = 6$ categories or strata.
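The construction of the stratification variable from the categories of race and sex can be sketched as follows. The category labels are hypothetical, and the split into three race categories and two sex categories is an assumption chosen so that the product gives the six strata mentioned in the text.

```python
from itertools import product

# Hypothetical category labels; the text states only that Z* has 6 strata.
race_levels = ["race_1", "race_2", "race_3"]
sex_levels = ["male", "female"]

# Each combination of categories becomes one stratum, numbered from 1.
strata = {combo: index
          for index, combo in enumerate(product(race_levels, sex_levels), start=1)}

def stratum_of(race, sex):
    """Return the stratum number Z* for a subject."""
    return strata[(race, sex)]

print(len(strata), stratum_of("race_2", "female"))
```

Each subject is then assigned to exactly one stratum, and a separate baseline hazard is allowed within each.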
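The hazard function for the stratified Cox model is given below:
$$h_g(t, X) = h_{0g}(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_P X_P), \quad g = 1, 2, \ldots, k$$
Because the coefficients $\beta_1, \beta_2, \ldots, \beta_P$ are the same for each stratum, estimates of the hazard ratio will be the same. This feature of the SC model is referred to as the no-interaction assumption. The no-interaction assumption implies that the hazard ratios are the same for each stratum.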
If the only predictor that failed to satisfy the PH assumption is sex, and the covariates are age ($X_1$) and user status ($X_2$), then the SC model becomes
$$h_1(t, X) = h_{01}(t) \exp(\beta_1 X_1 + \beta_2 X_2), \text{ for males} \quad (21)$$
$$h_2(t, X) = h_{02}(t) \exp(\beta_1 X_1 + \beta_2 X_2), \text{ for females} \quad (22)$$
In the models above, age and user status are in the model whereas sex is not; sex is therefore controlled by stratification. Since the age and user status variables are included in the model, we can estimate the effect of each variable adjusted for the effect of the other variable and sex. The estimated hazard ratio for the effect of age adjusting for user status and sex is given by $e^{\beta_1}$, and that for user status adjusting for age and sex is given by $e^{\beta_2}$. We use the stratified Cox model to control for the sex variable, which does not satisfy the PH assumption. The implication is that the sex variable is adjusted for by stratification, while the age and preventive measure variables (which do satisfy the PH assumption) are adjusted for by their inclusion in the model. From the model we can infer that the hazard ratio for the effect of the preventive measure variable adjusted for age and sex is 1.452. This value can be interpreted to mean that the exposed group (the group that does not use the insecticide-treated net) has about 1.5 times the hazard of death of the less exposed group (the group that uses the ITN) [19].

The Meaning of the PH Assumption
The PH assumption requires the hazard ratio (HR), defined as the ratio of the predicted hazard functions under two different values of a predictor variable, to be constant over time. In other words, the hazard function for one individual should be proportional to the hazard function for any other individual. Moreover, the proportionality constant should be independent of time, that is to say, at any time t,
$$\frac{h(t, X_1)}{h(t, X_2)} = C$$
where C is a constant. C may depend on the explanatory variables but not on time. Graphically, the hazards for different individuals plotted on the same graph should not cross. The rule is that if the hazards cross, then the PH assumption is violated and the use of the Cox PH model is inappropriate. It should be noted that a bit of crossing at early time points may be a product of noise in the survival estimates and may not constitute a violation of the proportional hazards assumption.
There are a variety of techniques, both graphical and test-based, for assessing the validity of the proportional hazards assumption. One technique is simply to plot Kaplan-Meier survival curves to compare two groups with no covariates. If the curves cross each other, the proportional hazards assumption is likely to be violated. If, on the other hand, the curves do not cross, the PH assumption is considered plausible. An important caveat to this approach must be kept in mind for small studies. There may be a large amount of error associated with the estimation of survival curves for studies with a small sample size; therefore, the curves may cross even when the proportional hazards assumption is met. The complementary log-log plot is a more robust check; it plots the logarithm of the negative logarithm of the estimated survivor function against the logarithm of survival time. If the hazards are proportional across groups, this plot will yield parallel curves. Another common method for testing the proportional hazards assumption is to include a group-by-time interaction term to determine whether the HR changes over time, since time is often the culprit for non-proportionality of the hazards. If the group-by-time interaction term is significantly different from zero, this is evidence against proportional hazards.
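The Kaplan-Meier curves used in the graphical check can be computed with a few lines of Python. The sketch below is a minimal product-limit estimator for one group; the toy data are invented for illustration, and in practice the function would be applied to each group separately and the resulting curves overlaid.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event = 1, censored = 0)."""
    curve = []
    survival = 1.0
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up times (days) and event indicators for one group.
times = [2, 3, 5, 7, 8, 11]
events = [1, 1, 0, 1, 1, 0]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored subjects lower the number at risk at later event times without producing a step in the curve themselves.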

Failure of the Proportional Hazards Assumption
If the PH assumption does not hold, there are several options for addressing the non-proportionality in the model. New covariates can be included in the model, as can non-linear terms for existing covariates or interactions among covariates. Alternatively, the analysis could be stratified on one or more variables. This approach leads to estimates of a model in which the baseline hazard is allowed to differ within each stratum, but the covariate effects are equal across strata. Other options include dividing time into categories and using indicator variables to allow hazard ratios to vary across time, and changing the analysis time variable (e.g., from elapsed time to age or vice versa).

Alternative Method for Assessing the PH Assumption
The goodness-of-fit approach is appealing because it provides a test statistic and p-value for assessing the PH assumption for a given predictor of interest. This approach was originally proposed by Schoenfeld but has been modified in [20]; it is based on the residuals defined by Schoenfeld, now known as the Schoenfeld residuals. For each predictor in the model, Schoenfeld residuals are defined for every subject who has an event [19]. The test is based on the null hypothesis that the correlation between the Schoenfeld residuals and the ranked failure time is zero, that is, $H_0: \rho = 0$. The outline follows below:
• Run a Cox PH model and obtain the Schoenfeld residuals for each predictor;
• Create a variable that ranks the order of failure. The subject who had the first event gets a value of 1, the next gets a value of 2, and so on;
• For persons censored, the value of the residual is set to missing; and
• Test the correlation between the variables created in the first and second steps.
If the null hypothesis is rejected, we conclude that the PH assumption is violated; otherwise, it is not violated. A hazard ratio greater than one (1) indicates that the covariate is positively associated with the probability of the event and negatively associated with the length of survival time.
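The steps of the residual-based check can be sketched in plain Python for a single covariate. This is a simplified illustration under stated assumptions: the coefficient `beta` is taken as already estimated (the value 1.0 is hypothetical), event times are untied, the toy data are invented, and only the correlation is computed, not its p-value.

```python
import math

def schoenfeld_residuals(beta, times, events, x):
    """Residual for each event: x_i minus the risk-set weighted mean of x."""
    out = []
    for i, t_i in enumerate(times):
        if events[i] != 1:
            continue  # censored subjects have no residual (treated as missing)
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        mean_x = sum(w * x[j] for w, j in zip(weights, risk)) / sum(weights)
        out.append((t_i, x[i] - mean_x))
    return out

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def ph_correlation(beta, times, events, x):
    """Correlation between failure rank (1, 2, ...) and the Schoenfeld residuals."""
    ordered = sorted(schoenfeld_residuals(beta, times, events, x))
    ranks = list(range(1, len(ordered) + 1))
    return pearson(ranks, [r for _, r in ordered])

# Hypothetical toy data and an assumed fitted coefficient.
times = [2, 3, 5, 7, 8, 11]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 1, 0, 0]
rho = ph_correlation(1.0, times, events, x)
print(round(rho, 3))
```

A correlation close to zero is consistent with the PH assumption for that covariate; a formal decision would require the p-value of the correlation test, which a statistical package would supply.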

Hazard Ratio
In summary:
• HR = 1.0 implies equal hazard rates in the two groups (no effect; any observed differences are likely due to chance);
• HR > 1.0 implies an increased hazard in the group in the numerator of the ratio relative to the group in the denominator; and
• HR < 1.0 implies a decreased hazard in the group in the numerator of the ratio relative to the group in the denominator.
The computation of the hazard ratio assumes that the ratio is constant over time; therefore, if the survival curves cross, the hazard ratio statistic should not be relied upon. The term proportional hazards refers to the fact that the hazard functions are multiplicatively related, that is to say, their ratios are constant over survival time [19]. This assumption is important in assessing the validity of the model.
While the hazard ratio (HR) and the relative risk (RR) are similar in some respects, there is a slight difference between the two. For instance, in a clinical trial, a researcher might investigate the hazard rates and relative risk for two types of drug users: user X and user Y. Assuming that both the hazard ratio (HR) and the relative risk (RR) were 3.0, the interpretation of the results is as follows:
• The relative risk (RR) tells us that the risk of death is three times higher with user X than with user Y over the entire period of the study. RR does not take the timing of the event into account.
• The hazard ratio (HR) tells us that the risk of death is three times higher with user X than with user Y at any particular point in time. HR takes into account both the total number of events and the timing of the events.
The distinguishing feature is the timing or time period under consideration. In evaluating hazard ratios, it is imperative that we support our results with other measures such as the median survival time, overall survival, or time to progression.

The Likelihood Ratio Test
To help choose between two alternatives, random versus systematic variation, based on the observed difference between two log-likelihood values generated from two statistical models, the application of a theorem from theoretical statistics has been proposed [21]. The theorem states that the difference between two log-likelihood values multiplied by −2 has an approximate chi-square distribution when three conditions hold. The first condition is that the two models generating the log-likelihood values must be calculated from exactly the same data. The second is that the compared models must be nested (that is, one model is a special case of the other). The third condition is that the two log-likelihoods must differ only because of random variation. When the first two conditions apply, a test statistic with a chi-square distribution produces an assessment of the plausibility of the third condition.
The likelihood ratio test can be used to perform several tests. For instance, it could be used to test the significance of an interaction term in a model, or the significance of a covariate in a model after adjusting for the other covariates. To test the significance of a covariate such as usage of ITN, we need to compute the difference between the log-likelihood statistic of the reduced model, which does not contain the covariate, and the log-likelihood statistic of the full model containing the covariate. The formula is given below:
$$LR = -2 \ln L_R - (-2 \ln L_F)$$
where R denotes the reduced model and F the full model.
It has been indicated in [19] that the LR statistic is a chi-square statistic $\chi^2$ with p degrees of freedom, where p is the number of covariates or predictors being assessed (in this example p = 1), under the null hypothesis that the covariate has no effect.
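The likelihood ratio calculation for a single covariate, such as usage of ITN, can be sketched as follows. The two log-likelihood values are hypothetical placeholders, not results from the study, and the p-value formula applies only to the one-degree-of-freedom case described above.

```python
import math

def likelihood_ratio_test(loglik_reduced, loglik_full):
    """LR = -2 ln L_R - (-2 ln L_F), with a chi-square(1) p-value."""
    lr = -2.0 * loglik_reduced - (-2.0 * loglik_full)
    # For one degree of freedom, P(chi2_1 > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

# Hypothetical log-likelihoods for the models without and with the ITN covariate.
lr, p = likelihood_ratio_test(-1250.0, -1245.0)
print(lr, round(p, 4))
```

Because the full model contains the reduced model as a special case, its log-likelihood cannot be smaller, so the statistic is non-negative up to numerical error.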

Graphical Approach to Log-Log Plot
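This plot is simply a transformation of an estimated survival curve that results from taking the natural logarithm of an estimated survival probability twice, that is, $-\ln(-\ln \hat{S}(t))$. Under the Cox PH model the survival function is
$$S(t, X) = [S_0(t)]^{\exp\left(\sum_{i=1}^{p} \beta_i X_i\right)}$$
Taking the log of the expression twice we obtain
$$-\ln[-\ln S(t, X)] = -\sum_{i=1}^{p} \beta_i X_i - \ln[-\ln S_0(t)]$$
If the predictor variables are time independent, then for two subjects with predictor values $X_1 = (X_{11}, \ldots, X_{1p})$ and $X_2 = (X_{21}, \ldots, X_{2p})$ the PH model gives
$$-\ln[-\ln S(t, X_1)] = -\sum_{i=1}^{p} \beta_i X_{1i} - \ln[-\ln S_0(t)] \quad (28)$$
$$-\ln[-\ln S(t, X_2)] = -\sum_{i=1}^{p} \beta_i X_{2i} - \ln[-\ln S_0(t)] \quad (29)$$
We see from Equations (28) and (29) that the baseline term $-\ln[-\ln S_0(t)]$ is the same in both cases, so it does not contribute to the difference between the two curves. If the PH assumption is satisfied for the data, then the graphs of the time functions in Equations (28) and (29) will be approximately parallel. The difference between Equations (28) and (29),
$$-\ln[-\ln S(t, X_1)] - \left(-\ln[-\ln S(t, X_2)]\right) = \sum_{i=1}^{p} \beta_i (X_{2i} - X_{1i})$$
does not involve t. The formula says that if we use the Cox PH model and plot the estimated log-log survival curves for two subjects on the same graph, the two curves will be approximately parallel, and the distance between them is a linear expression in the differences in predictor values, which does not involve t. This parallelism of the log-log survival plots for the Cox PH model provides us with a graphical approach for assessing the PH assumption [22].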
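The parallelism of the log-log curves can be checked numerically. In the sketch below the Weibull baseline survivor function `s0` and its constants are assumptions made purely for illustration, and the coefficient 0.384 is the ITN user status coefficient from the abstract, reused as an example value for a single binary predictor.

```python
import math

def loglog_survival(t, x, beta, baseline_survival):
    """-ln(-ln S(t | x)) under the Cox model S(t | x) = S0(t) ** exp(beta * x)."""
    s = baseline_survival(t) ** math.exp(beta * x)
    return -math.log(-math.log(s))

def s0(t):
    # Hypothetical Weibull baseline survivor function; not taken from the paper.
    return math.exp(-0.05 * t ** 1.3)

beta = 0.384  # illustrative coefficient for a binary predictor

# Vertical distance between the curves for x = 0 and x = 1 at several times.
gaps = [loglog_survival(t, 0, beta, s0) - loglog_survival(t, 1, beta, s0)
        for t in (1.0, 4.0, 9.0)]
print([round(g, 3) for g in gaps])
```

The gap is the same at every time point and equals the coefficient times the difference in predictor values, so the two curves are parallel whenever the PH model holds.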

Target Population
The target population was all the residents of the Sekondi-Takoradi district in Ghana. Data on malaria cases were obtained from three hospitals in the district, using observational studies, interviews and records from the records department. Malaria accounts for about 1 million deaths in Africa annually and has slowed economic growth in African countries by up to 1.3% per year. Insecticide-treated nets (ITNs) undergo a series of tests to obtain listing by World Health Organization (WHO) prequalification. These tests characterize the bio-efficacy and the physical and chemical properties of the ITN. ITN procurers assume that product specifications relate to product performance [23] [24].
The observational studies were carried out on patients who had been diagnosed with severe malaria and were on admission at the hospital. The study spanned a 4-month period, from 1 September 2009 to 31 December 2009. The patients were enrolled into the study at different times, as and when they were diagnosed and admitted. Within the study period, patients who were discharged were treated as censored, those who died from an ailment other than malaria were treated as censored, and those who died from malaria were treated as patients who experienced the event of interest. At the end of the study period, all patients who were still on admission were considered as censored. In all, a total of 1793 patients were enrolled into the study. The patients were made up of males and females, young and old, rich and poor, those from the countryside and those from the cities. For each patient, data on the following were obtained: age on admission, gender, level of exposure as indicated by the type of mosquito net used at home, date of admission, date of discharge/death, cause of death and censoring status.
The assumptions made about the patients were that all of them received the same treatment once they were on admission; a further assumption was that those on admission were considered as first-time in-patients who had had no previous admission records.

Data Analysis
The extracted data for each person was coded as follows: Gender: Male = 1; Female = 2.
Type of preventive measure used: Insecticide treated net = 0; mosquito nets plus others = 1.
Censoring status: if the patient died from malaria, the code was 1; if the patient was discharged, died from a different sickness or was alive at the end of the study, the code was 0.
Survival time in days: the difference between the date of discharge/death and the date of admission.
Age was treated as a continuous quantitative variable. The coded variables were keyed into SPSS version 20 and analyzed using survival analysis models. The survival experiences of the two exposure groups were compared and contrasted using the Kaplan-Meier survival curves. The Cox proportional hazards model was used firstly to assess the risk of malaria-death for the two exposure groups; secondly, it was used to explore the relation between the baseline risk factor (malaria-death) and the predictor variable of interest (level of exposure), after adjusting for a possible interaction effect of sex. The log rank test was used to test whether the Kaplan-Meier curves for the two exposure groups in the entire population were statistically equivalent. The likelihood ratio test was used to ascertain the significance of the preventive measure variable (ITNs) that was used to lessen the exposure level to the mosquito parasites. The log-minus-log method was applied to the biomedical data to assess whether the exposure data satisfy the proportional hazards assumption (Figure 1). Table 2 gives a pictorial view of the survival analysis scheme from the origin state, that is, arrival on admission, to the destination point, that is, the end of the study period.
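The coding scheme described above can be sketched as a small function. This is only an illustration of the stated rules: the study itself used SPSS, and the function name, the string labels and the three patient records below are hypothetical.

```python
from datetime import date

def code_patient(gender, net_type, admitted, left, outcome):
    """Return (gender_code, exposure_code, status, survival_days) for one patient."""
    gender_code = 1 if gender == "male" else 2            # Male = 1; Female = 2
    exposure_code = 0 if net_type == "itn" else 1         # ITN = 0; nets plus others = 1
    status = 1 if outcome == "malaria_death" else 0       # event = 1; censored = 0
    survival_days = (left - admitted).days                # discharge/death minus admission
    return gender_code, exposure_code, status, survival_days

# Hypothetical records, not taken from the study data
records = [
    ("male", "itn", date(2009, 9, 3), date(2009, 9, 20), "discharged"),
    ("female", "other", date(2009, 10, 1), date(2009, 10, 9), "malaria_death"),
    ("female", "itn", date(2009, 12, 15), date(2009, 12, 31), "still_admitted"),
]
coded = [code_patient(*record) for record in records]
for row in coded:
    print(row)
```

Only a malaria death is coded as the event; discharge, death from another cause and being alive at the end of the study all map to the censored code 0.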

Simulation
Simulation studies present an important statistical tool to investigate the performance, properties and adequacy of statistical models in pre-specified situations. One of the most important statistical models in medical research is the Cox proportional hazards model. In this paper, techniques to generate survival times for simulation studies regarding Cox proportional hazards models are presented. We simulated the data set called "anderson.dat", which consisted of survival times on 42 leukemia patients [5]. Table 2 represents a truncation of "anderson.dat". The simulation studies were performed using the first four subjects. Figure 2 gives the probability value (p-value) for the three covariates. The p-value for logwbc (0.00) is less than 0.005 and the hazard ratio (HR) is 5.40, indicating a strong relationship between the logwbc value and an increased risk of relapse. Holding the other covariates constant, a higher value of logwbc is associated with poor survival. Here, a person with a higher logwbc has a higher risk of death.
The p-value for treatment status (Rx) is 0.002, which is less than 0.005, and the HR is 4.64, indicating a strong relationship between the Rx value and an increased risk of relapse. Holding the other covariates constant, a higher value of Rx is associated with poor survival. A person with a higher Rx value has a higher risk of death. The p-value for sex (0.42) is greater than 0.005 and the HR for sex is 1.43. Figure 3 displays the curves of the survival probability for the first 4 persons in our dataset (Table 2). We notice that the first and second persons (person-0 and person-1) both have a high survival chance, with their curves lying above the other curves (two curves walking the same path). The third person (person-2) has the lowest survival chances. The fourth person (person-3) has a high logwbc value (2.53) (Table 3). Figure 4 presents the graph of the median conditional time to event using Kaplan-Meier. As time passes, the median survival time fluctuates (it decreases, increases, remains stable for a while, decreases sharply, increases again and finally decreases). Table 4 shows the results of the Cox model simulated values based on the output variables (Table 3) and the data set (Table 2). We note that the logwbc and Rx variables were significant but that of sex was not significant (Tables 5-8).
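The section above refers to generating survival times for Cox models without stating the generator used. One common approach, shown here purely as an assumed illustration, is inverse-transform sampling with a constant (exponential) baseline hazard: a uniform draw u is mapped to the time t at which S(t | x) = u. The log hazard ratios are derived from the hazard ratios reported above (5.40 and 4.64); the baseline rate, the seed and the four covariate profiles are hypothetical.

```python
import math
import random

def simulate_time(u, baseline_rate, linear_predictor):
    """Solve S(t | x) = exp(-baseline_rate * t * exp(linear_predictor)) = u for t."""
    return -math.log(u) / (baseline_rate * math.exp(linear_predictor))

# Log hazard ratios obtained from the hazard ratios reported in the text
beta_logwbc = math.log(5.40)
beta_rx = math.log(4.64)
baseline_rate = 0.001          # hypothetical constant baseline hazard per unit time

rng = random.Random(42)        # fixed seed so the sketch is reproducible
subjects = [(2.80, 1), (2.80, 0), (1.45, 1), (1.45, 0)]   # hypothetical (logwbc, Rx)
simulated = []
for logwbc, rx in subjects:
    u = 1.0 - rng.random()     # uniform on (0, 1], avoids log(0)
    lp = beta_logwbc * logwbc + beta_rx * rx
    simulated.append(simulate_time(u, baseline_rate, lp))

for (logwbc, rx), t in zip(subjects, simulated):
    print(f"logwbc = {logwbc:.2f}, Rx = {rx}: simulated time = {t:.1f}")
```

For a fixed u, a larger linear predictor always yields a shorter simulated time, which is the behaviour the hazard ratios above describe.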

Hypothesis One
$H_0$: The survival curves of the exposed and the less exposed groups are equivalent.
Table 5. Output after fitting the data using Kaplan-Meier-Fitter with the duration variable (time) "Surv" and the event observed "relapse". Fitted with 42 total observations, 12 of them right-censored. Footnotes: at_risk stores the number of current patients, where at_risk = current patients at risk + entrance - removed; event_at stores the value of the timeline for the dataset (i.e., the time the patient was observed in the experiment or the time the experiment was conducted); removed = observed + censored; censored = persons that did not relapse; observed = persons that relapsed (died).
Table 9. Summary of test results for testing the equality of survival curves for the exposed and less exposed groups.
From the test results presented in Table 9, the p-value for the comparison of the two exposure groups was less than 0.05 (p-value = 0.002 < 0.05). We therefore reject $H_0$ (Hypothesis 3.1.1) and conclude that the survival curves of the exposed group and the less exposed group were significantly different.
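The column definitions in the footnotes of Table 5 (removed = observed + censored, and the number at risk at each event time) can be illustrated with a small hand-rolled sketch. It is not the Kaplan-Meier-Fitter implementation; it assumes all subjects enter at time zero, and the five durations and event indicators are hypothetical.

```python
def event_table(durations, events):
    """Rows of (time, removed, observed, censored, at_risk), ordered by time."""
    rows = []
    at_risk = len(durations)          # everyone is at risk at the start
    for t in sorted(set(durations)):
        observed = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        censored = sum(1 for d, e in zip(durations, events) if d == t and e == 0)
        removed = observed + censored
        rows.append((t, removed, observed, censored, at_risk))
        at_risk -= removed            # those removed are no longer at risk afterwards
    return rows

def kaplan_meier(rows):
    """Product-limit estimate: multiply by (1 - observed / at_risk) at each time."""
    survival, estimates = 1.0, []
    for t, _removed, observed, _censored, at_risk in rows:
        survival *= 1 - observed / at_risk
        estimates.append((t, survival))
    return estimates

# Hypothetical data: 1 = event observed, 0 = right-censored
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 0, 1]
rows = event_table(durations, events)
km = kaplan_meier(rows)
for row, (_, s) in zip(rows, km):
    print(row, round(s, 3))
```

Censored subjects reduce the number at risk at later times but do not produce a step in the estimated survival curve.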

Hypothesis Two
$H_0$: The method of protection adopted to prevent exposure to the parasite was not significant.
We reject the null hypothesis and conclude that the method of prevention was significant.

Hypothesis Three
$H_0$: The difference in the survival experiences of the exposed and the less exposed groups is not significant after stratifying by sex.
We note from Table 5 that the preventive measure variable was significant (p-value = 0.006 < 0.05). We therefore reject the null hypothesis (Hypothesis 3.1.3) and conclude that the preventive measure variable is significant after stratifying by sex.
From Table 9 (the results provided by the computer), the correlation between ranked failure time and the Schoenfeld residuals was significant at both the 0.05 and 0.01 levels of significance; thus, we have evidence to reject Hypothesis 3.1.4. If we consider the last row (rank of duration, significance two-tailed), we note that the null hypothesis was rejected for the sex variable (p = 0.04) but not rejected for the preventive measure (p = 0.21) and age (p = 0.85) variables.

Discussions
We see from Table 6 that, out of the 1793 patients sampled, 405 (22.6%) were using insecticide treated nets while the majority (77.4%) were using other types of nets, such as window netting and ordinary treated nets. It was also established that, of those who were using ITN, 16% died within the four-month study period while 84% survived; of the non-users of ITN, 23% died while 77% survived within the same study period. The ratio of male to female deaths was 1.2:1. This ratio indicates that death due to malaria for the period of observation was not gender related.
The plot in Figure 5 gives a graphical picture of the survival curves of the two groups of users of ITN. We notice from the graph that the survival experiences in the first few days of the study appeared to be the same, but thereafter the differences showed up clearly. We see also that the curve for the users of ITN consistently lies above that of the non-users; this characteristic shows that users of ITN have a better survival prognosis than non-users. The difference further means that ITN was effective at all points during the observational period. From the graph we could also estimate the median survival times for the two classes of users. This is done by locating 0.5 on the y-axis and proceeding horizontally until the line meets the curve; from the point of intersection, we draw a vertical line down to the x-axis. From the graph, the median survival time of the non-users of ITN was approximately 10 days while that of the users of ITN was close to forty (40) days. The median values further confirm our claim that users of ITN have a better survival prognosis than non-users.
Figure 5. The curve for users of ITN lies above that of the non-users. For those who did not use ITN there were many steps within the curve, with each step representing a death.
The failure potential of the users of ITN and non-users is presented in Figure 6. It is worth mentioning that while the survival function gives the probability of surviving, the hazard function or rate gives the risk of failing. The higher the hazard rate, the worse the impact on survival. The curves in the figure depict that non-users of ITN are at a higher risk of malaria death than users. From Figure 7, we could infer that the survival experiences of males and females were approximately the same; this implies that sex does not contribute significantly to death due to malaria.
The difference between the two log-log plots is given by $\sum_{i=1}^{p} \beta_i (X_{1i} - X_{2i})$. What the expression is saying is that, if we use a Cox PH model and plot the estimated log-log survival curves for two groups on the same graph, the curves will be approximately parallel, and the distance between them is the linear expression involving the difference in the predictor values (method of protection used for preventing mosquito bites), which does not involve t.
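The graphical read-off of the median survival time described above (locate 0.5 on the y-axis, move across to the curve, then drop to the x-axis) amounts to finding the first time at which the estimated survival probability is at or below 0.5. A minimal sketch follows; the two step curves are hypothetical values chosen only to mirror the approximate medians quoted in the text (about 10 and 40 days) and are not the study's estimates.

```python
def median_survival(curve):
    """First time at which the step curve drops to 0.5 or below, else None.

    `curve` is a list of (time, survival_probability) pairs sorted by time.
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None   # the curve never reaches 0.5 within follow-up

# Hypothetical Kaplan-Meier step curves (time in days, survival probability)
non_users = [(2, 0.90), (6, 0.70), (10, 0.50), (15, 0.40)]
users = [(5, 0.95), (20, 0.80), (40, 0.48)]
short_follow_up = [(3, 0.90), (9, 0.75)]

print(median_survival(non_users))
print(median_survival(users))
print(median_survival(short_follow_up))
```

Returning `None` when the curve stays above 0.5 makes explicit that the median is not estimable from a curve that never crosses that level.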
A summary of the four possible results from the examination of the log negative log Kaplan-Meier survival estimates plotted against the log of time, as shown in Figure 8 for the two levels of protection against exposure to the mosquito parasite, is given below.
• Parallel and straight lines imply that the Weibull model (WM), the accelerated failure time (AFT) and the proportional hazards (PH) assumptions hold.
• Parallel but not straight lines imply that the PH assumption holds but neither the WM nor the AFT model holds.
• Non-parallel and non-straight lines suggest that the PH, AFT and WM assumptions do not hold.
• Non-parallel but straight lines imply that the WM holds but neither the PH nor the AFT assumption holds.
Examining Figure 8 critically, we notice that the plots for both the less exposed and the exposed groups, as indicated by the use of ITN or otherwise, are reasonably straight, suggesting that the Weibull assumption reasonably holds. We notice again that the two curves are approximately parallel (their gradients ρ are approximately the same), implying that the PH and the AFT assumptions hold. This parallelism of the log-log Kaplan-Meier survival curves for the Cox PH model provides us with a graphical approach for assessing the PH assumption. We could infer from the parallelism of the plots that, once the plots are parallel, under no circumstance will the survival experiences of the two groups of users be the same.
We used the stratified Cox model (Table 8) to control for the sex variable, which does not satisfy the PH assumption. The implication here is that the sex variable is adjusted for by stratification. We have also included the age and preventive measure variables (which do satisfy the PH assumption) in the model; in other words, the age and preventive measure variables have been adjusted for by their inclusion in the model. From the model we can infer that the hazard ratio for the effect of the preventive measure variable, adjusted for age and sex, is 1.452. This value can be interpreted to mean that the exposed group (that is, the group that does not use the insecticide treated net as a means of preventing exposure to the mosquito parasite) has about 1.5 times the hazard of death through malaria of the less exposed group (the group that uses ITN as a means of preventing exposure to the malaria parasite).
The variables in the stratified Cox model (Table 8) provide us with useful information to test whether there is any difference in the population survival curves for the two classes of users of ITN, after adjusting for sex (since sex did not contribute significantly to the risk of malaria death). The null hypothesis for this test was that there was no difference in the survival curves of the users of ITN and non-users. The p-value of the log-rank test (0.002 < 0.05) was highly significant, implying that there was a statistically significant difference between the population survival curves after adjusting for sex. This result states, inter alia, that if all the population elements were included in this study, the survival experiences of the users of ITN and non-users would have been different; this further means that the predictor variable under consideration does contribute significantly to death due to malaria. The Cox proportional hazards (PH) model is presented in Table 7; in this model the PH assumption was assumed to hold for all three covariates.
The model used all 1793 patients observed in the study. The output variable was time in days until a patient died. The method of estimation used to obtain the coefficients was maximum likelihood estimation (MLE). A p-value of 0.374 (from column 6) was obtained for the sex variable. This value indicates that the sex variable was not significant; that is to say, the sex of the patient plays no significant role in deaths due to malaria. However, the p-value (0.000) of age of the patient and the p-value (0.005) of user status of ITN (type of preventive measure used) were both highly significant, telling us that the risk of malaria death was dependent on one's age and the method of prevention adopted (in this case, users and non-users of ITN). From column 2, the magnitude of the coefficient (0.384) of ITN user status depicts that user status contributes largely to the variation in the dependent variable (that is, death due to malaria), while the contributions from the remaining covariates, age and sex, were negligible in magnitude. The hazard ratio, denoted by $\exp(\beta)$ in Table 4, indicates that the hazard ratio of non-users to users of ITN was 1.468, which translates into saying that the non-users of ITN had about 1.5 times the risk of malaria death of the users of ITN. For the age and sex variables the hazard ratios do not give any useful information. It should be recalled that a hazard ratio of one means that there was no effect.
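The link between the reported coefficient and the reported hazard ratio is simply exponentiation, which can be verified in a couple of lines. The sketch below uses only numbers quoted in the text (the coefficient 0.384 and the stratified-model hazard ratio 1.452); the variable names are illustrative.

```python
import math

beta_itn = 0.384                         # coefficient of ITN user status (from the text)
hazard_ratio = math.exp(beta_itn)        # hazard ratio = exp(beta)
print(round(hazard_ratio, 3))            # prints 1.468

# Going the other way: the coefficient implied by the stratified-model HR of 1.452
beta_stratified = math.log(1.452)
print(beta_stratified)
```

A coefficient of zero would give a hazard ratio of exactly one, which is the "no effect" case mentioned above.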
The Cox adjusted log-log plots (Figure 7) were fitted using the mean values of age and sex and were used to evaluate the PH assumption for the preventive measure. From this figure, we noticed that the two graphs were approximately parallel, which translates into saying that the survival experiences of the users of ITN and non-users can in no way be the same. Table 12 gives us the results for the Schoenfeld statistical test. In this test, the null hypothesis $H_0$ is that the PH assumption was not violated. The p-values for testing whether the correlation between the ranked survival time and the Schoenfeld residuals of the covariates was zero are the p-values for the statistical test. From the computer output (Table 12), the following results were obtained. In case A, the null hypothesis was rejected; thus we conclude that for the sex variable the PH assumption was violated, which also means that in determining death due to malaria the sex variable was not a risk factor. In cases B and C, we do not have enough evidence to reject the null hypothesis, so we conclude that the PH assumption was not violated for the age and preventive measure variables; by implication, we can assert that in determining the risk of malaria deaths these variables (age and preventive measure) might play significant roles.
From the computer outputs labeled as Table 5 and Table 6, we could assess the significance of the preventive measure variable using the likelihood ratio test, which is given as $LR = -2\ln L_R - (-2\ln L_F)$. The LR statistic is a chi-square statistic, $\chi^2$, with one degree of freedom (because we are assessing only one predictor, usage of ITN) under the null hypothesis that the predictor is not significant. From a web-based statistical calculator, the chi-square value of 8.526 translates to a p-value of 0.0035 < 0.05. Thus we have enough evidence to reject the null hypothesis and conclude that the predictor under investigation (type of preventive measure used by patients) was significant and therefore contributes significantly to the risk of malaria death.
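The p-value quoted above can be reproduced without a web calculator. The sketch below relies on a standard identity for one degree of freedom: the upper-tail probability of a chi-square variable at x equals erfc(sqrt(x / 2)). Only the statistic 8.526 comes from the text; the function name is illustrative.

```python
import math

def chi2_upper_tail_1df(x):
    """P(chi-square with 1 degree of freedom > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))

lr_statistic = 8.526                      # LR chi-square value reported in the text
p_value = chi2_upper_tail_1df(lr_statistic)
print(f"p-value = {p_value:.4f}")
print("reject H0 at the 0.05 level" if p_value < 0.05 else "do not reject H0")
```

This shortcut holds only for a single degree of freedom; testing several covariates at once would need the general chi-square distribution.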

Conclusions
At the outset, we sought to provide the theoretical framework underpinning the Cox proportional hazards model, outline the theories on which the Cox model could be laid out, carry out a simulation study on the Cox model, and provide a real-case empirical study with apt interpretation of the outcome. In clinical investigations and medical research, there may be many situations where several known quantities potentially affect patient prognosis. One or two of these competing risk factors might predict a patient's predicament more than the others. In seeking to find out which of the risk factors has the highest impact on the survival time of a patient, researchers need to adjust for the covariates to realize the impact of each of them on the survival times of the patients. Aside from the multivariate nature of the covariates, some covariates might be categorical while others might be quantitative. Again, there might be cases where we need a model that has the capability of extending survival analysis methods to assessing simultaneously the effect of several risk factors on survival time. A method of analysis that can accommodate all the enumerated situations is the Cox proportional hazards model. The discovery of diagnostic key assessment indicators for diseases such as malaria has been on the ascendancy. Most of these methods focus on classification problems, that is, adopting a model that discriminates patients into distinct clinical groups. Few papers have been published on approaches that predict a patient's event risk or hazard of death due to a predictive risk factor. This study has effectively integrated data into multivariable Cox proportional hazards models for risk prediction in malaria.
Subsequently, besides the main objective of the study, it is insightful to note that:
• The assumptions of all three models (Weibull, accelerated failure time and proportional hazards) were satisfied;
• The method of protection adopted and age satisfied the proportional hazards assumption but sex did not;
• The hazard of the exposed group was 1.5 times the hazard of the less exposed group; and
• Sex of residents did not contribute to the risk of malaria death, but the method of protection and age contributed towards the risk of malaria death.