Artificial Neural Network Model for Predicting Lung Cancer Survival

The objective of the present study is to develop a piecewise constant hazard model using an Artificial Neural Network (ANN) to capture the complex shapes of hazard functions, which cannot be achieved with conventional survival analysis models such as the Cox proportional hazard model. We propose a more convenient alternative to the PEANN created by Fornili et al. that can handle a large amount of data. In particular, it provides much better prediction accuracy than both Poisson regression and generalized estimating equations. This is demonstrated with lung cancer patient data taken from the Surveillance, Epidemiology and End Results (SEER) program. The quality of the proposed model is evaluated using several error measurement criteria.


Introduction
Precise prediction of survival and hazard has been a challenging task throughout the past years. Research scientists have often used parametric methods for this purpose. However, these methods impose certain distributional assumptions on the hazard functions [1]. The Cox proportional hazard model [2] is another well-known approach which has been used extensively in survival analysis. Though this model allows flexible modeling of the hazard with an unspecified baseline hazard function, the assumption of time-independence of the hazard ratio may not always be correct [3]. Use of this model without verification of those assumptions can lead to misleading results. A more intuitive approach is to assume the hazard ratio to be independent of time only over smaller periods, partitioning the whole time period into several intervals and introducing the piecewise constant hazard model. This has been advocated as a flexible and parsimonious tool in the literature [4] [5] [6] [7] and is generally useful for interpreting cancer survival and for facilitating treatments and diagnoses [8]. There exist several other techniques for flexible modeling of the hazard function.
For example, Boracchi et al. [9] have developed a model with cubic splines, whereas Diehl et al. [10] have developed a nonparametric model based on a kernel density estimation approach.
Artificial neural networks (ANNs) have been extremely popular in almost every field, including computer science, engineering and the biomedical field, among others. They have the strength of making predictions based on both individual attributable variables and possible complex interactions among them. In addition, ANNs have the capability of handling nonlinear functions and non-additive effects. Moreover, they are free of any statistical assumptions. Thus, ANN-based survival analysis models serve as efficient alternatives to the conventional survival analysis models, with enhanced predictive power. One of the earliest works in survival analysis with ANNs was introduced by Faraggi and Simon [11], who used an ANN as a basis for a nonlinear proportional hazard model. In another work, Biganzoli et al. [12] used an ANN to predict the smoothed discrete hazards as conditional probabilities of failure. Ravdin and Clark [13] have shown that an ANN can be used to predict the patient outcome with censored survival data, including time as a covariate. ANNs have also been used in modeling cause-specific hazards [14]. Modeling of the piecewise exponential model (piecewise constant hazard) using an ANN (PEANN) was proposed by Fornili et al. [15]. This method accommodates greater flexibility in modeling complex hazard functions. However, when this model is used for a large amount of data, the analysis becomes difficult or even impossible due to the high data redundancy involved in the modeling.
In the present study, we have modified the PEANN model by combining it with another ANN model introduced by Mani et al. [16] to develop a more efficient model. As mentioned earlier, the proposed model has the capability of handling a large amount of data. Importantly, it improves the prediction accuracies. This has been demonstrated using lung cancer patients' data, and their hazards were predicted in the presence of competing risks. A comprehensive evaluation of the proposed model is conducted using several error measurements, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Percentage Error (MPE) and Relative Squared Error (RSE). We compare our model results with those of the Generalized Estimating Equations (GEE). The results of the proposed model are much better than those of the competing models. This paper is structured as follows. In Section 2, we introduce the new ANN system and related theory, along with the other models which we used for comparison. Following that, we present our results. The final section discusses the implications and limitations.

The Piecewise Constant Hazard Model
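Let $T$ be the survival or follow-up time for subjects $i = 1, 2, \ldots, N$, where $T = \min\{\text{Survival Time}, \text{Censoring Time}\}$, and let $X$ be the covariates. We consider different types of competing risks, $R$, each of which can cause the subject to experience the same event of interest [17]. Equation (1) defines the hazard function for the $r$th risk,

$$\lambda(r, t, X) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\; R = r \mid T \ge t,\; X)}{\Delta t}. \qquad (1)$$

Then the corresponding survival function and the probability density function can be obtained by Equation (2) and Equation (3) as given below.

$$S(t, X) = \exp\left(-\int_0^t \lambda(\cdot, u, X)\, du\right) \qquad (2)$$

and

$$f(r, t, X) = \lambda(r, t, X)\, S(t, X), \qquad (3)$$

where $\lambda(\cdot, t, X) = \sum_{r} \lambda(r, t, X)$, even without assuming independence among the competing risks. Thus, for independent observations, the likelihood function $L$ can be written by Equation (4), assuming non-informative censoring,

$$L = \prod_{i=1}^{N} \lambda(\cdot, t_i, X_i)^{\delta_i}\, S(t_i, X_i), \qquad (4)$$

where $\delta_i$ is equal to 0 if subject $i$ is censored and 1 otherwise. Under the piecewise constant hazard model, the follow-up time $T$ is divided into several disjoint time intervals with cut points $a_0, a_1, \ldots, a_J$, where $a_0 = 0$ and $a_J \to \infty$, and the hazard function for the $r$th risk is assumed to be constant in the $j$th time period. Hence, we have,

$$\lambda(\cdot, t, X_i) = \lambda(\cdot, j, X_i), \qquad a_{j-1} < t \le a_j,$$

where $\lambda(\cdot, j, X_i) = \sum_{r} \lambda(r, j, X_i)$ for each subject. Then, the modified likelihood function can be written as in Equation (5),

$$L = \prod_{i=1}^{N} \prod_{j=1}^{J_i} \lambda(\cdot, j, X_i)^{\delta_{ij}} \exp\left(-\lambda(\cdot, j, X_i)\, \tau_{ij}\right), \qquad (5)$$

where $\delta_{ij}$ equals 1 if subject $i$ experiences the event in interval $j$ and 0 otherwise, $J_i$ is the last interval in which subject $i$ is observed, and $\tau_{ij}$ is the corresponding exposure time, which is defined by

$$\tau_{ij} = \begin{cases} a_j - a_{j-1}, & t_i > a_j, \\ t_i - a_{j-1}, & a_{j-1} < t_i \le a_j. \end{cases}$$

The kernel given in Equation (5) corresponds to the likelihood of the Poisson random variable $\delta_{ij}$ with mean $\lambda(\cdot, j, X_i)\, \tau_{ij}$. Applying the logarithm on both sides, we obtain

$$\log L = \sum_{i=1}^{N} \sum_{j=1}^{J_i} \left[\delta_{ij} \log \lambda(\cdot, j, X_i) - \lambda(\cdot, j, X_i)\, \tau_{ij}\right]. \qquad (6)$$

It has been shown that $\lambda(\cdot, j, X_i)$ in Equation (6) can be modeled with a Poisson log-linear model, as given in [18] [19]. However, this model is not effective in handling a large number of subjects over a longer period. Sometimes, the analysis even becomes impossible due to the vast number of $\delta_{ij}$ observations that are created.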
An alternative method is to group the exposure times and the similar $\delta_{ij}$ values for each interval $j$ and then use Poisson regression. Nevertheless, overdispersion might be a problem with this kind of Poisson model due to the correlated nature of the data. As a possible solution, we can use generalized estimating equations, which are an extension of the generalized linear model [20].
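The data redundancy mentioned above can be made concrete with a small sketch. It expands one subject's follow-up into one record per interval, each holding the exposure time and the event indicator used by the Poisson formulation. The function name `expand_to_intervals`, the half-open interval convention, and the cut points (six yearly intervals, the last one open-ended) are assumptions chosen for illustration, not the authors' code.

```python
def expand_to_intervals(t_i, delta_i, cuts):
    """Split one follow-up time into (interval j, exposure tau_ij, indicator delta_ij) rows.

    Interval j is taken to be (cuts[j-1], cuts[j]]; this convention is an assumption.
    """
    rows = []
    for j in range(1, len(cuts)):
        lower, upper = cuts[j - 1], cuts[j]
        if t_i <= lower:
            break  # the subject is no longer observed in this interval
        if t_i > upper:
            rows.append((j, upper - lower, 0))       # survived the whole interval
        else:
            rows.append((j, t_i - lower, delta_i))   # last observed interval
    return rows


# Hypothetical cut points: six yearly intervals, the last one open-ended.
cuts = [0, 1, 2, 3, 4, 5, float("inf")]
rows_a = expand_to_intervals(3.0, 1, cuts)  # event after 3 years
rows_c = expand_to_intervals(2.0, 0, cuts)  # censored after 2 years
```

Each subject contributes as many rows as intervals survived, so with many subjects and long follow-up the expanded data set grows quickly, which is the difficulty the text describes.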

The Proposed ANN Model
In this section, we introduce an efficient method of modeling the hazard function with artificial neural networks. ANNs allow flexible modeling of the hazard function without any probability distributional assumptions. Moreover, they capture the nonlinear effects of the risk factors.
Preceding our model, Fornili et al. [15] used an ANN to model $\lambda(\cdot, j, X_i)$ in Equation (6), and they referred to it as PEANN. However, this model uses the same data structure as the Poisson log-linear model and hence is not very effective, due to the high data redundancy. As a solution, we introduce a new ANN model with a different network architecture. The new ANN model has several output nodes, each of which corresponds to a different time interval. This structure is similar to the ANN model used by Mani et al. [16]. The proposed model is a competent alternative to the PEANN model, especially when we need to deal with a large number of subjects and/or longer follow-up periods.

Data Preprocessing
Prior to using the proposed ANN model, the data need to be preprocessed. This process can be explained using a simple example. Consider three subjects, called A, B and C, who have been observed for $J$ years. Suppose we have information about their risk factors $X_1$ and $X_2$, the risk type, their survival time and whether they were censored or not, as given in Table 1. We considered two different risk types, $R_1$ and $R_2$, and each subject can die from one of them. The "censor" variable indicates whether a subject was lost to follow-up at some point during the study period. Hence, for all subjects who died during the study period, it is set to zero. As can be seen, subjects A and B died due to risk types $R_1$ and $R_2$ after 3 and 4 years, respectively.
According to Table 1, subject C was lost to follow-up after 2 years.
In order to use the new ANN model, this information needs to be preprocessed as given in Table 2.

Table 1. Columns: Survival Time, Risk Type, Censor.
...
where $n_j$ is the total number of subjects alive at the beginning of the $j$th time interval and $d_{rj}$ is the number of deceased subjects in the $j$th time interval due to the $r$th risk. The ratio $d_{rj}/n_j$ is the Kaplan-Meier [21] hazard probability of the $r$th risk for period $j$. In summary, if the subject is alive, then ( ) . With this approach, we can significantly reduce the data redundancy.
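As an illustration of the ratio described above, the following sketch computes the number at risk and the risk-specific deaths per yearly interval for the three example subjects A, B and C (survival times 3, 4 and 2 years, with C censored). The function name `interval_hazards`, the unit-length intervals, and the convention that a time $t$ falls in interval $j$ when $j - 1 < t \le j$ are assumptions made for this example.

```python
import math


def interval_hazards(times, risks, n_intervals):
    """Return {risk: [d_rj / n_j for j = 1..n_intervals]} for unit-length intervals.

    times: follow-up time per subject.
    risks: risk label per subject, or None if the subject is censored.
    """
    labels = sorted({r for r in risks if r is not None})
    hazards = {r: [] for r in labels}
    for j in range(1, n_intervals + 1):
        # n_j: subjects still under observation at the start of interval (j-1, j]
        n_j = sum(1 for t in times if t > j - 1)
        for r in labels:
            # d_rj: deaths from risk r that fall inside interval j
            d_rj = sum(1 for t, risk in zip(times, risks)
                       if risk == r and math.ceil(t) == j)
            hazards[r].append(d_rj / n_j if n_j > 0 else 0.0)
    return hazards


# Subjects A, B, C from the example: A dies of R1 at 3 years,
# B dies of R2 at 4 years, C is censored at 2 years.
km = interval_hazards([3, 4, 2], ["R1", "R2", None], n_intervals=4)
```

Under these assumptions two subjects remain at risk at the start of year 3 and one at the start of year 4, so the ratio is 0.5 for $R_1$ in year 3 and 1.0 for $R_2$ in year 4.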

Network Training
In developing the proposed ANN model, we used the hyperbolic tangent and the exponential activation functions in the hidden and the output layers, respectively. The proposed ANN structure is represented in Figure 1. The network output, $y_j(r, X)$, gives the hazard for each time interval $j$, as in Equation (7). During the training, we minimized the regularized canonical error function given by Equation (8), where $\alpha$ is the non-negative weight decay penalty term.
By adding a weight decay term, we expect to reduce network overfitting [22]. As per [23], we trained several ANN models with different weight decay values.
We used a k-fold cross validation technique to find the optimal number of hidden nodes in each network. In k-fold cross validation, the training dataset is divided into $k$ folds, where $k - 1$ folds are used to train the model while the remaining fold is used for validation. This process is repeated $k$ times, until each fold has been used for validation. Other model selection criteria, such as the Akaike Information Criterion and the Bayesian Information Criterion, are not suitable for model selection, as the error function $E$ is not a linear function of the network weights. The optimal networks are selected based on the minimum average validation error, and each of them is used to make the hazard rate predictions on the testing data set. The scaled conjugate gradient algorithm is used for weight optimization. Then, the corresponding survival probabilities are obtained using Equation (9). We evaluated the models using several error measurements calculated from the predicted median survival time and the actual survival time of the non-censored subjects in the testing data.
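The prediction path described in this section can be sketched as follows: a forward pass with a hyperbolic tangent hidden layer and exponential outputs (one hazard per interval), followed by the survival probabilities and the median survival time implied by a piecewise constant hazard. This is a minimal sketch under stated assumptions: a single hidden layer, toy weights, unit interval widths, and the standard relation $S(a_j) = \exp(-\sum_{k \le j} \lambda_k (a_k - a_{k-1}))$, which is assumed here rather than taken from the paper's Equation (9). All function names are hypothetical.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer followed by exponential outputs (one per interval)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # The exponential keeps every predicted hazard strictly positive.
    return [math.exp(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w_out, b_out)]


def survival_curve(hazards, widths):
    """Survival probability at the end of each interval under a piecewise constant hazard."""
    curve, cumulative = [], 0.0
    for lam, width in zip(hazards, widths):
        cumulative += lam * width
        curve.append(math.exp(-cumulative))
    return curve


def median_survival(hazards, widths):
    """First time t with S(t) = 0.5, or None if S stays above 0.5 over all intervals."""
    target = math.log(2.0)
    cumulative, start = 0.0, 0.0
    for lam, width in zip(hazards, widths):
        if lam > 0 and cumulative + lam * width >= target:
            return start + (target - cumulative) / lam
        cumulative += lam * width
        start += width
    return None


# Toy weights (made up): 2 inputs, 2 hidden nodes, 3 output intervals.
hazards = forward([0.5, -1.0],
                  w_hidden=[[0.3, -0.2], [0.1, 0.4]], b_hidden=[0.0, 0.1],
                  w_out=[[0.5, -0.5], [0.2, 0.2], [-0.3, 0.6]], b_out=[-1.0, -0.5, 0.0])
curve = survival_curve(hazards, [1.0, 1.0, 1.0])
median = median_survival([0.2, 0.5, 1.0], [1.0, 1.0, 1.0])
```

Within an interval the cumulative hazard grows linearly, so the median is found by solving for the time at which it reaches $\log 2$.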

The Lung Cancer Data
The data for our study were selected from the Surveillance, Epidemiology and End Results (SEER) program [24] and contain details of 38,262 white lung cancer patients who were diagnosed from 2004 to 2009. Among these, 23,332 subjects died of lung cancer and 4652 died of other causes. The rest were considered censored due to missing information or loss to follow-up.
In our analysis, four risk factors were used: age at diagnosis, tumor size, histology and the stage of the cancer. As can be seen from Table 3, a large proportion of the patients were between the ages of 65 and 75, and most of them had distant metastasis.
The majority of the patients were diagnosed with adeno or squamous cell carcinoma.
The overall median follow-up time was 1.33 years for males and 2 years for females, while the median tumor size was about 38 mm and 32 mm for the two groups, respectively. We found the survival times of males and females to be significantly different from each other, which was already a known fact [25], and hence two separate analyses were conducted. In order to develop the piecewise constant hazard model, we partitioned the time into six disjoint intervals, each with a 12-month period (1 year). Then we carried out our analysis using GEE and ANN models in SAS and MATLAB.

Results
For both males and females, we created a training data set of 70% and a testing data set of 30%. The training set was used to train the models, while the testing data set was used to evaluate the prediction accuracies of the proposed models.
We started our analysis with Poisson regression models. However, according to the deviance and the Pearson chi-square statistics, none of those models were adequate [26]. This might be due to the high correlation among the data. We also tried several other models: a Poisson model with an overdispersion parameter and a negative binomial model. There was no significant difference; the models remained inadequate. Therefore, we chose an alternative method, generalized estimating equations. Using this approach, we came up with two different statistical models for males and females. We found the interaction between stage and histology to be significant in both males and females. Applying these two models, we were able to predict the hazard and to obtain the corresponding survival probabilities for our lung cancer testing data.
Following that, we proceeded with building ANN models. We created both PEANN and our proposed ANN models. As mentioned earlier, we considered five different weight decay values: 0.01, 0.025, 0.05, 0.075 and 0.1. In each case, 10-fold cross validation was used to find the optimal number of hidden nodes.
The optimal network is selected based on the minimum average validation error.
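The selection step can be sketched as below. The fold construction follows the k-fold description in the text; the `validation_error` callback is a hypothetical stand-in for training a network with a given number of hidden nodes on the training folds and returning its error on the held-out fold. The candidate sizes and the toy error function are made up for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, validation) index lists so that each fold is held out exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def select_hidden_nodes(candidates, n, k, validation_error):
    """Pick the candidate with the lowest mean validation error across the k folds."""
    best, best_err = None, float("inf")
    for h in candidates:
        errs = [validation_error(h, train, val) for train, val in k_fold_indices(n, k)]
        mean_err = sum(errs) / len(errs)
        if mean_err < best_err:
            best, best_err = h, mean_err
    return best


def toy_error(h, train, val):
    # Stand-in for "train with h hidden nodes, return validation error".
    return (h - 4) ** 2


best = select_hidden_nodes([2, 4, 6, 8], n=25, k=10, validation_error=toy_error)
```

In practice this loop would be repeated for each weight decay value, giving one selected network per value.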
By using each optimal network, we predicted the hazard and the corresponding survival probabilities for the testing data. In order to evaluate the prediction accuracies of ANN and GEE, we used the actual survival times and the predicted median survival times of the non-censored subjects. For a better comparison, several prediction errors were considered for both males and females, as given in Table 4 and Table 5: the root mean square error (RMSE), the square root of the average squared difference between the actual and the predicted values; the mean absolute error (MAE), the average of the absolute errors; the mean percentage error (MPE), the average of the percentage errors; and the relative squared error (RSE), the total squared error normalized by the total squared error of the simple predictor. Using several error measurements gives a fuller picture, as each reflects a different aspect of the model predictions. As can be seen from Table 4 and Table 5, the proposed ANN method is better than both GEE and PEANN with respect to RMSE and MAE for both genders.
In addition, the RSEs for the new ANNs are smaller than those of the other two types of models.
Although the predictions of the new ANNs have negative biases, which indicate underestimation of the survival, these biases are substantially smaller than those of the other two models.
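The four error measurements can be written out as a short sketch. The sign convention (error = predicted minus actual, so a negative MPE means underestimated survival) is an assumption chosen to agree with the interpretation of negative bias in the text; the simple predictor for RSE is the mean of the actual values. The function name and the sample numbers are made up for illustration.

```python
import math


def prediction_errors(actual, predicted):
    """RMSE, MAE, MPE (in percent) and RSE for paired actual/predicted survival times."""
    n = len(actual)
    diffs = [p - a for a, p in zip(actual, predicted)]  # predicted minus actual
    mean_actual = sum(actual) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    mpe = 100.0 * sum(d / a for d, a in zip(diffs, actual)) / n
    # RSE: squared error relative to always predicting the mean actual value.
    rse = sum(d * d for d in diffs) / sum((a - mean_actual) ** 2 for a in actual)
    return {"RMSE": rmse, "MAE": mae, "MPE": mpe, "RSE": rse}


# Made-up survival times in years, not taken from the study.
errors = prediction_errors(actual=[1.0, 2.0, 4.0], predicted=[0.5, 2.0, 3.0])
```

An RSE below 1 indicates that the model does better than the simple predictor, while the sign of the MPE shows the direction of the bias.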
In particular, we found the smallest error values for the new ANN models with weight decay values of 0.05 and 0.075 for females and males, respectively. Further analysis of the hazard rates was carried out using those two models. Figure 4 represents the variation in the hazard rates according to age group and histology type, for both males and females. From that, we can observe that there are noticeable differences in the hazard. Nevertheless, we did not find the interaction between age group and histology to be significant in our GEE model. This illustrates the capability of ANN models to detect nonlinear patterns.
In accordance with [27], we observed a comparatively higher hazard for elderly patients than for younger patients, regardless of their gender. Usually, for elderly patients, the hazard seems to be elevated soon after the diagnosis, while for younger patients the hazard tends to increase over time.
Figure 5 and Figure 6 represent the variation in the hazard function according to histology and stage for males and females, respectively. These graphs reveal that there exist significant differences in the hazard according to histology and stage among males and females. We can see that the hazard for small cell carcinoma is higher in females than in males for all stages, as confirmed by [28].

Discussion
We have introduced a new neural network architecture to model the piecewise constant hazard model. This provides a more convenient approach to handle a

For example, RMSE and MAE are conventional measures of prediction accuracy, while MPE acts as a good measure of bias in the predictions. RSE gives the error relative to what it would have been if a simple predictor (the average of the actual values) had been used.

Figure 2 and Figure 3 represent the hazard variation among male and female patients according to different tumor sizes, while keeping the other categorical risk factors in their mode categories. It is important to note that we have chosen different tumor size ranges for males and females, depending on the available training data. We followed this precaution to increase the validity of our results, as ANNs are data-driven models and hence depend heavily on the amount of training data available for each tumor size. According to Figure 2, we can see that the hazard rate for males increases over the follow-up years for smaller tumor sizes, while it slightly decreases for larger tumor sizes. Males seem to have a high risk within the first two years of diagnosis. As opposed to males, there are significant differences in the hazard rate with tumor size among females (Figure 3). For smaller tumors, the hazard rate increases over the years, while for larger tumors it increases during the first three years and then decreases. The above graphs reveal that our ANN models are capable of capturing the complex shapes of the hazard functions, which is one of the main advantages of ANNs over conventional survival analysis models. Despite some unusual findings, overall these hazard functions capture the underlying patterns in the data.

Figure 2. Hazard variation according to tumor size (mm) for males.

Figure 3. Hazard variation according to tumor size (mm) for females.

Figure 5. Hazard variation among males according to histology and stage.

Figure 6. Hazard variation among females according to histology and stage.

Table 2.
An ANN consists of several layers: input, hidden and output. Input and the output nodes are usually determined by the nature
In this example, we have four inputs: the covariates $X_1$ and $X_2$, and two indicator variables $R_1$ and $R_2$, which we used to denote the risk type of the subject. For censored subjects like C, we need to consider the possibility of exposure to each of these risk types. Hence, those data are repeated twice, as given in Table 2. Assuming a constant hazard for each year, we have $J$ output nodes in the ANN. For each subject $i$, we have a vector of outputs with the following values.

Table 3. Lung cancer patient information.

Table 4. Model evaluation for males.

Table 5. Model evaluation for females.