Assessing Efficient Risk Ratios: An Application to Surgical Stage Prediction in Cervical Cancer

Background: Cervical cancer remains the second most commonly diagnosed cancer and the third leading cause of cancer death in developing countries. Improving clinicians’ knowledge and understanding of surgical staging is critical in the fight against the disease. However, a systematic evaluation of different ordinal regression models based on diverse predicted outcomes has received little attention in the literature. Objective: To systematically assess the flexibility of odds ratios for three popular ordinal regression models, i.e. the Multinomial Logistic (ML) model, the Continuation Ratio (CR) model and the Adjacent Category Logistic (ACL) model, when applied to cervical cancer data for surgical stage prediction. Method: We systematically compared the performance of the CR, ML and ACL models as predictive mechanisms and evaluated the most appropriate model in the cervical cancer setting. The study considered women who visited the Oncology department at the Moi Teaching and Referral Hospital’s Chandaria Cancer and Chronic Diseases Center and were diagnosed and surgically treated for cervical cancer from January 2014 to December 2018. Results and Conclusion: We present a comparison of 3 different regression models for ordinal data within the cervical cancer setting. The CR model without proportional odds yielded better results in terms of the Akaike Information Criterion (AIC), log likelihood ratio and residual deviance. In addition, the key prognostic factor associated with invasive cervical cancer was the International Federation of Gynecology and Obstetrics (FIGO) clinical stage, which had a particularly strong influence on surgical Stage 2 outcomes compared with the lower surgical stage categories. Of the 5 independent features selected for classifying the patients into surgical stages, the most meaningful were the FIGO clinical stage and, partly, the presence or absence of symptomatic vaginal discharge.


Introduction
Cervical cancer remains the leading type of malignancy among women of all ages in Kenya, with a crude incidence rate of 22.4 per 100,000 persons and a crude mortality rate of 11.5 per 100,000 in 2017 [1]. Cervical cancer is caused by infection of the cervix with the human papillomavirus (HPV). Persistent HPV infection of the cervix causes oncogenic cell transformation at the squamocolumnar junction [2]. HPV types 16 and 18 are the most prevalent types among women with normal cytology, women with low- and high-grade cervical lesions, and women who progress to cervical cancer [1]. Nonetheless, cervical cancer is the most preventable of all relevant human cancers, and cervical cancer screening centers are increasingly being established in middle and low income countries. The introduction of screen-and-treat strategies for patients with abnormal Visual Inspection with Acetic Acid (VIA) of the cervix has increased the number of women screened and treated for cervical cancer in Kenya [3]. However, although HPV vaccines are available, their high cost limits their implementation in middle and low income countries, so that women have more access to surgical care than to chemotherapy and radiotherapy [4].
Surgical treatment is among the curative options offered to women diagnosed with cervical cancer in middle and low income countries. The extracted specimen undergoes pathological assessment to determine the full extent of the disease, which classifies the specimen into a surgical stage. Allanson et al. [5] found that systematic evaluation of surgical treatment outcomes, such as adverse effects and complications, considerably improves patient health outcomes.
Statistical and mathematical models applied in the cancer setting have been studied by several authors [6] [7] [8] [9]. However, ordinal data in medical studies have generally been dichotomized prior to analysis [10]. According to Javali et al. [10], estimating the risk of adverse effects, which are often measured on interval scales, remains of critical interest to epidemiologists and statisticians. Ordinal regression models have been underutilized despite being applicable in many fields.
In support of risk estimation, Freedman [11] reports that the National Cancer Institute identified risk prediction as an area of extraordinary opportunity in the "Nation's Investment in Cancer Research". The relevance of risk prediction in cervical cancer care today is best summarized by Dr. Michael Rothberg [12]: "While HPV tests are very helpful in predicting cancer risk, other factors are just as powerful at predicting cervical cancer risk. The more that we can personalize risk prediction, the more efficient our screening efforts will become." Globally, the development and use of predictive models is growing rapidly, and such models are highly applicable in the health care sector for the efficient provision of care and resources to patients. Predictive models are developed from statistically significant factors associated with the outcome of interest and can range from complex to simple. The application of predictive modeling techniques in the early diagnosis and prognosis of cancer has become a requisite for effective clinical management of patients. Moreover, machine learning techniques aim to model the progression and treatment outcomes of cancer and to improve our understanding of the disease, resulting in more accurate and effective management of cancer patients. These techniques could improve the accuracy of cancer susceptibility, recurrence and survival prediction.
Further, predictive models can be used to risk-stratify patients and appropriately distribute resources such as caregivers and treatment combinations to the women and also, identify women who are at high risk of progression to clinical disease for disease management programs. Notably, predictive modeling in the health sector has the potential to impact clinical and therapeutic decision making.
This article gives an overview of 3 regression models developed for ranked data.
The most popular model for the analysis of ordinal data is the cumulative proportional odds (CPO) model. However, the inflexibility of the proportional odds assumption led to the development of other regression models for ordinal data that relax this assumption. Generally, regression analysis investigates the influence of multiple predictors (independent variables) on a dependent variable (outcome). The proportional odds assumption in ordinal regression states that the effect of each explanatory variable is the same across the different cumulative splits of the response categories. One of the major shortcomings of the CPO model is that the estimated relationship between the predictors and the response can be greatly misleading when this assumption is violated. In theory, a preferable model for ordinal data takes into account the ordered categorical nature of the response, since more information is contained within the ordered structure of the categories [13]. Ordinal data are non-separable, independent and strictly increasing (or decreasing), with categories defined by arbitrary cut-points on some underlying continuum [13].
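To make the proportional odds assumption concrete, the following minimal Python sketch computes category probabilities under a cumulative proportional odds model. It assumes the parameterization logit P(Y ≤ j | x) = α_j − βx (the sign convention used, for example, by R's MASS::polr); the cut-points, slope and function names are hypothetical and chosen only for illustration. The last lines check that a one-unit change in x moves every cumulative logit by the same amount, the kind of restriction that the CR and ACL models fitted without proportional odds relax.

```python
import math


def logistic(z: float) -> float:
    """Inverse-logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def cpo_probabilities(cut_points, beta, x):
    """Category probabilities under a cumulative proportional odds model.

    Assumes logit P(Y <= j | x) = cut_points[j] - beta * x, with increasing
    cut-points and a single slope beta shared by every cut-point.
    """
    cumulative = [logistic(a - beta * x) for a in cut_points] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


def cumulative_logits(cut_points, beta, x):
    """logit P(Y <= j | x) for every cut-point j."""
    return [a - beta * x for a in cut_points]


# Hypothetical cut-points and slope for a 4-category response (illustration only).
cut_points = [-1.0, 0.5, 2.0]
beta = 0.8
p_x0 = cpo_probabilities(cut_points, beta, 0.0)
p_x1 = cpo_probabilities(cut_points, beta, 1.0)

# Proportional odds: a one-unit increase in x shifts every cumulative logit
# by the same amount (-beta), whichever cut-point is considered.
shifts = [after - before
          for before, after in zip(cumulative_logits(cut_points, beta, 0.0),
                                   cumulative_logits(cut_points, beta, 1.0))]
print([round(p, 3) for p in p_x0], [round(p, 3) for p in p_x1])
print([round(s, 6) for s in shifts])
```

With a positive β under this sign convention, increasing x moves probability towards the higher categories.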
Taking the pathologist's point of view, i.e. the surgical stage in this study, we present the most important prognostic factors and use 3 types of regression models to clarify existing disagreements in the classification and diagnosis of the extracted tumors. Specifically, we assess 3 types of regression models for ordinal responses to predict the surgical stage of HIV-infected and HIV-uninfected women who were surgically treated after being diagnosed with cervical cancer. The 3 predictive mechanisms covered here have previously been examined in [14] [17]. Such models explicitly recognize ordinality, avoid arbitrary assumptions concerning the ordinal scales and allow for the analysis of continuous, dichotomous and ordinal variables within a common statistical framework [17]. Statistical software has been developed to implement these and related models: in R, for example, nnet fits multinomial log-linear models, lme4 fits generalized linear mixed models, and the ordinal package fits cumulative link (mixed) models, which are also known as ordered regression models, proportional odds models, proportional hazards models for grouped survival times and ordered logit/probit models. Estimation is mainly by maximum likelihood [15]. McCullagh [14] reports that, through extensions to non-linear models, the method of iteratively reweighted least squares converges to the maximum likelihood estimate, which greatly simplifies the computation of regression models for ordinal data. An excellent summary can be found in [18], where statistical methods for modeling ordinal response data, such as the continuation ratio model and the polytomous logistic model, are fully described with application to perinatal health programme data.
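McCullagh's observation about iteratively reweighted least squares (IRLS) can be illustrated in its simplest setting. The sketch below is a minimal Python implementation for a binary logistic regression with one covariate, not for the ordinal models themselves; the data and function names are hypothetical. Each iteration solves a weighted least-squares problem with working response z = η + (y − p)/w and weights w = p(1 − p), and at convergence the score (gradient of the log-likelihood) is zero, confirming that the IRLS fixed point is the maximum likelihood estimate.

```python
import math


def irls_logistic(xs, ys, iterations=50, tol=1e-10):
    """Fit logit P(y = 1 | x) = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the 2x2 matrix X'WX and the vector X'Wz.
        s00 = s01 = s11 = t0 = t1 = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)            # IRLS weight
            z = eta + (y - p) / w        # working response
            s00 += w
            s01 += w * x
            s11 += w * x * x
            t0 += w * z
            t1 += w * x * z
        # Solve the weighted normal equations (X'WX) b = X'Wz directly.
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def score(xs, ys, b0, b1):
    """Gradient of the log-likelihood; it vanishes at the maximum likelihood estimate."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        g0 += y - p
        g1 += (y - p) * x
    return g0, g1


# Small hypothetical data set (not the study data); the classes overlap, so the MLE exists.
xs = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
ys = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
b0, b1 = irls_logistic(xs, ys)
print(round(b0, 4), round(b1, 4), score(xs, ys, b0, b1))
```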
The rest of the paper is organized as follows. The methods and materials are covered in Section 2. In Section 3, we give a detailed description of the analysis and results. In Section 4, we discuss the results, compare the three models and show the application of these methods to the cervical cancer data.

Multinomial Logistic Regression (ML) Model
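Let $\Psi$ be a multinomial response variable with categorical outcomes $1, 2, \dots, n$ and let $\psi_i$ denote a p-dimensional vector of explanatory variables. The dependence of $\Psi$ on $\psi_i$ can be expressed as [18]:

$$P(\Psi = j \mid \psi_i) = \frac{\exp(\alpha_j + \beta_j^{\top}\psi_i)}{\sum_{l=1}^{n} \exp(\alpha_l + \beta_l^{\top}\psi_i)}, \qquad j = 1, \dots, n, \qquad \alpha_1 = 0,\ \beta_1 = 0 \tag{1}$$

The logit form of Equation (1) yields:

$$\log\frac{P(\Psi = j \mid \psi_i)}{P(\Psi = 1 \mid \psi_i)} = \alpha_j + \beta_j^{\top}\psi_i, \qquad j = 2, \dots, n$$

The parameter $\alpha_j$ is the unknown intercept and $\beta = (\beta_1, \beta_2, \dots, \beta_n)$ is a vector of unknown coefficients corresponding to $\psi$, with category 1 taken as the baseline. Extensive coverage of the properties of $\beta$ and $\alpha$ can be found in [14] [18].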
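The odds ratio, $\Theta_{jk}$, of the $k$th covariate $\psi_k$ (category $j$ relative to the baseline, for a one-unit increase in $\psi_k$ with the other covariates held fixed) is expressed as:

$$\Theta_{jk} = \frac{P(\Psi = j \mid \psi_k + 1)\,/\,P(\Psi = 1 \mid \psi_k + 1)}{P(\Psi = j \mid \psi_k)\,/\,P(\Psi = 1 \mid \psi_k)} = \exp(\beta_{jk})$$

where $\beta_{jk}$ is the $k$th element of $\beta_j$.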
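As a numerical check of the relationship between the baseline-category model and its odds ratio, the following minimal Python sketch evaluates multinomial logit probabilities for a single covariate and confirms that a one-unit increase in the covariate multiplies the odds of category j versus the baseline by exp(β_j). The parameter values and function names are hypothetical, and the first category is assumed to be the baseline.

```python
import math


def multinomial_probabilities(intercepts, slopes, x):
    """Baseline-category multinomial logit probabilities for one covariate x.

    intercepts[j - 1] and slopes[j - 1] hold alpha_j and beta_j for the
    non-baseline categories; the baseline category has alpha = beta = 0.
    """
    etas = [0.0] + [a + b * x for a, b in zip(intercepts, slopes)]
    denom = sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas]


def odds_vs_baseline(probs, j):
    """Odds of category j (0-based index) relative to the baseline (index 0)."""
    return probs[j] / probs[0]


# Hypothetical parameters for a 3-category response (not estimates from the study).
intercepts = [-0.5, -1.5]
slopes = [0.7, 1.2]
p0 = multinomial_probabilities(intercepts, slopes, 0.0)
p1 = multinomial_probabilities(intercepts, slopes, 1.0)

# The ratio of odds at x = 1 and x = 0 equals exp(beta_j) for each category j.
for j, b in enumerate(slopes, start=1):
    ratio = odds_vs_baseline(p1, j) / odds_vs_baseline(p0, j)
    print(j, round(ratio, 4), round(math.exp(b), 4))
```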

Continuation Ratio (CR) Model
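Here we replace the baseline category in the denominator of the logit by the current category, and the numerator by all higher categories, so that each logit compares continuing beyond category $j$ with stopping at category $j$:

$$\log\frac{P(\Psi > j \mid \psi_i)}{P(\Psi = j \mid \psi_i)} = \alpha_j + \beta_j^{\top}\psi_i, \qquad j = 1, \dots, n-1,$$

where the version with proportional odds constrains $\beta_j = \beta$ for all $j$. The odds ratio of the CR model for the $k$th covariate at step $j$ is then obtained as:

$$\Theta_{jk} = \exp(\beta_{jk})$$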
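The continuation ratio structure can be sketched as a sequence of binary logits. The minimal Python example below assumes the parameterization logit P(Y > j | Y ≥ j, x) = α_j + β_j x with hypothetical parameters; setting all β_j equal would give the proportional odds version. It builds the category probabilities step by step and checks that the odds of being above category j relative to being in category j change by the factor exp(β_j) per unit of x, which is how the CR odds ratios are read in the results.

```python
import math


def logistic(z):
    """Inverse-logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def cr_probabilities(intercepts, slopes, x):
    """Category probabilities under a continuation ratio model.

    Assumes logit P(Y > j | Y >= j, x) = alpha_j + beta_j * x for each step j,
    i.e. a binary logit for "continue past j" versus "stop at j".
    """
    probs = []
    reach = 1.0  # P(Y >= current category | x); the lowest category is always reached
    for a, b in zip(intercepts, slopes):
        cont = logistic(a + b * x)           # P(Y > j | Y >= j, x)
        probs.append(reach * (1.0 - cont))   # P(Y = j | x)
        reach *= cont                        # P(Y >= j + 1 | x)
    probs.append(reach)                      # highest category
    return probs


def cr_step_odds(probs, k):
    """Odds of an outcome above category k relative to being in category k (0-based)."""
    return sum(probs[k + 1:]) / probs[k]


# Hypothetical parameters for 3 categories (2 continuation steps).
intercepts = [0.2, -0.6]
slopes = [0.5, 1.1]
p0 = cr_probabilities(intercepts, slopes, 0.0)
p1 = cr_probabilities(intercepts, slopes, 1.0)
for k, b in enumerate(slopes):
    print(k, round(cr_step_odds(p1, k) / cr_step_odds(p0, k), 4), round(math.exp(b), 4))
```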

Adjacent-Category Logistic (ACL) Model
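The ACL model involves the ratio of the probabilities of two adjacent categories, i.e.

$$\log\frac{P(\Psi = j + 1 \mid \psi_i)}{P(\Psi = j \mid \psi_i)} = \alpha_j + \beta_j^{\top}\psi_i, \qquad j = 1, \dots, n-1.$$

The parameter $\beta_1$ corresponds to the coefficients of the log-odds of $P(\Psi = 2 \mid \psi_i)/P(\Psi = 1 \mid \psi_i)$. Subsequent odds follow the same pattern.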
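The adjacent-category logits determine the full set of category probabilities: summing the first j − 1 adjacent logits gives log[P(Y = j)/P(Y = 1)], and normalizing yields the probabilities. The minimal Python sketch below assumes the form log[P(Y = j+1 | x)/P(Y = j | x)] = α_j + β_j x with hypothetical parameter values and verifies that the adjacent logits are recovered from the computed probabilities.

```python
import math


def acl_probabilities(intercepts, slopes, x):
    """Category probabilities under an adjacent-category logit model.

    Assumes log[P(Y = j+1 | x) / P(Y = j | x)] = alpha_j + beta_j * x.
    Cumulative sums of the adjacent logits give log P(Y = j) - log P(Y = 1).
    """
    log_rel = [0.0]
    for a, b in zip(intercepts, slopes):
        log_rel.append(log_rel[-1] + a + b * x)
    denom = sum(math.exp(v) for v in log_rel)
    return [math.exp(v) / denom for v in log_rel]


# Hypothetical parameters for 3 categories (2 adjacent logits).
intercepts = [0.3, -0.4]
slopes = [0.6, 0.9]
x = 1.0
p = acl_probabilities(intercepts, slopes, x)

# Each adjacent log-odds computed from p equals alpha_j + beta_j * x.
recovered = [math.log(p[j + 1] / p[j]) for j in range(len(intercepts))]
print([round(v, 4) for v in recovered])
```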

Demographics
The

Materials
Upon visiting the cervical cancer screening clinics within the Western and Rift Valley regions of Kenya, women with suspicious lesions on the cervix underwent colposcopic biopsy, in which a colposcope was used to examine the cervix for any abnormal tissue. Biopsy punch forceps were used to remove a small fragment of the abnormal area or suspicious lesion, which was taken for pathological evaluation to determine the type of invasive cancer (squamous cell carcinoma or adenocarcinoma). In addition, a physical examination of the cervix, blood tests, CT scans and chest X-rays were done to determine the clinical stage of the cancer.
The pathology result was received after two weeks, after which the women underwent gynecological review. The women were asked standardized questions about their social behaviors, demographic details and past treatments, which determined the treatment given at that particular time. Women assigned to surgical treatment were scheduled for surgery. The specimens were taken to the pathologist for surgical pathological evaluation to assess the full extent of the disease and determine the direction of treatment. The pathologists carried out physical and microscopic examination of the extracted tissues.
The specimens were classified into surgical stages that describe the involvement of the lymph nodes and the parametrium, and that determine whether surgery was the only treatment necessary or whether further treatment would be needed.

Procedure
The research design for this study was cross-sectional. The data for the study was retrospectively retrieved from the gynecological cervical cancer database. The data had been collected previously and mirrored the patients' record files. Women who attend the gynecology clinic usually return for follow-up visits weekly, monthly and after 3 months. The gynecologists record patient information in files at every visit, and research assistants enter the recorded data into an MS Access database at the close of each clinic session.
Of the 690 women with complete records who sought treatment at the oncology clinic, only 75 were found to be eligible, and their data was used to build the predictive models. Moreover, data was simulated to test the performance of the developed models, as the original data of 75 women was too small to allow for parti-

Statistical Tests
In this study, regression models were used to explore the relationship between the response variable (surgical stage) and the explanatory variables. The data was analyzed using R version 3.6.1 in RStudio. Chi-square tests and analysis of variance (ANOVA) tests were carried out for categorical and numerical variables, respectively. The ANOVA test allowed us to examine how the numerical variables varied across the surgical stages (the response variable). Three regression models for ordinal data were developed, and their predictive performance was evaluated by comparing the odds ratios. These models were adopted because the response variable was ordered. The 3 models were the multinomial (polytomous) logistic model, the continuation-ratio model and the adjacent-category logistic model, of which the latter 2 were developed with and without the proportional odds assumption.
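For readers unfamiliar with the two screening tests, the minimal Python sketch below computes the Pearson chi-square statistic (without continuity correction) for a contingency table of a categorical predictor against surgical stage, and the one-way ANOVA F statistic for a numerical predictor grouped by surgical stage. The table, the groups and the function names are hypothetical; the corresponding p-values, which would come from the chi-square and F distributions, are not computed here.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical data: a 2 x 2 table (predictor absent/present by two stages)
# and a numerical variable measured in two stage groups.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6]])
print(round(stat, 3), dof, round(f, 3), df_b, df_w)
```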
We used the R function multinom() (package nnet) to fit 2 multinomial log-linear models via neural networks. For the ACL model, we used the R VGAM package

Descriptive Analysis
The data from patients with confirmed invasive cervical cancer was analyzed.
The entire dataset had 690 women with confirmed invasive cervical cancer. Ta

Regression Analysis
Comparisons of the 3 regression models for ordinal data were made based on parameter estimates, log likelihood, residual deviance and AIC. Only 5 predictor variables were significantly associated with the response variable, surgical stage.
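The goodness-of-fit quantities used in these comparisons are simple functions of the maximized log-likelihood. The minimal Python sketch below assumes ungrouped (individual-level) data, so that the saturated model has log-likelihood 0 and the residual deviance equals −2 log L; the log-likelihoods, parameter counts and model names are hypothetical and are not the values reported in this study's tables. It also shows why a more flexible model can have a higher log-likelihood but a worse (larger) AIC.

```python
def residual_deviance(log_likelihood):
    """Residual deviance for ungrouped categorical data: -2 * log-likelihood.

    Assumes each observation is its own covariate pattern, so the saturated
    model has log-likelihood 0.
    """
    return -2.0 * log_likelihood


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 log L (smaller is better)."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


# Hypothetical fits (log-likelihood, number of parameters); not the study's values.
fits = {"model_a": (-60.0, 7), "model_b": (-57.0, 12)}
for name, (ll, k) in fits.items():
    print(name, residual_deviance(ll), aic(ll, k))

# model_b fits better (higher log L) but pays a larger parameter penalty.
best = min(fits, key=lambda m: aic(*fits[m]))
print("lowest AIC:", best)
```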

Multinomial Logistic Regression Model
During the analysis, the baseline category was surgical Stage 0. Table 3 shows a multivariate ML model fitted with the 5 statistically significant predictors.

The Continuation Ratio Model
The continuation ratio model is considered a more appropriate choice when a patient must pass through each lower surgical stage before reaching a higher one and the focus is on a particular category given that it has been reached. The proportional odds assumption was tested by fitting this model with and without the proportional odds assumption. , z-value = 2.293 had a positive effect on the surgical stage responses. In addition, a p-value of 0.02182 showed that the FIGO clinical stage is a statistically significant predictor. The remaining 4 predictors, which were not statistically significantly associated with the surgical stage responses, were vaginal involvement, parametrial involvement, symptomatic vaginal discharge and lower abdominal pain. Table 8 and Table 9 show the goodness-of-fit statistics for the 2 multivariate CR models.
Equation (7) and (8) Table 10 shows the odds ratios extracted from the CR multivariate model without proportional odds.
A brief summary of the odds ratios for the model is given below. The odds ratios for having an outcome greater than surgical Stage 1 relative to being in surgical Stage 1 are given in Table 11. The odds of having an outcome greater than surgical Stage 2 relative to being in surgical Stage 2 were 46.20 times as high among patients diagnosed with FIGO clinical Stage 2 as among patients diagnosed with FIGO clinical Stage 1, after controlling for the effects of the other predictors in the model. After the same adjustment, these odds were 4.88 times as high among patients whose vaginal region was affected by the cancer (vaginal involvement) as among patients without vaginal involvement, and 25.02 times as high among patients whose parametrium was affected by the cervical cancer (parametrial involvement) as among patients without parametrial involvement. In contrast, the odds were 0.24 times as high (i.e. 76% lower) among patients with symptomatic vaginal discharge (Symptoms: Discharge) as among patients without symptomatic vaginal discharge, and 0.16 times as high (i.e. 84% lower) among patients with symptomatic lower abdominal pain (Symptoms: Pain) as among patients without symptomatic lower abdominal pain, again after controlling for the effects of the other predictors in the model.
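Odds ratios below 1 are easily misread as "times lower". The minimal Python sketch below converts the odds ratios reported above for the surgical Stage 2 step of the CR model into percentage changes in odds, using (OR − 1) × 100; for example, an OR of 0.24 corresponds to odds that are 76% lower. The function name is hypothetical; the odds ratios are those quoted in the text.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio (negative means lower odds)."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios quoted in the text (outcome above surgical Stage 2 vs. surgical Stage 2).
reported = {
    "FIGO clinical Stage 2": 46.20,
    "Vaginal involvement": 4.88,
    "Parametrial involvement": 25.02,
    "Symptoms: Discharge": 0.24,
    "Symptoms: Pain": 0.16,
}
for name, or_value in reported.items():
    print(f"{name}: OR = {or_value}, change in odds = {percent_change_in_odds(or_value):+.0f}%")
```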

Adjacent Category Logistic Model
The Adjacent Category Logit model is a special form of generalized logit model that involves the simultaneous estimation of the effects of predictor variables in pairs of adjacent categories. The ACL model involves the ratio of two probabilities. The proportional odds assumption was tested by fitting the ACL model with and without the proportional odds assumption. Table 12 and Table 13 show the summary of the univariate and multivariate ACL models with and without proportional odds, respectively. For the ACL model with proportional odds, we found that the FIGO clinical stage had a statistically significant effect on the surgical stage response (p-value = 0.00207). The estimated logit regression coefficient for the FIGO clini-     Equation (9) and (10) The 3 ACL models with and without proportional odds were compared to determine the model that best fits the cervical cancer data. The multivariate ACL model with proportional odds had misclassification rates of 32.00% and 37.32%, whereas the multivariate ACL model without proportional odds had misclassification rates of 29.33% and 37.03%, on the training and validation datasets, respectively. Misclassification therefore increased from the training to the validation data by 5.32 percentage points for the model with proportional odds and by 7.70 percentage points for the model without proportional odds.
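The percentage-point differences quoted above follow directly from the reported rates. The minimal Python sketch below defines a misclassification rate (the share of patients whose predicted surgical stage differs from the observed one, illustrated with hypothetical labels) and reproduces the arithmetic. The final check assumes that the training set consists of the 75 eligible women; under that assumption, 24 and 22 misclassified patients would give the reported training rates of 32.00% and 29.33%.

```python
def misclassification_rate(y_true, y_pred):
    """Proportion of cases whose predicted stage differs from the observed stage."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical observed and predicted surgical stages for four patients.
example_rate = misclassification_rate([0, 1, 2, 2], [0, 2, 2, 1])

# Misclassification rates (percent) reported in the text for the multivariate ACL models.
with_po = {"train": 32.00, "validation": 37.32}
without_po = {"train": 29.33, "validation": 37.03}
increase_with_po = round(with_po["validation"] - with_po["train"], 2)
increase_without_po = round(without_po["validation"] - without_po["train"], 2)

# Arithmetic check assuming a training set of 75 women (not data from the study).
train_rates = (round(100 * 24 / 75, 2), round(100 * 22 / 75, 2))
print(example_rate, increase_with_po, increase_without_po, train_rates)
```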

Discussion
The aim of cervical cancer screening is to detect pre-cancerous changes on the cervix that may lead to cancer. The objective of this study was to evaluate the predictive performance of 3 regression models for ordinal responses on the surgical stage of women treated surgically for invasive cervical cancer. The results provide an understanding of the future possibilities of using predictive algorithms in the Kenyan oncology setting. The relationships between the surgical stage and the 5 statistically significant variables were investigated by applying regression models and comparing the odds ratios. The findings showed that the FIGO clinical stage, parametrial involvement, vaginal involvement, symptomatic vaginal discharge and lower abdominal pain are independently associated with the surgical stage.

Application to Surgically-Treated Cervical Cancer Patients
Results show that, among the 3 ordinal regression models, the CR model without proportional odds best classified the surgical stages of the patients, with misclassification rates of 30.67% and 39.09% for the training (original) and test (simulated) sets, respectively. Although the 3 models are similar in that they fit multiple simultaneous binary logits, they restructure the categories differently. The CR model fits 2 logits, one at each consecutive step; in terms of dummy variables, the current category at each step is coded "0" and the higher categories are coded "1". We compared the odds ratios of the 3 models. The odds ratio is a relative measure, not an absolute number [19]. In addition, odds ratios are simple to compute and can be applied to discrete and continuous explanatory variables [19]. The odds ratios compare the relative odds of the response (in our case, surgical stage), given exposure to the explanatory variables of interest. The author of [19] further explains that odds ratios can ascertain whether a particular exposure is a risk factor and can compare the magnitude of various risk factors for the specific response. The 95% confidence intervals of the odds ratios estimate their precision and serve as a substitute for a significance test: an interval that does not include the null value (OR = 1) indicates statistical significance at the 5% level. Wide confidence intervals indicate low precision, whereas narrow confidence intervals indicate high precision. Specifically, the FIGO clinical stage had a stronger influence on the odds of having a surgical stage greater than surgical Stage 2 relative to being in surgical Stage 2. Although the results gave wide confidence intervals indicative of low precision, a statistically significant p-value (0.0349) and confidence intervals that did not span the null value (OR = 1) confirmed the result. The ORs for the other 4 predictors showed decreased odds of having a surgical stage greater than surgical Stage 2 relative to being in surgical Stage 2.
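The link between a 95% confidence interval that excludes OR = 1 and a significant test result can be sketched in Python as follows. The sketch assumes Wald intervals of the form exp(β ± 1.96·SE), for which the interval excludes 1 exactly when the two-sided Wald test is significant at the 5% level; the coefficients, standard errors and function names are hypothetical rather than the study's estimates. The first case mimics a large odds ratio whose wide interval still excludes 1.

```python
import math


def odds_ratio_ci(beta, se, z=1.959964):
    """Odds ratio and Wald 95% confidence interval from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def excludes_null(lower, upper):
    """True when the interval does not contain the null value OR = 1."""
    return lower > 1.0 or upper < 1.0


# Hypothetical (coefficient, standard error) pairs, not the study's estimates.
cases = [(3.8, 1.8), (-1.4, 1.1)]
for beta, se in cases:
    or_value, lower, upper = odds_ratio_ci(beta, se)
    print(round(or_value, 2), round(lower, 2), round(upper, 2), excludes_null(lower, upper))
```

The width of each interval grows with the standard error, which is why a large odds ratio estimated from a small sample can be significant yet imprecise.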
Also, there were decreased odds of having a surgical stage greater than surgical Stage 1 relative to being in surgical Stage 1, with the confidence intervals for the 5 selected predictors spanning the null value (OR = 1). None of these regression coefficients was statistically significant (all p-values > 0.05). The likelihood ratio chi-square test showed that the CR model without proportional odds (chi-square p-value = 0.0823) is adequate compared with the CR model with proportional odds.
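The likelihood ratio test compares nested models, here the CR model with proportional odds (restricted) and without proportional odds (full). The minimal Python sketch below computes the statistic 2(log L_full − log L_restricted) and its chi-square p-value, with the degrees of freedom equal to the number of extra parameters in the full model. To keep the example dependency-free, the chi-square tail probability is computed from the power series of the regularized lower incomplete gamma function; the log-likelihoods, parameter difference and function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    s, half_x = df / 2.0, x / 2.0
    # Series: P(s, h) = h^s e^(-h) * sum_{n>=0} h^n / Gamma(s + n + 1).
    term = 1.0 / math.gamma(s + 1.0)
    total = term
    n = 1
    while term > 1e-17 * total:
        term *= half_x / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(half_x) - half_x) * total
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(ll_restricted, ll_full, df_difference):
    """LR chi-square statistic and p-value for two nested models."""
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf(stat, df_difference)


# Hypothetical fits: the full model has 5 more parameters than the restricted one.
stat, p_value = likelihood_ratio_test(-60.0, -55.0, 5)
print(round(stat, 2), round(p_value, 4))
```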
In our study, the CR model without the proportional odds assumption was a better fit than the CR model with proportional odds. In a previous comparison, the continuation ratio model, the adjacent category model, the multinomial model and two other models were fitted to the ordinal response of hospital length of stay, with patient characteristics as covariates; the ordinal regression model, the CR model and the ACL model violated the proportional odds assumption. Moreover, the estimated relative risks of the multinomial model, the cumulative ratio model and the continuation ratio model for ordinal responses in blood cancer were compared in [20]. The authors determined, through goodness-of-fit statistics, regression diagnostic analysis, small standard errors and narrower 95% confidence intervals, that the CR model was the best-fitting model for the ordinal responses. Compared with the ACL model and the baseline category model, the CR model is recognized for being a simple decomposition of a multinomial distribution, for possessing the property of conditional independence between categories, and for having significance levels that can be affected by a reversal in the order of the categories. A prior study compared the fit of the baseline category model, the proportional odds model and the adjacent category model in determining prostate cancer stage and found the baseline category model to have the highest DIC [21]. The authors took the investigation further by comparing the baseline category model with a logistic regression model fitted to dichotomized ordinal responses, which demonstrated that the baseline category model was a superior fit. A minimum of 50 multinomial events per variable has been recommended, since the predictive performance of multinomial logistic regression (MLR) improves gradually as the number of multinomial events per variable increases [22].
Our study results suggest that this could explain why the MLR model estimated by maximum likelihood was the least suitable choice among the 3 predictive mechanisms.

Conclusion and Recommendations
This article presented a comparison of 3 different regression models for ordinal data with respect to the best-fitting model for our cervical cancer dataset. We found that the CR model without proportional odds yielded better results, with the lowest AIC, the highest log likelihood and the lowest residual deviance. In addition, with our cervical cancer data, the key prognostic factor associated with invasive cervical cancer was the FIGO clinical stage, which had a particularly strong influence on surgical Stage 2 outcomes compared with the lower surgical stage categories. Of the 5 independent features selected for classifying the patients into surgical stages, the most meaningful were the FIGO clinical stage and, partly, the presence or absence of symptomatic vaginal discharge. The study was limited by the fact that the cervical cancer data was not collected for the purpose of building statistical models; it was therefore not sufficient and probably lacked key predictors for the type of analysis carried out in our study. Thus, our study demonstrates the need for databases with additional variables that could be important in determining the suitability of surgical treatment, such as molecular data, CT/MRI imaging information and HPV-DNA types. Moreover, research and data collection for predictive algorithms could provide practical learning tools for the medical students who undergo medical training at the Moi Teaching and Referral Hospital. The data was biased by the exclusion of incomplete records, which left a small sample for building the models. Also, data was simulated to test the predictive capabilities of the models, and no statistical techniques were used to address the imbalanced nature of the data or the missing data.
Although 4 predictors were not found to be key prognostic factors for highly accurate classification in our models, future research using data structured for developing predictive models in the cervical cancer setting could yield better results that could be integrated into the oncology system. A strict and validated ordinal classifier can predict cancer stages (ordinal scales) more accurately than non-ordinal classifiers such as the polytomous logistic regression model [23].