Cumulative Link Ordinal Outcome Neural Networks: An Evaluation of Current Methodology
1. Introduction
A substantial amount of data is now collected from many facets of daily life [1], including dietary intake, retail transactions, imagery captured at traffic intersections, location data from mobile devices, and even biometric information. In addition, we have entered an era in which data gathering is particularly abundant in health-related domains [2].
Several technological advancements, including wearable devices, genomic studies, and the evolution of Electronic Health Records (EHRs), have created a comprehensive framework for gathering health-related information [2] [3]. This framework captures a wide range of patient information, from sociodemographic characteristics to treatment outcomes [4] [5].
This data-rich environment holds immense promise for the future of health care, paving the way for personalized medicine and enabling advanced predictive models that forecast individual patient outcomes and tailor treatment plans accordingly [6]-[8]. Such models are crucial for anticipating adverse events, optimizing resource allocation, informing clinical trial design, and guiding personalized patient counseling and research efforts [9].
However, the current data collection landscape is not without its challenges. The dynamic nature of these data presents a hurdle for many traditional statistical and machine learning methods, making it challenging to analyze and extract value from them effectively. The gathering of genetic, demographic, and clinical data leads to an analytical dataset with a large number of variables compared to the number of observations, further complicating the analysis process [10].
In health-related data, much of the collected information, such as treatment outcomes, is ordinal. Ordinal variables are categorical variables with a natural order or ranking; however, the differences between categories are not measurable [11] [12]. Examples of ordinal outcomes include:
1) Treatment response (minimal, mild, moderate, moderate-severe, and severe)
2) Tumor stage (I, II, III, IV)
3) Severity of side effects (none, mild, moderate, severe, life-threatening)
4) ECOG Performance Status (0: fully active to 5: dead)
Employing ordinal outcome models is essential for fully harnessing the information these outcomes carry. In prior analyses, researchers have often transformed ordinal outcomes into continuous or binary variables [13], but such approaches discard the natural structure of the outcome. Ordinal data should be modeled as ordinal rather than converted into binary or continuous formats [14]; maintaining the ordinal nature of the data preserves its natural representation and permits a more appropriate analysis [15] [16].
Ordinal neural networks are viable candidates for analyzing high-dimensional health data with ordinal outcomes. These models facilitate the prediction of outcomes using a specified covariate set. Furthermore, neural networks provide a robust framework for addressing ordinal classification problems in this domain, owing to their capacity to learn complex patterns from high-dimensional data [17]. These models integrate the strengths of neural networks and traditional ordinal regression techniques. Vargas et al. [18] proposed a deep neural network for ordinal regression based on the proportional odds model, which projects patterns into a one-dimensional space using non-linear deep learning. They combined this approach with a loss function that considers class distances, demonstrating improved performance on ordinal classification problems. Kook et al. [19] [20] introduced ordinal neural network transformation models (ONTRAMs), which can handle tabular and complex data such as images while maintaining interpretability. Moayed & Shell [21] compared neural networks with logistic regression to predict occupational health and safety outcomes using ordinal variables. Their study, utilizing data from construction workers, demonstrates that neural networks significantly outperform logistic regression when working with datasets comprised entirely of ordinal variables. However, the selection of the activation function, which maps the network’s output to ordered categories, significantly impacts the model’s performance and interpretability [22].
As such, this manuscript presents a comparative analysis of ordinal activation functions based on four prominent cumulative link functions employed in ordinal regression: the logit, based on the logistic distribution; the probit, based on the normal distribution; the complementary log-log (cloglog), based on the Gompertz distribution; and the log-log (loglog), based on the Gumbel distribution [23]. The logit model assumes proportional odds across categories and provides a widely adopted and interpretable foundation for biomedical research [23]. The normal distribution (probit) is arguably the most popular and easily understood distribution [24]. The Gompertz distribution (cloglog), often utilized in survival analysis and in scenarios involving rapidly changing odds (e.g., rapid disease progression or treatment response), offers a viable alternative [25]. Furthermore, the Gumbel distribution (loglog), commonly used in extreme value theory, introduces another perspective by modeling the distribution of maximum or minimum values, which can be relevant for capturing extreme events in biomedical phenomena [26].
This study formally describes neural networks that employ the four activation functions. The four resulting models were tested on both simulated and real-world datasets [27]. The objective was to systematically assess the performance of the activation functions corresponding to the four link functions, focusing on how they affect the predictive accuracy of ordinal neural networks. Additionally, this study examines whether certain activation functions consistently outperform others across datasets or whether their effectiveness depends on the unique features of the data.
The insights gleaned from this investigation will deepen our understanding of ordinal neural networks and provide practical guidance for biomedical researchers in selecting optimal activation functions according to their specific needs. This work further illuminates the intricate relationship between activation functions, distributional assumptions, and the nature of health data, ultimately contributing to the better application of ordinal neural networks to draw valuable information from health data.
2. Materials and Methods
For a given observation i, denote
as a vector of p covariates, with
. This can be interpreted in biomedical research as the vector of independent variables collected on a research participant throughout the study’s course. In addition, we assumed that an ordinal outcome would be recorded for this study. As such, we denote the outcome vector
where
if, for observation i, the outcome is in the jth category, all other entries are set to 0. There are J possible levels for the outcome. An example of a relevant outcome could be the response to treatment at the following levels:
1) Complete Response,
2) Partial Response,
3) Stable Disease,
4) Progressive Disease.
The ordering of the categories is evident. For any given study, the goal is to use the covariate vector $\mathbf{x}_i$ to predict the outcome vector of interest, $\mathbf{y}_i$, for all study participants. Aggregating the vectors for all subjects yields the covariate matrix $\mathbf{X}$, a $p \times n$ matrix whose $i$th column is $\mathbf{x}_i$, and a $J \times n$ matrix $\mathbf{Y}$ whose $i$th column is $\mathbf{y}_i$. The goal is to develop a function $f(\cdot)$ that predicts the $\mathbf{Y}$ matrix using the $\mathbf{X}$ matrix. To accomplish this task, an ordinal outcome neural network framework was employed as the machine learning model. Cost functions based on the four activation functions are presented first, followed by the overall architecture of the neural networks. The application of the methodology to the two datasets is then described, along with the metrics collected and the statistics used to compare the classification rates of the four methods.
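To make this setup concrete, the short R sketch below (with hypothetical data and variable names) builds the $p \times n$ covariate matrix and the $J \times n$ indicator matrix from an ordinal treatment-response factor; it is an illustration of the data structures only, not part of the analysis code.

# Hypothetical illustration: n = 5 participants, p = 3 covariates, J = 4 outcome levels
set.seed(1)
x_mat <- matrix(rnorm(3 * 5), nrow = 3, ncol = 5)   # p-by-n covariate matrix X
response <- factor(
  c("Complete Response", "Partial Response", "Stable Disease",
    "Progressive Disease", "Partial Response"),
  levels = c("Complete Response", "Partial Response",
             "Stable Disease", "Progressive Disease"),
  ordered = TRUE
)
# J-by-n indicator matrix Y: entry (j, i) is 1 when observation i is in category j
y_mat <- sapply(seq_along(response),
                function(i) as.integer(levels(response) == response[i]))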
2.1. Cumulative Link Based Outcome Functions
The cost function for the neural network is derived from the natural logarithm of the likelihood of a multinomial distribution ($\ell$). This function is represented as

$\ell = \sum_{i=1}^{n} \sum_{j=1}^{J} y_{ij} \log(\pi_{ij})$, (1)

where $\pi_{ij} = P(Y_i = j \mid \mathbf{x}_i)$ is the probability that observation $i$ falls in category $j$. Define the cumulative probabilities $\gamma_{ij} = P(Y_i \le j \mid \mathbf{x}_i)$ as $\gamma_{ij} = \pi_{i1} + \pi_{i2} + \cdots + \pi_{ij}$. In addition, $\gamma_{i0} = 0$ and $\gamma_{iJ} = 1$. Replacing $\pi_{ij}$ with the cumulative probabilities $\gamma_{ij} - \gamma_{i,j-1}$ in $\ell$ allows us to rewrite Equation (1) as

$\ell = \sum_{i=1}^{n} \sum_{j=1}^{J} y_{ij} \log\left(\gamma_{ij} - \gamma_{i,j-1}\right)$. (2)
We are primarily concerned with directly modeling cumulative probabilities, and to accomplish this, we employed cumulative link models (CLMs).
CLMs are statistical models designed to analyze data with ordinal outcomes. As such, we are concerned with modeling the cumulative probabilities $\gamma_{ij}$. In CLMs, $\gamma_{ij}$ is linked to a function of the predictor variables through a link function. This approach allows the covariate vector $\mathbf{x}_i$ to explain variation in the ordinal outcome while maintaining the inherent order of the response categories. The four link functions considered in this study are as follows:
1) Logit (logistic distribution)
2) Probit (normal distribution)
3) Complementary log-log (Gompertz distribution)
4) Log-log (Gumbel distribution)
By employing different link functions within a neural network framework, we can leverage the flexibility and power of machine-learning techniques while maintaining the structured approach of CLMs.
By employing the logit link, the cumulative probabilities are modeled as

$\gamma_{ij} = \dfrac{\exp\left(\alpha_j + f(\mathbf{x}_i)\right)}{1 + \exp\left(\alpha_j + f(\mathbf{x}_i)\right)}$, (3)

where $\alpha_j$ can be thought of as the intercept parameter for the $j$th level and $f(\mathbf{x}_i)$ is a non-linear function of $\mathbf{x}_i$. Considering the probit link to model the cumulative probabilities leads to

$\gamma_{ij} = \Phi\left(\alpha_j + f(\mathbf{x}_i)\right)$, (4)

where $\Phi(\cdot)$ is the standard normal cumulative distribution function. When the cloglog link is considered, we have

$\gamma_{ij} = 1 - \exp\left[-\exp\left(\alpha_j + f(\mathbf{x}_i)\right)\right]$. (5)

Finally, utilizing the loglog link leads to

$\gamma_{ij} = \exp\left[-\exp\left(-\left(\alpha_j + f(\mathbf{x}_i)\right)\right)\right]$. (6)
These cumulative probabilities can be substituted into Equation (2), which then serves as the objective to be optimized.
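As an illustration (not the authors' implementation), the R sketch below defines the four inverse link functions that convert values of $\alpha_j + f(\mathbf{x}_i)$ into cumulative probabilities, together with the log-likelihood of Equation (2); the names link_fns and log_lik are introduced here only for this example.

# Inverse link functions: map eta = alpha_j + f(x_i) to cumulative probabilities, elementwise
link_fns <- list(
  logit   = function(eta) 1 / (1 + exp(-eta)),   # logistic distribution, Equation (3)
  probit  = function(eta) pnorm(eta),            # normal distribution, Equation (4)
  cloglog = function(eta) 1 - exp(-exp(eta)),    # Gompertz distribution, Equation (5)
  loglog  = function(eta) exp(-exp(-eta))        # Gumbel distribution, Equation (6)
)

# Log-likelihood of Equation (2): y_mat is the J-by-n indicator matrix and gamma is the
# (J-1)-by-n matrix of cumulative probabilities; rows for gamma_{i0} = 0 and gamma_{iJ} = 1
# are added before differencing to recover the category probabilities.
log_lik <- function(y_mat, gamma) {
  n <- ncol(gamma)
  cum <- rbind(rep(0, n), gamma, rep(1, n))
  pi_mat <- diff(cum)                       # gamma_{ij} - gamma_{i,j-1}
  sum(y_mat * log(pmax(pi_mat, 1e-12)))     # small floor guards against log(0)
}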
2.2. Neural Network Architecture
A neural network with one hidden layer of six nodes was used in this investigation. A single hidden layer balances simplicity and capability, leveraging the universal approximation property while limiting complexity and reducing overfitting. The architecture is delineated as follows. For the first layer, we applied the function

$Z^{(1)} = W^{(1)} X + b^{(1)}$, (7)

where $W^{(1)}$ is a $6 \times p$ matrix and $b^{(1)}$ is a column vector of length 6 added to each column. As such, $Z^{(1)}$ is of dimension $6 \times n$. After this, the first activation function was applied and is defined as

$A^{(1)} = h\left(Z^{(1)}\right)$, (8)

where $h(\cdot)$ is the Leaky Rectified Linear Unit (Leaky ReLU) [28], applied elementwise and defined as $h(z) = \max(cz, z)$ for a small positive constant $c$. For the second layer, we applied the function

$Z^{(2)} = W^{(2)} A^{(1)} + b^{(2)}$, (9)

where $W^{(2)}$ is a $(J-1) \times 6$ matrix and $b^{(2)}$ is a column vector of length $J-1$ whose $j$th element is the intercept $\alpha_j$ from the previous section. As such, $f(\mathbf{x}_i)$ from the previous section corresponds to the $i$th column of $W^{(2)} A^{(1)}$, so the $(j, i)$ entry of $Z^{(2)}$ plays the role of $\alpha_j + f(\mathbf{x}_i)$. After this, the four link functions from Equations (3) through (6) were applied, elementwise, to $Z^{(2)}$. This is presented as

$\Gamma = g\left(Z^{(2)}\right)$, (10)

where the function $g(\cdot)$ is the elementwise application of one of Equations (3) through (6) to the matrix $Z^{(2)}$. After this, the cost function from Equation (2) was constructed, with $\gamma_{ij}$ taken as the $(j, i)$ entry of $\Gamma$.
To obtain the optimal values for $W^{(1)}$, $b^{(1)}$, $W^{(2)}$, and $b^{(2)}$, the Adam optimization algorithm was applied [11] [29]. Once these optimal parameters were obtained, they were used to evaluate the neural network's predictive capabilities on the validation data. Hyperparameter tuning was not performed.
2.3. Application to Simulated Data
The simulated data procedure has been previously introduced [11]. The data were simulated according to four covariate scenarios. The covariate scenarios are as follows:
1) Autoregressive (1)
2) Compound Symmetric
3) Toeplitz
4) Unstructured
For each scenario, a total of 100 datasets were simulated. Each dataset was divided into subsets comprising 80% for training and 20% for validation. Each dataset was simulated to yield an ordinal outcome comprising four categories (levels). Due to the nature of the simulation, the sample size for each category varies, resulting in class imbalance in most cases. The four neural networks, predicated on the four link functions delineated in Equations (3) through (6), were implemented on each training dataset. After optimizing these neural networks with respect to their parameters, the fitted parameters were applied to the validation data, and the percentage of correctly classified observations was documented. The medians and interquartile ranges are reported for each neural network within each covariate scenario. After non-normality was assessed and confirmed via a Kolmogorov-Smirnov test, the Kruskal-Wallis test was used to test the equality of medians among the four neural networks within each scenario. The Wilcoxon rank-sum test was used for pairwise comparisons of the neural networks within each data scenario. Furthermore, the percentage of observations correctly classified per forward-propagation iteration of the Adam optimizer was reported for a subset of the training data. All analyses were conducted using R statistical software [30].
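For reference, the distributional check and the nonparametric comparisons described above can be carried out with base R along the following lines, where accuracy_df is a hypothetical data frame holding one validation accuracy per simulated dataset and a column identifying the link function.

# accuracy_df: hypothetical data frame with columns 'accuracy' and 'link' for one covariate scenario
tapply(accuracy_df$accuracy, accuracy_df$link,
       quantile, probs = c(0.25, 0.50, 0.75))                    # medians and interquartile ranges

ks.test(as.numeric(scale(accuracy_df$accuracy)), "pnorm")        # Kolmogorov-Smirnov check of non-normality
kruskal.test(accuracy ~ link, data = accuracy_df)                # equality of medians across the four links
pairwise.wilcox.test(accuracy_df$accuracy, accuracy_df$link)     # pairwise Wilcoxon rank-sum comparisons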
2.4. Application to Genomic Data
Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer, accounting for approximately 90% of all liver cancers. It is the fifth most common cancer in men and the seventh most common cancer in women globally, with over half a million new cases diagnosed annually [31] [32]. The incidence and mortality rates of hepatocellular carcinoma vary significantly across different geographical regions, with the highest rates observed in sub-Saharan Africa and Southeast Asia, where hepatitis B virus infection is endemic [31] [33]. In contrast, while increasing, the incidence and mortality rates in Europe and the United States remain relatively low compared with other regions [31].
Several factors contribute to the development of hepatocellular carcinoma, including viral hepatitis (hepatitis B and C), cirrhosis, nonalcoholic fatty liver disease, and various genetic and environmental factors [34] [35]. Understanding the epidemiology, risk factors, and treatment strategies of hepatocellular carcinoma is crucial for improving patient outcomes and reducing the burden of this disease worldwide.
The real-world dataset employed in this study to predict an HCC-related ordinal outcome is a subset of the genomic dataset in the Gene Expression Omnibus data repository labeled GSE18081 [27]. The dataset comprises 56 samples. The ordinal outcome of interest is the classification of tissue samples as normal (n = 20), pre-malignant (cirrhosis; n = 16), or malignant (hepatocellular carcinoma; n = 20) using a small set of 46 molecular features. The study utilized data from the BeadArray Cancer Panel I. All technical replicate samples and matched cirrhotic samples from subjects with HCC were removed, so all samples in the reduced dataset are independent. The dataset is called hccframe and is located in the ordinalgmifs R package [27]. Data were obtained using the following R code:
>install.packages("ordinalgmifs")
>library(ordinalgmifs)
>data(hccframe)
The input features were standardized to be between zero and one. To evaluate the neural networks on this dataset, a 10-fold cross-validation approach was employed. The dataset was randomly partitioned into ten segments: nine segments (approximately 90%) were used to train the neural network, and the remaining 10% served as validation data. The percentage of correctly classified instances across the entire dataset was then computed. This procedure was executed 100 times for each neural network, and the median and interquartile range of the resulting 100 estimates of the percentage of correctly classified observations were calculated for each neural network. After non-normality was confirmed via the Kolmogorov-Smirnov test, the Kruskal-Wallis test was employed to assess the equality of medians among the four neural networks. The Wilcoxon rank-sum test was also conducted for pairwise comparisons among the neural networks.
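The repeated 10-fold cross-validation could be organized as in the R sketch below; the outcome column is assumed here to be named group (as in the ordinalgmifs documentation), and fit_ordinal_nn / predict_ordinal_nn are placeholders for the training and prediction steps of the neural networks described in Section 2.2.

library(ordinalgmifs)
data(hccframe)

y <- hccframe$group                                        # assumed name of the ordinal outcome column
x <- as.matrix(hccframe[, setdiff(names(hccframe), "group")])
x <- apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))   # scale features to [0, 1]

cv_accuracy <- replicate(100, {
  folds <- sample(rep(1:10, length.out = nrow(x)))         # random 10-fold partition
  preds <- integer(nrow(x))
  for (k in 1:10) {
    fit <- fit_ordinal_nn(x[folds != k, ], y[folds != k])            # placeholder: train on 9 folds
    preds[folds == k] <- predict_ordinal_nn(fit, x[folds == k, ])    # placeholder: predict held-out fold
  }
  mean(preds == as.integer(y))                             # proportion correctly classified overall
})
median(cv_accuracy); IQR(cv_accuracy)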
3. Results
3.1. Simulated Data
The proposed methodology, comprising neural networks based on the four cumulative link models, was applied to the simulated data. The objective was to compare the performance of the four distinct cumulative link models. The models were evaluated based on four criteria: optimization behavior on the training data, median percentage correctly classified, interquartile range (IQR), and nonparametric tests of statistically significant differences across models.
Figure 1 presents the accuracy (percentage of observations correctly classified) achieved by the models across iterations of the Adam optimization algorithm for the four covariate scenarios: Autoregressive (1), Compound Symmetric, Toeplitz, and Unstructured. The results are presented for the training datasets. For all models and all covariate scenarios, performance initially oscillated and then consistently increased; once a method reached the maximum proportion it could correctly classify, it oscillated slightly around that value. These results indicate that all four models could be optimized on the training datasets. The logit link model outperformed the other models in prediction accuracy across all covariate scenarios; however, it required the most iterations to reach its maximum accuracy, highlighting a trade-off between accuracy and computational cost.
Figure 1. Classification accuracy across iterations for four covariate scenarios (Autoregressive (1), Compound Symmetric, Toeplitz, and Unstructured) using cumulative link functions (Cloglog, Logit, Loglog, Probit). The x-axis represents the iteration number, while the y-axis shows the percentage correctly classified.
Focusing on the Autoregressive (1) covariate scenario, the logit model, represented by the green line, exhibits the highest classification accuracy, attaining a maximum of 90% of observations correctly classified. The other models exhibited a gradual upward trend in their performance. The probit and loglog models, illustrated in gray and orange, respectively, closely followed the logit model with maximum accuracies of 86%, while the cloglog model (blue line) had a maximum accuracy of 84%. The probit model required the largest number of iterations to reach its optimum (8861).
Similarly, the logit model exhibited superior performance for the Compound Symmetric covariate scenario, maintaining a high percentage of correct classifications with a maximum of 89%. The other models demonstrated comparable performance, with maximum correctly classified rates of 86% for the probit, 84% for the cloglog, and 83% for the loglog. The cloglog model required the most iterations to be optimized (8357).
The logit model outperformed the others for the Toeplitz covariate scenario, with a maximum correctly classified percentage of 92%. The loglog, cloglog, and probit models reached maximum accuracies of 85%, 83%, and 82%, respectively. The logit model required the most iterations (15,158) to reach its optimal percentage of observations correctly classified.
The logit model demonstrated the highest accuracy (90%) for the Unstructured covariate scenario, while the cloglog, probit, and loglog models exhibited slower but steady improvements with maximum accuracies of 82%, 83%, and 81%, respectively. The logit model required the most iterations to be optimized (8725).
The logit model consistently achieved the highest percentage of correctly classified data across all scenarios. In addition, on average, the logit model required the most iterations (9750) to attain its maximal value, whereas the probit model required the fewest (6725).
Table 1 presents the performance of each method as applied to the validation datasets. The medians and interquartile ranges for all methods are reported. The validation data constituted 20% of the original dataset and were not used to develop the final models. The Kruskal-Wallis rank sum test, with a significance level of 0.05, was used to compare the methods’ median accuracy. The Wilcoxon rank-sum test was used for all pairwise comparisons within each covariate scenario.
Figure 2 graphically presents the same information as Table 1, displaying the distribution of the percentage of correctly classified validation observations for the four covariate scenarios.
For the Autoregressive (1) covariate scenario, the logit link model (green boxplot) demonstrated the highest accuracy, attaining a median of 87%. The other link models also achieved over 80% accuracy (81% - 82%) but were significantly less accurate than the logit model. Furthermore, the logit model displayed fewer outlying values than the other three cumulative link models.
Table 1. Comparison of the four neural networks in ordinal outcome classification.
| Covariate Scenario | Cloglog | Logit | Loglog | Probit | p-value1 |
| --- | --- | --- | --- | --- | --- |
| Autoregressive (1), Median (Q1, Q3) | 0.81 (0.77, 0.83) | 0.87 (0.85, 0.88) | 0.81 (0.78, 0.84) | 0.82 (0.80, 0.84) | <0.001 |
| Compound Symmetric, Median (Q1, Q3) | 0.82 (0.80, 0.83) | 0.86 (0.84, 0.87) | 0.82 (0.79, 0.84) | 0.82 (0.79, 0.84) | <0.001 |
| Toeplitz, Median (Q1, Q3) | 0.81 (0.78, 0.83) | 0.86 (0.83, 0.87) | 0.81 (0.79, 0.83) | 0.81 (0.79, 0.84) | <0.001 |
| Unstructured, Median (Q1, Q3) | 0.82 (0.79, 0.84) | 0.86 (0.85, 0.88) | 0.82 (0.80, 0.84) | 0.83 (0.80, 0.84) | <0.001 |
1Kruskal-Wallis rank sum test. For each covariate scenario, 100 datasets were created. Each dataset was divided into 80% for training and 20% for validation. After training the four neural networks, the models were applied to the validation datasets. The median proportion and interquartile range of correctly classified observations over the 100 datasets are reported, along with the results of the Kruskal-Wallis rank-sum test comparing the four models.
Figure 2. Comparative analysis of activation functions in ordinal outcome classification. Box plots illustrate the percentage correctly classified across four activation functions (Cloglog, Logit, Loglog, Probit) under different correlation structures: Autoregressive (1), Compound Symmetric, Toeplitz, and Unstructured. Each box plot represents results from 100 datasets, split into 80% training and 20% validation.
For the Compound Symmetric covariate scenario, the logit approach exhibited the highest percentage of correct classifications among the models evaluated. The cloglog, probit, and loglog models yielded comparable results, with medians of 82% correctly classified. For the Toeplitz covariate scenario, the logit model performed better than the other models, with the cloglog, probit, and loglog cumulative link models exhibiting moderate improvement. For the Unstructured covariate scenario, the logit model led in accuracy, followed by the probit model with a median accuracy of 83%; the cloglog and loglog models had median accuracies of 82%. The logit model consistently achieved the highest percentage of correctly classified data across all scenarios, with the fewest outliers.
Table 1 compares the performance of the four cumulative link models (cloglog, logit, loglog, and probit) in correctly classifying the ordinal outcome from the validation data across the covariate scenarios. The logit model consistently demonstrated the highest median accuracy across all covariate scenarios and exhibited the smallest interquartile ranges. The median percentage of correctly classified observations for the logit model was 86% - 87%, whereas the medians for the other cumulative link models were 81% - 83%. For all covariate scenarios, the Kruskal-Wallis rank sum test p-value was less than 0.001, indicating statistically significant differences in performance among the models. Pairwise comparisons were performed to elucidate these differences. For the Autoregressive (1) covariate scenario, the significant pairwise differences were logit-cloglog (p < 0.001), logit-loglog (p < 0.001), logit-probit (p < 0.001), and probit-cloglog (p = 0.01). For the Compound Symmetric covariate scenario, the significant pairwise differences were logit-cloglog (p < 0.001), logit-loglog (p < 0.001), and logit-probit (p < 0.001); the same three comparisons were significant at p < 0.001 for the Unstructured and Toeplitz covariate scenarios. The logit model's performance differed significantly from that of the other three models in all pairwise comparisons.
3.2. Genomic Data
The proposed methodology and the four cumulative link models were applied to the hccframe dataset. The goal was to compare the performance of the four cumulative link models. Table 2 presents the performance of each method as measured by the median and interquartile range; 10-fold cross-validation was applied 100 times, yielding 100 estimates of the percentage correctly classified. The Kruskal-Wallis rank sum test, with a significance level of 0.05, was used to compare the median accuracy of the methods, and the Wilcoxon rank-sum test was used for all pairwise comparisons. Figure 3 visually presents the same information as Table 2. Table 2 compares the performance of the four cumulative link models (cloglog, logit, loglog, and probit) in correctly classifying the ordinal outcome. The logit model showed the highest median accuracy and the smallest interquartile range, with a median percentage of correctly classified observations of 86%.
Table 2. Ordinal neural network accuracy.
| Correctly Classified | Cloglog | Logit | Loglog | Probit | p-value1 |
| --- | --- | --- | --- | --- | --- |
| Median (Q1, Q3) | 0.66 (0.63, 0.70) | 0.86 (0.82, 0.88) | 0.71 (0.66, 0.73) | 0.66 (0.64, 0.71) | <0.001 |
1Kruskal-Wallis rank sum test. The percent correctly classified for the four models (Cloglog, Logit, Loglog, and Probit) applied to the hccframe dataset. The results are based on 10-fold cross-validation, repeated 100 times.
Figure 3. Comparative accuracy of neural network models on the hccframe dataset. Boxplots illustrate the distribution of classification accuracy for Cloglog, Logit, Loglog, and Probit models, evaluated using 10-fold cross-validation over 100 iterations.
The loglog model was second, with a median of 71%, while the cloglog and probit models both had medians of 66%. The interquartile ranges were similar across all four models. The Kruskal-Wallis rank-sum test p-value was less than 0.001, indicating significant differences in performance among the models. Pairwise comparisons were performed to determine these differences; all pairwise comparisons were statistically significant at the 0.05 level.
4. Discussion
This study aimed to compare the performance of four distinct CLMs, namely the cloglog, logit, loglog, and probit, in analyzing simulated and real-world datasets. The models were assessed based on their accuracy in classifying the ordinal outcomes. The findings consistently demonstrated that the logit model outperformed the other three CLMs across all datasets. The logit model achieved the highest percentage of correctly classified data in both the simulated and real-world datasets, with all pairwise comparisons between it and the other three models being statistically significant. These findings suggest that the logit model effectively captures the underlying relationships between the covariates and ordinal outcomes, leading to more accurate predictions. Although the logit model required more iterations to reach its maximum accuracy in the simulated datasets, its superior predictive performance often outweighs this computational cost. Though requiring the fewest iterations, the probit model consistently demonstrated lower accuracy than the logit model.
Several factors may contribute to the logit model's superior performance and its generally preferred use in biomedical research:
1) Proportional odds assumption: The logit model assumes proportional odds ratios across categories. This assumption implies that the relationship between covariates and cumulative probabilities remains consistent across different outcome levels.
2) Broad adoption and interpretability: The logit model is widely adopted and interpretable in biomedical research and provides a familiar framework for researchers.
3) Applicability to diverse datasets: The consistently high performance of the logit model across various covariate scenarios in the simulated and real-world genomic data indicates its robustness and generalizability.
The findings of this study have significant implications for biomedical research. For model selection, the results provide strong evidence supporting the logit model as the primary choice when analyzing data with ordinal outcomes. Its robust performance across diverse datasets makes it a reliable and effective tool for predicting ordinal outcomes in biomedical research. This study underscores the importance of carefully considering the CLMs’ distributional assumptions. The selection of an appropriate link function, such as a logit link, can significantly impact the accuracy and interpretability of the results. Ordinal neural networks represent an innovative and robust methodology for analyzing high-dimensional health data characterized by ordinal outcomes. Researchers can significantly improve their analytical capabilities by integrating these networks with various activation functions, each tied to a different statistical link function.
Combining ordinal neural networks and cumulative link activation functions equips researchers with tools to analyze health data with ordinal outcomes effectively. Future research should explore the following.
1) Performance of other activation functions: This study focused on four prominent cumulative link functions. Further investigation of other activation functions may reveal additional insights and identify alternative functions that outperform the logit model in specific contexts.
2) Impact of model architecture: The neural network architecture, including the number of hidden layers and nodes, can influence model performance. Exploring different architectures and delving into deep learning can further enhance the predictive accuracy of ordinal neural networks.
3) Applications to other biomedical datasets: Validating this study’s findings on other biomedical datasets with ordinal outcomes is crucial for establishing the generalizability of the observed patterns and confirming the CLMs’ performance.
Even though the logit cumulative link function outperforms the probit, cloglog, and loglog functions, these three functions still have merit. They can be employed in sensitivity analyses to assess the assumption that the latent variable arises from the logistic distribution. The logit link is the most widely used option for ordinal regression in the health sciences [36], providing regression coefficients that are interpretable as odds ratios. However, some researchers may prefer the log link, which yields coefficients that are functions of probabilities rather than odds [36].
Ultimately, the selection of an appropriate link function should be determined by the research objectives, distributional assumptions, dataset characteristics, and the interpretability of the resulting findings. Under the probit link, covariate effects are not easily interpretable; however, the probit model benefits from aligning the latent variable with a well-known distribution, as it assumes a normal latent distribution when interpreting changes in the predictors. The cloglog link associates coefficients with changes in the log of the negative log of the complement of the cumulative probability, which is less straightforward. By contrast, the logit link associates coefficients with changes in the log-odds of the outcome for a unit change in the predictor [12], a concept most biomedical researchers understand. The loglog link represents regression coefficients as changes in the log of the negative log of the cumulative probability for ordinal outcomes, holding other variables constant [36] [37], differing from the log-odds interpretation of the logit link.
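As a brief numerical illustration (with a hypothetical coefficient, under the parametrization of Equation (3) with a linear predictor), a fitted logit-link coefficient of $\beta_k = 0.7$ for predictor $x_k$ corresponds to $\exp(\beta_k) = \exp(0.7) \approx 2.0$; that is, a one-unit increase in $x_k$ approximately doubles the cumulative odds $\gamma_{ij}/(1 - \gamma_{ij})$ of the outcome falling in category $j$ or below, holding the other predictors constant.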
This study faces several limitations. The neural network architecture, featuring one hidden layer and six nodes, affects performance; different architectures and deep learning techniques could enhance accuracy. In addition, only one real-world genomic dataset and one simulated dataset were used, highlighting the need for further validation with other biomedical datasets to improve generalizability. Although the logit model performs well, it requires more iterations, indicating a trade-off between accuracy and computational cost. The logit model assumes proportional odds, which may not always hold, thus limiting its appropriateness; other models can be tested when this assumption is violated.
This study contributes to the growing literature on the application of ordinal neural networks to health-related data analysis. By demonstrating the advantages of CLM-based neural networks, it provides a framework for biomedical researchers to improve the accuracy of predictive models. Future research should evaluate additional CLMs and compare their performance with that of the logit link.
5. Conclusion
In conclusion, this study demonstrated that the logit link consistently outperforms other CLMs when analyzing simulated and real-world health data with ordinal outcomes. The findings highlight the importance of careful model selection and consideration of distributional assumptions in ordinal outcome modeling. The study’s insights contribute to model selection considerations when analyzing complex health data, leading to more accurate predictions and an improved understanding of biomedical phenomena.
Data Availability Statement
The dataset is called the hccframe and is located in the ordinalgmifs R package [27]. Data were obtained using the following R code:
>install.packages("ordinalgmifs")
>library(ordinalgmifs)
>data(hccframe)
Acknowledgements
The author expresses gratitude to the anonymous referees for their constructive feedback.
Funding
The author did not receive financial support for this article’s research, writing, or publication.