Enhanced Predictive Modelling for Delirium in Intensive Care Using Simplified Deep Learning Architecture with Attention Mechanism
1. Introduction
Delirium is an acute cognitive disorder frequently observed in ICU patients, characterized by sudden changes in attention, awareness, and cognition [1]. Its prevalence is particularly high in critical care settings, with reported rates of up to 80% among mechanically ventilated patients and around 30%-50% among non-ventilated ICU patients [2]. Delirium presents a significant challenge for both patients and healthcare systems due to its association with increased morbidity, mortality, and prolonged ICU stays. Beyond immediate effects, delirium has lasting consequences, including a higher likelihood of long-term cognitive impairment, physical disability, and psychiatric symptoms, contributing to a decline in quality of life post-discharge.
Despite the severe impacts of delirium, early detection remains challenging [3]. Current assessment methods, such as the Confusion Assessment Method for the ICU (CAM-ICU) and the Intensive Care Delirium Screening Checklist (ICDSC), rely heavily on subjective clinical judgment and are often administered intermittently, which may lead to delays in recognizing early warning signs [4]. These assessments are labour-intensive and may overlook subtle physiological indicators that precede the onset of delirium. In response to these limitations, the integration of artificial intelligence (AI) and machine learning (ML) in healthcare offers a promising pathway for continuous, objective delirium risk monitoring. Leveraging patient data through advanced predictive models can support clinicians in early identification and intervention, potentially improving patient outcomes and reducing healthcare costs associated with prolonged ICU stays [5].
This study aims to develop a robust, real-time predictive model for early detection of delirium in ICU patients by harnessing both static and dynamic patient data [6]. Static data, including demographics (e.g., age, comorbidities) and baseline health scores (e.g., APACHE-II), provide foundational information on patient risk factors. Dynamic data, consisting of time-series measurements of vital signs (e.g., heart rate, blood pressure, respiratory rate), offers insights into real-time physiological fluctuations that may signal an increased risk of delirium [7]. By combining these two data types, we aim to create a holistic model capable of capturing both pre-existing risk factors and acute changes in patient status.
The primary objective is to construct a model that can continuously monitor patients and provide real-time delirium risk assessments, empowering clinicians with the information needed to intervene proactively. This approach aligns with the growing trend toward precision medicine in critical care, where individualized and timely interventions are key to improving outcomes. The model is designed with practical application in mind, targeting ease of integration within ICU monitoring systems and compatibility with the complex, high-stakes environment of critical care.
The model proposed in this study integrates a simplified Long Short-Term Memory (LSTM) network with an attention mechanism, a novel approach in the context of ICU delirium prediction. The LSTM architecture is particularly well-suited for sequential data and is effective in capturing temporal dependencies within dynamic ICU measurements [8]. However, time-series models like LSTMs can struggle with interpretability, as they treat each time step equally, which can obscure which specific physiological changes are most relevant to delirium onset [9]. To address this limitation, we incorporate an attention mechanism that dynamically weights the importance of each time step, allowing the model to focus on the most relevant physiological patterns associated with delirium risk.
The attention-enhanced LSTM model not only improves interpretability but also offers computational efficiency by minimizing unnecessary complexity [10]. Unlike deep architectures with multiple stacked layers, which are often computationally intensive and may overfit limited ICU data, our simplified approach is designed for balance. It retains high predictive accuracy while reducing computational overhead, making it feasible for real-time application in an ICU setting. The model’s attention mechanism also enhances clinical usability by providing interpretable insights into which features and time points contribute most to the risk prediction, potentially supporting clinicians in understanding and trusting AI-driven recommendations [11].
This study is among the first to apply an attention-enhanced LSTM network specifically to ICU delirium prediction, paving the way for future AI-driven decision support tools in critical care. By focusing on both methodological rigor and practical applicability, this research provides a foundation for the integration of AI-based risk assessment into routine ICU practice, where real-time and interpretable models have the potential to transform patient care and outcomes.
2. Methods
2.1. Data Generation
2.1.1. Synthetic Data Generation
Given limited access to real ICU patient data, we generated a synthetic dataset that emulates ICU scenarios, focusing on attributes associated with delirium risk. This approach enabled us to simulate a diverse patient population with variations in both static and dynamic features. The static features represent baseline characteristics that do not change over time, such as age, APACHE-II score, and comorbidities, which are commonly associated with a higher risk of delirium [12]. These features provide critical baseline data to the model, reflecting factors that predispose patients to delirium.
The dynamic features, representing time-series data (e.g., hourly vital signs such as heart rate, blood pressure, and respiratory rate), were simulated to capture real-time fluctuations in physiological metrics that could indicate an impending delirium episode. These time-dependent features allow the model to detect subtle physiological changes over time, simulating the real-world environment where ICU patients are continuously monitored.
2.1.2. Simulating Data Patterns for Delirium and Non-Delirium Cases
To create a robust training environment, we introduced distinct patterns into the synthetic data to differentiate delirium from non-delirium cases. For patients labelled as high risk for delirium, dynamic features were manipulated to exhibit specific trends indicative of early physiological changes, such as rising heart rate, blood pressure variability, and increased respiratory rate. These patterns were designed to simulate the subtle but progressive signs that may precede a clinical diagnosis of delirium, allowing the model to learn features characteristic of at-risk patients. Non-delirium cases were designed with more stable dynamic data, reflecting a baseline ICU patient who is unlikely to develop delirium. These patterns enabled the model to learn and differentiate physiological trajectories associated with delirium risk, making it a valuable tool for ICU risk stratification.
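The generation scheme described above can be sketched in a few lines. The following NumPy sketch is illustrative only: the drift magnitudes, noise levels, 24-hour window, and feature choices are hypothetical stand-ins, not the exact simulator used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_patient(delirium: bool, t_steps: int = 24):
    """Simulate one synthetic ICU patient (illustrative values only)."""
    # Static features: age, APACHE-II score, comorbidity count
    static = np.array([
        rng.integers(40, 90),    # age
        rng.integers(5, 35),     # APACHE-II
        rng.integers(0, 5),      # comorbidities
    ], dtype=float)
    # Dynamic features: hourly heart rate, mean arterial pressure, resp. rate
    hr = rng.normal(80, 5, t_steps)
    map_ = rng.normal(85, 5, t_steps)
    rr = rng.normal(16, 1.5, t_steps)
    if delirium:
        drift = np.linspace(0, 1, t_steps)   # progressive deterioration
        hr += 25 * drift                      # rising heart rate
        map_ += rng.normal(0, 8, t_steps)     # added blood pressure variability
        rr += 6 * drift                       # increasing respiratory rate
    dynamic = np.stack([hr, map_, rr], axis=1)  # shape (t_steps, 3)
    return static, dynamic

# Build a cohort with the paper's roughly 70-30 class ratio
X_static, X_dyn, y = [], [], []
for i in range(1000):
    label = i % 10 < 3          # 30% delirium prevalence
    s, d = make_patient(label)
    X_static.append(s); X_dyn.append(d); y.append(int(label))
X_static, X_dyn, y = map(np.asarray, (X_static, X_dyn, y))
```

Separating the cohort into stable trajectories and drifting ones is what gives the model a learnable signal: by the final hour, the simulated delirium group's heart rate sits well above the non-delirium baseline.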
2.2. Model Architecture
The model architecture was designed with two main branches to process both static and dynamic data inputs, optimizing the model’s ability to capture the unique contributions of each data type.
2.2.1. Model Components
Static Input Branch: The static branch processes patient-specific features that remain constant over time. This branch includes a dense layer with ReLU (Rectified Linear Unit) activation, allowing the model to capture non-linear relationships between static variables and delirium risk [13]. By transforming static features into a lower-dimensional space, this branch provides a condensed representation of baseline factors that the model can integrate with time-dependent information from the dynamic branch [14].
Dynamic Input Branch with Attention: The dynamic branch uses an LSTM (Long Short-Term Memory) layer to handle the sequential, time-series data, which captures temporal dependencies within vital sign measurements. The LSTM is particularly well-suited for ICU data as it can retain critical information across multiple time steps, reflecting the evolution of a patient’s physiological status. However, traditional LSTMs process each time step with equal priority, which may not align with clinical relevance [15]. To enhance interpretability and improve predictive accuracy, we added an attention mechanism. The attention mechanism allows the model to assign greater importance to specific time steps or vital sign fluctuations that are more predictive of delirium, thus highlighting the most clinically relevant information [16]. This mechanism not only improves model performance but also makes it possible to interpret the temporal focus of the model, aligning predictions with the patient’s physiological progression.
Output Layer: The final output layer integrates the information from both branches to produce a risk score for delirium. A single neuron with a sigmoid activation function is used to generate a probability between 0 and 1, where values closer to 1 indicate a higher likelihood of delirium onset. This continuous output allows for a flexible risk threshold, which can be adjusted based on clinical needs, providing a binary classification of high- vs. low-risk patients.
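The three components above can be illustrated with a minimal NumPy forward pass. The weights here are random stand-ins for trained parameters, the LSTM hidden states are simulated rather than computed, and a simple dot-product attention is assumed; this is a sketch of the data flow, not the study's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions: 3 static features, T=24 hourly steps, d=8 hidden units
T, d, n_static = 24, 8, 3
H = rng.normal(size=(T, d))    # stand-in for LSTM hidden states h_1..h_T
s = rng.normal(size=n_static)  # static feature vector (age, APACHE-II, ...)

# Static branch: dense layer + ReLU -> condensed baseline representation
W_s = rng.normal(size=(n_static, d))
static_repr = np.maximum(0.0, s @ W_s)

# Attention over time: score each step, normalize, take the weighted sum.
# alpha says how much each hour contributes to the final risk estimate.
w_a = rng.normal(size=d)
alpha = softmax(H @ w_a)       # one weight per time step, sums to 1
context = alpha @ H            # attention-pooled dynamic representation

# Output layer: concatenate both branches, single sigmoid neuron -> (0, 1)
W_o = rng.normal(size=2 * d)
risk = sigmoid(np.concatenate([static_repr, context]) @ W_o)
```

Because `alpha` is an explicit probability distribution over time steps, it can be inspected directly, which is the source of the interpretability benefit discussed above.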
2.2.2. Loss Function and Optimization
The model was compiled with binary cross-entropy loss, a suitable choice for binary classification tasks, particularly in healthcare contexts where false negatives can have serious implications [17]. Binary cross-entropy provides a gradient signal that penalizes large errors in probability predictions, guiding the model to make calibrated, accurate predictions [18]. The Adam (Adaptive Moment Estimation) optimizer was used for training, as it dynamically adjusts learning rates based on gradients, making it effective in optimizing models with time-series data [19]. Adam’s adaptive nature ensures efficient convergence, allowing the model to balance static and dynamic data learning rates effectively [20] [21].
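The penalty behaviour of the loss can be shown directly. This is a textbook NumPy sketch of binary cross-entropy; the clipping epsilon is a common implementation detail assumed here, not a value taken from the study.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; eps clipping avoids log(0)."""
    p = np.clip(y_pred, eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# A confident wrong prediction is penalized far more than a mild one:
loss_confident_wrong = binary_cross_entropy([1], [0.1])  # -ln(0.1) ~ 2.30
loss_mild_wrong = binary_cross_entropy([1], [0.4])       # -ln(0.4) ~ 0.92
```

This steep penalty on confident errors is exactly the gradient signal that pushes the model toward calibrated probabilities.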
2.3. Training and Evaluation
2.3.1. Handling Class Imbalance
In most ICU datasets, there is an inherent imbalance between delirium and non-delirium cases, which is reflected in our synthetic dataset with a 70-30 ratio. To address this, class weights were introduced during training, giving higher importance to the minority class (delirium cases). By weighting the minority class more heavily, the model learns to prioritize recall for delirium, minimizing the likelihood of false negatives. This approach is critical for a clinical setting where failing to identify high-risk patients could delay crucial interventions.
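One common way to derive such class weights is inverse-frequency scaling; the sketch below assumes the 70-30 split described above. The exact weighting scheme used in the study is not specified, so this mirrors the widely used "balanced" heuristic rather than the authors' precise values.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights, normalized so a perfectly balanced
    dataset would give every class a weight of 1 (the 'balanced'
    heuristic popularized by scikit-learn)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# 70-30 split as in the synthetic dataset
labels = [0] * 700 + [1] * 300
weights = class_weights(labels)  # minority (delirium) class gets the larger weight
```

With these weights, each missed delirium case contributes roughly 2.3 times more to the loss than a missed non-delirium case, which is what shifts the model toward higher recall on the minority class.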
2.3.2. Performance Metrics
The model’s effectiveness was evaluated using a comprehensive set of metrics.
Accuracy: While accuracy provides an overall assessment, it can be misleading in imbalanced datasets [22]. Thus, it was supplemented with additional metrics focused on the minority class.
AUC-ROC (Area Under the Receiver Operating Characteristic Curve): The ROC curve plots the true positive rate (sensitivity) against the false positive rate, providing a threshold-independent measure of model performance [23]. A high AUC-ROC indicates that the model can effectively differentiate between delirium and non-delirium cases across various thresholds, which is important in clinical applications where the risk threshold may vary based on context [24].
AUC-PR (Area Under the Precision-Recall Curve): Given the imbalanced nature of the dataset, AUC-PR was particularly relevant, as it focuses on the positive class (delirium). The PR curve evaluates the balance between precision (true positives among predicted positives) and recall, reflecting the model’s ability to correctly identify high-risk patients without overpredicting delirium [25].
Recall and F1-Score: Emphasis was placed on recall (sensitivity) for the delirium class, as it is crucial to capture as many true positives as possible in a healthcare context. The F1-score was also calculated, as it provides a harmonic mean of precision and recall, offering a balanced metric that considers both false positives and false negatives [26].
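These metrics can be computed from first principles. The sketch below uses the rank-based (Mann-Whitney) formulation of ROC-AUC and the standard confusion-matrix definitions of recall and F1; in practice a library such as scikit-learn would be used, but the plain-NumPy version makes the definitions explicit.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the rank (Mann-Whitney U) formulation: the fraction of
    (positive, negative) pairs that the model ranks correctly."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def recall_f1(y_true, y_pred):
    """Recall and F1 for the positive (delirium) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return recall, 2 * precision * recall / (precision + recall)
```

The pairwise-ranking view also explains why AUC-ROC is threshold-independent: it never commits to a single cutoff, only to the ordering of scores.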
2.4. Early Stopping
To prevent overfitting, an early stopping mechanism was implemented with a patience setting of five epochs [27]. Early stopping continuously monitors the validation loss during training, halting the process if the model’s performance plateaus or degrades on the validation set [28] [29]. This approach helps the model generalize well to new data, avoiding overfitting to specific patterns in the synthetic dataset. The patience setting provides a buffer to allow the model to improve before stopping, ensuring that it reaches a stable minimum in loss without memorizing the data.
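The patience-based logic reduces to a small state machine. The sketch below mirrors the behaviour of a typical Keras-style EarlyStopping callback monitoring validation loss with patience=5; it is an illustration of the mechanism, not the authors' exact implementation.

```python
class EarlyStopping:
    """Stop when validation loss fails to improve for `patience`
    consecutive epochs (mirrors the patience=5 setting used here)."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0  # improvement: reset counter
        else:
            self.wait += 1                       # no improvement this epoch
        return self.wait >= self.patience

# Loss plateaus after epoch 1; training halts 5 epochs later (epoch 6)
stopper = EarlyStopping(patience=5)
history = [1.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]
stop_epoch = next(i for i, loss in enumerate(history) if stopper.step(loss))
```

The buffer of five non-improving epochs is what lets the optimizer escape short plateaus before the run is declared converged.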
3. Results
3.1. Model Performance
Training and Validation Accuracy
The model demonstrated outstanding predictive performance, achieving near-perfect accuracy on both the training and validation datasets. By the end of the training process, the model achieved 100% accuracy for both sets, reflecting its capacity to effectively capture the underlying patterns in the synthetic dataset without overfitting. This high accuracy is indicative of the model’s ability to generalize well across data splits, even with complex time-series and static features combined.
Figure 1. Training and validation performance over epochs: accuracy (left) and loss (right).
Figure 1 (left) shows the Training and Validation Accuracy curve, with both training and validation accuracy rapidly increasing and stabilizing at 100%. The synchronized convergence of training and validation accuracy lines demonstrates the model’s ability to generalize effectively to unseen data [30]-[32]. Meanwhile, Figure 1 (right) illustrates the Training and Validation Loss curve, where the loss decreases sharply in the initial epochs, stabilizing near zero. This rapid loss reduction highlights the model’s efficiency and the effectiveness of the Adam optimizer in fine-tuning weights [33]. The minimal gap between training and validation loss curves further indicates the absence of overfitting, which is crucial for real-world applications [34]-[36].
Loss Reduction: The model exhibited rapid convergence during training, with the loss reducing significantly within the first few epochs and stabilizing at minimal values thereafter [37] [38]. The synchronized reduction of both training and validation loss (Figure 1, right) suggests that the model learned the data patterns efficiently and reached an optimal solution with minimal computational overhead [39]. This rapid convergence is particularly beneficial in a clinical setting, where computational efficiency is important for real-time application.
AUC Metrics: Evaluation of the model using the ROC and Precision-Recall (PR) curves yielded an AUC of 1.00 for both metrics, reflecting its exceptional discriminative capability. The AUC-ROC metric indicates that the model can reliably distinguish between delirium and non-delirium cases across various thresholds, ensuring that it maintains a high true positive rate (sensitivity) while minimizing false positives. Similarly, the PR curve, which is especially informative in imbalanced datasets, shows that the model achieves a perfect balance between precision (the proportion of true positives among predicted positives) and recall (the proportion of true positives identified out of all actual positives).
Figure 2. Model evaluation metrics: ROC curve (left) and precision-recall curve (right).
Figure 2 (left) illustrates the ROC Curve, where the curve reaches the upper-left corner, signifying a perfect trade-off between sensitivity and specificity. This visual underscores the model’s high degree of separability between classes, confirming its reliability in distinguishing between high-risk and low-risk cases. Figure 2 (right) shows the Precision-Recall Curve, where precision and recall values are consistently high across thresholds, further reinforcing the model’s ability to accurately detect delirium cases without compromising on precision. Both metrics’ perfect AUC scores (1.00) validate the model’s robustness and efficacy in this synthetic ICU dataset.
3.2. Visualization
Delirium Risk Over Time: A critical aspect of this study involved tracking delirium risk predictions over time to assess the model’s ability to detect early signs of delirium. For each patient, the model generated a continuous probability score that could be monitored over time, enabling clinicians to identify potential high-risk cases before a clinical diagnosis of delirium would typically occur.
Figure 3 provides a visualization of delirium risk predictions over time for a set of representative patients. The figure includes line plots for 20 patients, with 10 delirium and 10 non-delirium cases. High-risk (delirium) cases are shown with dashed lines, while low-risk (non-delirium) cases are displayed with solid lines. Additionally, a horizontal red dotted line at the 0.5 probability threshold demarcates the cutoff for high risk. In the figure, delirium cases demonstrate a clear upward trend, with risk probabilities exceeding the 0.5 threshold well before the simulation’s endpoint, highlighting the model’s ability to identify high-risk patients early [40]. Conversely, non-delirium cases maintain probabilities well below the threshold, indicating stable physiological profiles. This visualization underscores the model’s potential for real-time risk stratification, providing ICU teams with a reliable, early indicator of delirium onset.
Figure 3. Delirium risk predictions over time with attention mechanism.
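The early-warning behaviour shown in Figure 3 reduces to finding the first crossing of the 0.5 threshold in each patient's predicted risk trajectory. A minimal sketch of that alarm logic, operating on any per-patient probability series:

```python
import numpy as np

def earliest_alarm(risk_series, threshold: float = 0.5):
    """Return the first time step at which predicted delirium risk
    exceeds the threshold, or None if it never does."""
    above = np.asarray(risk_series) > threshold
    if not above.any():
        return None                 # stable, low-risk trajectory
    return int(np.argmax(above))    # index of the first True

# A rising (delirium-like) trajectory alarms at hour 2;
# a flat (non-delirium) trajectory never alarms.
alarm_hour = earliest_alarm([0.10, 0.35, 0.62, 0.78])
no_alarm = earliest_alarm([0.10, 0.15, 0.12, 0.18])
```

In a deployment, this index would be compared against the time of clinical diagnosis to quantify how much lead time the model provides.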
Training Curves: The training curves for accuracy and loss offer insights into the model’s learning process and stability. Figure 1 provides these training and validation curves over epochs, illustrating both the rapid improvement and subsequent stabilization of model performance. The alignment of training and validation accuracy curves (Figure 1, left) and the convergence of loss values (Figure 1, right) further underscore the model’s robustness, indicating successful learning without overfitting or performance degradation.
Additional Notes on Robustness and Interpretability
Table 1 presents a summary of key model performance metrics, including accuracy, loss, and AUC values for both training and validation datasets. This table serves as a concise overview of the model’s performance, demonstrating near-perfect scores across all metrics. The table provides additional context for the figures, allowing readers to quickly gauge the model’s efficacy.
Additionally, a detailed classification report is provided in Table 2, showcasing precision, recall, and F1-scores for each class, particularly the delirium class, where accurate identification is critical.
These tables complement the figures by quantifying the model’s classification performance and emphasizing its capability to handle imbalanced data effectively.
Table 1. Model performance metrics on training and validation datasets.
| Metric | Training set | Validation set |
|---|---|---|
| Accuracy | 100.0% | 100.0% |
| Loss | 0.01 | 0.01 |
| ROC-AUC | 1.00 | 1.00 |
| PR-AUC | 1.00 | 1.00 |
Table 2. Classification report for delirium and non-delirium predictions.
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Non-delirium (0) | 1.00 | 1.00 | 1.00 | 679 |
| Delirium (1) | 1.00 | 1.00 | 1.00 | 321 |
| Weighted avg | 1.00 | 1.00 | 1.00 | 1000 |
4. Discussion
4.1. Interpretation of Results
The results of this study indicate that the proposed deep learning model demonstrates exceptional performance in predicting delirium risk among ICU patients within the synthetic data environment. The model achieved perfect scores across various metrics, including AUC-ROC and AUC-PR, suggesting a high degree of separability between delirium and non-delirium cases. This performance can be largely attributed to the model’s ability to effectively process both static and dynamic features, leveraging the unique strengths of each data type. The static features provided a solid foundation of baseline patient risk factors, while the dynamic features allowed the model to identify evolving physiological changes, which are critical in anticipating the onset of delirium.
A key factor contributing to this high performance is the attention mechanism integrated into the LSTM layer [41]. By focusing on significant temporal patterns within the time-series data, the attention mechanism enhances the model’s interpretability and allows it to prioritize clinically relevant time steps. In a practical sense, this means that the model can “pay attention” to periods of notable physiological fluctuations, which are often indicative of a patient’s transition into a high-risk state [42]. This aligns well with the clinical understanding of delirium as a dynamic condition, characterized by subtle but progressively worsening signs that could precede the onset of delirium. The attention mechanism likely played a crucial role in the model’s ability to accurately capture these patterns, thereby increasing its predictive reliability.
4.2. Limitations of Synthetic Data
Despite these promising results, the reliance on synthetic data presents notable limitations. Synthetic data, while valuable for initial testing and model development, inherently lacks the complexity and variability present in real-world ICU data. In a clinical setting, patient data often contain nuances that are challenging to replicate synthetically, such as variations in sensor accuracy, inconsistencies in recording frequencies, and the influence of multiple concurrent medical conditions. For example, real ICU patients may exhibit overlapping symptoms from multiple conditions, making it difficult to isolate physiological patterns associated solely with delirium [43]-[45]. Additionally, synthetic data do not fully capture the effect of interventions, medications, and other contextual factors that significantly influence patient physiology in real ICU settings.
Furthermore, while synthetic data can mimic general trends, they may fail to replicate specific idiosyncrasies of real physiological signals that clinicians observe in practice [46]-[48]. The limited complexity of synthetic data could therefore lead to over-optimization of the model, meaning that while the model performs exceptionally well on simulated data, its effectiveness might decrease when exposed to the variability and noise of real clinical data [49]-[51]. This limitation highlights the need for cautious interpretation of the results, as synthetic data, by their nature, cannot fully encapsulate the unpredictable, often noisy conditions of actual ICU environments.
4.3. Potential for Real-World Application
While the model’s high performance in a synthetic environment provides proof-of-concept for using deep learning to predict delirium in ICU patients, real-world validation is essential for assessing its clinical applicability. To transition from simulation to practical use, future studies must incorporate actual patient data from ICU databases, such as MIMIC-III or eICU-CRD, to validate the model’s efficacy under real-world conditions [52]-[54]. Real-world testing would provide insight into how well the model generalizes to diverse patient populations, including those with multiple comorbidities, varying demographics, and differing treatment protocols.
In addition to evaluating predictive accuracy, real-world applications should focus on interpretability and usability for clinical teams [55]. The attention mechanism embedded in the model provides a foundation for interpretable predictions by highlighting critical time periods, but further interpretability tools could be integrated to ensure that clinicians understand how and why the model arrives at its predictions. Clinicians’ trust in AI-driven tools hinges on transparent and interpretable outputs, as ICU settings are high-stakes environments where predictive errors could have severe consequences [56]-[59]. Future iterations of the model could incorporate additional interpretability techniques, such as feature attribution scores or visual dashboards, that make model predictions more actionable for ICU staff [60].
Ethical and Practical Considerations must also be addressed. Implementing such a model in ICU settings would require regulatory approval, robust data privacy measures, and integration with existing hospital information systems [61] [62]. Additionally, continuous monitoring and periodic retraining on updated data would be essential to maintain the model’s performance, as ICU data can change over time due to new treatment protocols, equipment, and patient demographics [63]. Collaboration with healthcare providers to establish standardized protocols for model deployment, monitoring, and retraining could facilitate the transition from research to real-world use.
4.4. Future Directions
To advance this research, several avenues should be explored. First, collecting diverse ICU data from multiple hospitals and geographic regions would allow the model to learn from a broader patient population, enhancing its generalizability and robustness. Such diversity would also expose the model to a wider range of medical conditions and treatment variations, thereby improving its adaptability to different ICU settings. Second, expanding the feature set to include additional vital signs, laboratory results, and treatment interventions could improve the model’s ability to predict delirium by capturing more comprehensive patient information [64]. Real-world ICU data often include a wealth of clinically relevant data points, such as oxygen saturation, blood pH levels, and medication dosages, which could provide additional predictive power and help the model account for a wider array of risk factors. Third, longitudinal studies assessing the model’s performance over extended periods would provide insight into its long-term reliability and its ability to adapt to shifts in ICU protocols or patient demographics [65]. Longitudinal validation would also enable researchers to track the impact of model-guided interventions on patient outcomes, such as reductions in delirium incidence or improved recovery times [66]. Lastly, the model could be enhanced by incorporating ensemble techniques or hybrid models combining LSTM with other architectures, such as convolutional neural networks (CNNs), to process multimodal data [67]. Hybrid models could, for example, process both time-series data and medical imaging (e.g., brain scans) for a more comprehensive assessment of delirium risk, potentially capturing risk factors that are undetectable in time-series data alone [68] [69].
5. Conclusions
This study introduces an innovative approach to early delirium prediction in ICU patients by leveraging a deep learning model that integrates a simplified LSTM architecture with an attention mechanism. By combining static patient characteristics with dynamic, time-series physiological data, the model achieved near-perfect predictive performance on a synthetic dataset, distinguishing between delirium and non-delirium cases with remarkable accuracy. This outcome underscores the potential of deep learning to improve risk stratification in critical care settings, providing clinicians with a reliable tool for early identification of high-risk patients.

The model’s attention mechanism was particularly instrumental, allowing it to prioritize clinically relevant time points in the dynamic data. This feature not only enhances the model’s interpretability but also aligns with the clinical observation that delirium often develops as a gradual physiological change. By focusing on critical temporal patterns, the model has demonstrated the capability to detect early signs of delirium, potentially enabling timely interventions that could mitigate the adverse outcomes associated with this condition.

However, despite the promising results, it is important to acknowledge the limitations associated with synthetic data [70]. While synthetic datasets allow for initial testing and model refinement, they cannot fully capture the variability, noise, and complexity inherent in real-world ICU data [71]. The true test of this model’s effectiveness lies in its application to clinical datasets, where diverse patient populations, comorbid conditions, and treatment interventions introduce additional layers of complexity. For the model to be clinically viable, it must demonstrate consistent performance across varied and often imperfect real-world data sources [72].
Future work will focus on addressing these challenges by testing the model on real ICU data from sources like MIMIC-III or eICU-CRD databases. These datasets will provide a more rigorous environment to evaluate the model’s robustness, adaptability, and generalizability. Additionally, we aim to explore methods for handling noisy, incomplete, or sparse data, which are common in healthcare settings. Techniques such as data augmentation, imputation, and transfer learning could enhance the model’s resilience to these challenges, enabling it to maintain predictive accuracy in the face of real-world data limitations.
In conclusion, this study presents a significant step toward using machine learning for proactive delirium management in critical care environments. With further validation and refinement, this model has the potential to become an asset in ICU settings, aiding clinicians in making timely, data-driven decisions to improve patient outcomes. Through continued research and collaboration with healthcare providers, we aim to bring this model closer to practical implementation, where it could contribute to enhancing patient safety and quality of care in intensive care units.
Conflicts of Interest
The authors declare no conflicts of interest.