Using Deep Learning to Predict QT Prolongation in ICU Patients on Antipsychotic Therapy: A Case Study

Abstract

QT prolongation is a significant cardiac risk factor, particularly in ICU patients undergoing antipsychotic therapy, where it can lead to life-threatening arrhythmias such as torsades de pointes or sudden cardiac arrest. This study presents a deep-learning approach using Long Short-Term Memory (LSTM) networks to predict QT prolongation based on clinically relevant features, including demographics, electrolyte levels, heart rate, and medication types. A synthetic dataset mimicking ICU patient profiles was used for model training and evaluation, achieving high accuracy, precision, and recall. The model demonstrated strong discriminatory power, with an Area Under the Curve (AUC) of 0.94, and balanced sensitivity and specificity, giving reliable predictions for both prolonged and normal QT intervals. ROC and Precision-Recall curves, together with a confusion matrix, supported the model’s robustness and generalizability on the synthetic test data. Key strengths include minimal overfitting and adaptability for real-time ICU deployment. While the use of synthetic data provided a controlled environment for evaluation, validation on real-world datasets is necessary for clinical adoption. This study highlights the potential of integrating deep learning models into ICU workflows to enable early detection of QT prolongation, guide personalized treatment strategies, and improve patient outcomes.

Share and Cite:

de Filippis, R. and Al Foysal, A. (2025) Using Deep Learning to Predict QT Prolongation in ICU Patients on Antipsychotic Therapy: A Case Study. Open Access Library Journal, 12, 1-11. doi: 10.4236/oalib.1112748.

1. Introduction

QT prolongation, characterized by an abnormally extended interval in the heart’s electrical cycle, is a known risk factor for life-threatening arrhythmias such as torsades de pointes and sudden cardiac arrest [1]. This condition is particularly concerning in ICU patients receiving antipsychotic therapy for delirium, as these medications are commonly associated with QT interval prolongation [2]. The early detection of QT prolongation is critical in preventing adverse outcomes, yet current clinical methods for prediction often rely on static thresholds and lack the capacity for real-time, individualized risk stratification [3]. Advances in machine learning and deep learning offer new opportunities to address these challenges by enabling the development of dynamic, data-driven predictive models. Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN), are well-suited for this task due to their ability to analyse sequential and temporal data, such as ECG measurements [4]. By integrating patient-specific variables (demographics, electrolyte levels, heart rate, and antipsychotic medication types), LSTM models can offer a more nuanced approach to predicting QT prolongation risk [5]. This study focuses on the design, development, and evaluation of an LSTM-based model for QT prolongation prediction in ICU patients. Using a synthetic dataset that simulates real-world ICU conditions, the model was trained to identify QT prolongation while balancing sensitivity and specificity, so that high-risk cases are detected and unnecessary interventions are minimized. The performance of the model was assessed using multiple metrics, including accuracy, precision, recall, ROC curves, and Precision-Recall curves. This work not only demonstrates the feasibility of using deep learning in critical care settings, but also lays the groundwork for real-time integration of predictive models into ICU monitoring systems to enhance patient outcomes.

2. Methods

This study employed a comprehensive and robust methodology to develop and evaluate a deep-learning model for predicting QT prolongation in ICU patients undergoing antipsychotic therapy. The approach involved synthetic data generation, model design, and performance evaluation using state-of-the-art techniques.

2.1. Data Generation

A dataset of 5000 synthetic patient records was generated to mimic real-world ICU conditions, ensuring the inclusion of key clinical variables. Each record included demographic features (age and gender), clinical parameters (potassium and magnesium levels, heart rate, and baseline QT interval), and medication data categorized into three groups: no risk, low risk, and high risk of QT prolongation. The corrected QT interval (QTc) was calculated using Bazett’s formula, QTc = QT/sqrt(RR), where RR is the RR interval in seconds, to normalize QT interval variations across different heart rates [6]. Cases were labelled as prolonged QT if the QTc exceeded 450 milliseconds. This dataset captured realistic clinical variability while controlling for key risk factors.
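The labelling rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' generation code: the function names are hypothetical, and it assumes a regular rhythm so that the RR interval can be derived from heart rate as 60 / heart rate (in seconds).

```python
import math

QTC_THRESHOLD_MS = 450.0  # labelling threshold stated in the text


def rr_interval_s(heart_rate_bpm: float) -> float:
    """RR interval in seconds, assuming a regular rhythm (RR = 60 / heart rate)."""
    if heart_rate_bpm <= 0:
        raise ValueError("heart rate must be positive")
    return 60.0 / heart_rate_bpm


def qtc_bazett(qt_ms: float, heart_rate_bpm: float) -> float:
    """Bazett correction: QTc = QT / sqrt(RR), with RR in seconds."""
    return qt_ms / math.sqrt(rr_interval_s(heart_rate_bpm))


def is_prolonged(qt_ms: float, heart_rate_bpm: float,
                 threshold_ms: float = QTC_THRESHOLD_MS) -> bool:
    """Label a record as prolonged QT when its QTc exceeds the threshold."""
    return qtc_bazett(qt_ms, heart_rate_bpm) > threshold_ms
```

At 60 beats per minute the RR interval is exactly one second, so QTc equals the measured QT; at higher heart rates the same measured QT yields a larger QTc. Keeping the threshold as a parameter makes it straightforward to apply a different local cut-off.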

2.2. Model Development

The deep learning model leveraged a Long Short-Term Memory (LSTM) network, a specialized recurrent neural network architecture suitable for sequential data like ECG features [7]. The architecture consisted of:

  • Two LSTM layers with 128 and 64 units, respectively, for feature extraction and temporal dependency modelling.

  • Dropout layers (30%) and Batch Normalization to mitigate overfitting and stabilize training.

  • A Dense layer with 32 neurons for further feature abstraction, followed by a final sigmoid-activated layer for binary classification (prolonged vs. normal QT).
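To give a sense of the size of this architecture, the trainable weights of the recurrent and dense layers can be counted with the standard formulas. This is a rough sketch under one assumption the text does not spell out: that the first LSTM layer receives six features per time step. Batch Normalization parameters are left out because their placement in the network is not stated, and the helper names are illustrative only.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable weights of a standard LSTM layer: four gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


N_FEATURES = 6  # assumption: six features presented at each time step

layers = {
    "lstm_128": lstm_params(N_FEATURES, 128),
    "lstm_64": lstm_params(128, 64),
    "dense_32": dense_params(64, 32),
    "sigmoid_output": dense_params(32, 1),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name:>15}: {count:,}")
print(f"{'total':>15}: {total:,}")
```

Under these assumptions the two LSTM layers account for almost all of the weights, which is worth keeping in mind when comparing against simpler baselines.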

The model was compiled using the Adam optimizer with an initial learning rate of 0.001 and binary cross-entropy as the loss function. Metrics for evaluation included accuracy, precision, recall, and F1-score, ensuring a balanced assessment of the model’s performance.
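For readers unfamiliar with the loss function, binary cross-entropy penalizes a predicted probability p for a true label y by -(y log p + (1 - y) log(1 - p)), averaged over samples. The following is a plain-Python sketch of that definition rather than the framework implementation used in the study; the clipping constant is an illustrative choice.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

A prediction of 0.5 for every case gives a loss of ln 2 (about 0.693), which is a useful reference point when reading loss curves: values well below that indicate the model is doing better than an uninformed guess.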

2.3. Training and Optimization

The dataset was split into 80% training and 20% testing subsets. Features were standardized using z-score normalization to ensure uniform scaling across variables. Input data were reshaped to fit the LSTM architecture, with each sample treated as a sequence with six features. To optimize training, a learning rate scheduler was used that reduced the learning rate exponentially after the first 10 epochs. The model was trained for 25 epochs with a batch size of 64. Early stopping was not used, so that convergence could be observed over the full run; it was tracked using training and validation loss and accuracy.
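The preprocessing and schedule described above might look roughly as follows in plain Python. Several details are assumptions rather than facts from the study: the scaler is fitted on the training split only (a common precaution against leakage), each sample is wrapped as a length-1 sequence, and the decay constant of 0.1 is a placeholder because the text does not give one. All function names are hypothetical.

```python
import math
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split them (80/20 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_zscore(train_rows):
    """Per-feature mean and standard deviation, estimated on training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize every row with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def to_sequences(rows):
    """Wrap each sample as a length-1 sequence: (samples, 1, features)."""
    return [[row] for row in rows]


def learning_rate(epoch, initial_lr=0.001, hold_epochs=10, decay=0.1):
    """Constant for the first `hold_epochs` epochs (0-indexed), then exponential
    decay. `decay` is an assumed value; the text does not specify it."""
    if epoch < hold_epochs:
        return initial_lr
    return initial_lr * math.exp(-decay * (epoch - hold_epochs + 1))
```

With 5000 records this split yields 4000 training and 1000 test samples. The same fitted means and standard deviations are reused for the test rows so that both subsets are on a common scale.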

2.4. Evaluation Metrics and Visualization

Model performance was evaluated using a range of metrics and visualization tools:

  • A confusion matrix quantified the distribution of true positives, true negatives, false positives, and false negatives, offering insight into classification accuracy.

  • The ROC curve and its Area under the Curve (AUC) score assessed the model’s ability to discriminate between prolonged and normal QT intervals.

  • The Precision-Recall curve was used to evaluate the trade-off between precision and recall, especially valuable in imbalanced datasets.

Visualization of training and validation loss and accuracy trends provided a clear view of the model’s learning progression. Additionally, the confusion matrix, ROC curve, and Precision-Recall curve offered critical insights into the model’s robustness and real-world applicability.
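As a concrete reference for the two curve-based metrics, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, and lists precision and recall at each distinct score threshold. It is a dependency-free illustration with hypothetical function names, not the evaluation code used in the study, and the pairwise AUC loop is only practical for modest sample sizes.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as half). Quadratic in the number of samples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_points(labels, scores):
    """(threshold, precision, recall) for each distinct score used as a cut-off,
    from the strictest threshold to the most permissive."""
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    points = []
    for t in sorted(set(scores), reverse=True):
        predicted = [s >= t for s in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
        fp = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
        points.append((t, tp / (tp + fp), tp / total_pos))
    return points
```

For example, with labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.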

This methodology ensured a systematic and rigorous approach to developing a clinically relevant deep learning model, capable of integrating with ICU monitoring systems for personalized risk stratification in real-time. The techniques used provide a strong foundation for further validation and deployment in clinical settings.

3. Results

The LSTM-based deep learning model demonstrated robust performance in predicting QT prolongation among ICU patients, as evidenced by multiple evaluation metrics and visualizations. The model’s learning progression, classification performance, and ability to differentiate between prolonged and normal QT intervals are detailed below.

3.1. Model Learning and Convergence

The model exhibited consistent improvements in both training and validation loss over the 25 epochs, indicating effective learning without overfitting. Figure 1 shows a steady decrease in loss, with validation loss closely following the training loss, reflecting generalizability to unseen data.

Similarly, the model’s accuracy steadily improved during training, as illustrated in Figure 2, where validation accuracy peaked at 88.5%, demonstrating the model’s ability to generalize to the test data.

Figure 1. Training vs. validation loss.

3.2. Classification Performance

The model’s classification results are summarized in a confusion matrix (Figure 3), highlighting the following:

  • True Positives (TPs): 739 cases of prolonged QT were correctly identified.

  • True Negatives (TNs): 147 normal QT cases were accurately classified.

  • False Positives (FPs): 58 cases were incorrectly predicted as prolonged QT.

  • False Negatives (FNs): 56 prolonged QT cases were missed.

Figure 2. Training vs. validation accuracy.

Figure 3. Confusion matrix.

This indicates a high sensitivity (recall of 93.8%) in detecting QT prolongation, critical for minimizing cardiac risks in ICU patients, while maintaining reasonable specificity.
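The summary rates can be recomputed directly from the four reported counts, which sum to the 1000 test records implied by the 80/20 split. The helper below is a small illustrative sketch using the standard definitions; its name is hypothetical.

```python
TP, TN, FP, FN = 739, 147, 58, 56  # counts reported in the confusion matrix


def classification_rates(tp, tn, fp, fn):
    """Standard rates derived from a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


rates = classification_rates(TP, TN, FP, FN)
for name, value in rates.items():
    print(f"{name:>12}: {value:.3f}")
```

From these counts, accuracy is 88.6%, in line with the reported validation accuracy, and precision is about 92.7%. Recall works out to roughly 93.0%, close to but slightly below the 93.8% quoted in the text, and specificity is about 71.7%, which puts a number on the "reasonable specificity" mentioned above.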

3.3. Discriminatory Power

The Receiver Operating Characteristic (ROC) curve (Figure 4) revealed an Area under the Curve (AUC) of 0.94, indicating excellent model performance in distinguishing between prolonged and normal QT intervals. This high AUC value confirms the model’s robustness in classification, even when faced with overlapping distributions of input features.

3.4. Precision and Recall Trade-Off

To further evaluate the model’s reliability, a Precision-Recall (PR) curve was generated (Figure 5). The model maintained high precision across a wide range of recall values, indicating its ability to make accurate predictions without significantly compromising sensitivity. The near-perfect precision at higher recall levels highlights the model’s suitability for critical clinical applications where false negatives (missed QT prolongation cases) can have severe consequences.

3.5. Summary of Results

Figure 4. ROC curve.

Figure 5. Precision-Recall curve.

The combined analysis of loss trends, classification metrics, and discrimination curves demonstrates the model’s strong predictive capabilities. Key observations include:

1) Low Training-Validation Gap: Minimal divergence between training and validation performance underscores the model’s ability to generalize.

2) Balanced Precision and Recall: High precision and recall across thresholds ensure reliable predictions for both positive (prolonged QT) and negative (normal QT) cases.

3) Excellent AUC: The AUC of 0.94 reflects the model’s strong discriminatory power.

4. Clarifications and Justifications

Validation of Synthetic Dataset for Clinical Realism: The synthetic dataset used in this study was designed to mimic real-world ICU conditions by integrating key clinical variables, including patient demographics, electrolyte levels (potassium and magnesium), heart rate, and medication profiles. This dataset was developed based on established ICU data distributions and validated by referencing prior studies and ICU patient profiles. However, the study acknowledges the limitations of relying solely on synthetic data. While this approach allowed for controlled evaluation and avoidance of data privacy issues, the authors recognize the need for real-world validation. The absence of real patient data for model training is identified as a limitation, with future work planned to integrate real ICU datasets to enhance model generalizability.

Rationale for Network Architecture: The model architecture, comprising two LSTM layers with 128 and 64 units and a Dense layer of 32 neurons, was selected to balance model complexity and performance. The two-tiered LSTM structure enables effective temporal feature extraction and sequential data analysis, crucial for handling time-series data such as ECGs. The Dense layer further abstracts key features, enhancing the model’s ability to distinguish between prolonged and normal QT intervals. The choice of 128 and 64 units was guided by empirical performance during preliminary experiments, though explicit hyperparameter tuning results are not presented in the main paper. Future iterations will incorporate ablation studies to provide a clearer justification for these architectural decisions.

Comparison with Benchmark Models: The study does not present comparisons with benchmark models or alternative architectures. Including baseline models (such as logistic regression, random forests, or simpler recurrent networks) would provide valuable context for assessing any performance gains achieved by the LSTM approach. Future work will incorporate such benchmarks to test whether the proposed architecture offers a measurable advantage over them.

Temporal Relevance of Features: Temporal dynamics play a significant role in QT prolongation prediction. Features such as heart rate and medication effects are time-sensitive, necessitating the use of sequential modelling. The study leverages LSTM networks precisely for their ability to capture such temporal dependencies, although an explicit time-series analysis of individual features is not detailed. Additional visualization of feature importance over time could further elucidate how specific variables influence predictions, enhancing interpretability for clinicians.

Feature Selection Rationale: Potassium and magnesium levels were selected as primary features due to their well-documented association with cardiac repolarization and QT interval variability. Electrolyte imbalances are known contributors to arrhythmias, making them essential predictors in QT prolongation models. Heart rate and medication profiles were similarly chosen based on clinical evidence linking these factors to QTc variations. This feature selection process reflects the clinical understanding of QT prolongation mechanisms, aligning with established risk factors in ICU patients.

Model Adaptability across ICU Environments: The adaptability of the model across different ICU settings remains a concern, given the use of synthetic data. To address this, the authors propose retraining the model on site-specific ICU datasets once real-world data becomes available. Transfer learning and domain adaptation techniques will be explored to fine-tune the model to diverse patient populations and institutional practices. This iterative approach aims to ensure the model’s robustness across varying ICU environments.

5. Discussion

This study demonstrates the successful application of a robust LSTM-based deep learning model for predicting QT prolongation in ICU patients undergoing antipsychotic therapy. The model’s high accuracy, precision, and recall underscore its potential for integration into ICU monitoring systems, enabling real-time risk stratification and decision-making in critical care settings.

The model’s performance metrics reflect its robustness. The low training-validation gap indicates that the model generalizes well to unseen data, minimizing the risk of overfitting, which is crucial for real-world deployment. The ROC curve, with an AUC of 0.94, highlights the model’s strong discriminatory power, confirming its ability to distinguish prolonged QT intervals from normal ones with high reliability. Furthermore, the Precision-Recall curve demonstrates the model’s effectiveness in balancing precision and recall, an essential feature in clinical scenarios where false negatives (missed QT prolongation cases) could lead to severe cardiac complications.

The confusion matrix provides a deeper insight into the classification performance. With high sensitivity (93.8%) and a reasonable level of specificity, the model is particularly adept at identifying true positive cases of QT prolongation. This is critical in ICU settings, where missing a case of QT prolongation could result in life-threatening arrhythmias [8]. However, the model’s specificity could be further optimized to reduce false positives, so that unnecessary interventions are minimized.

One of the significant strengths of this study is the use of a synthetic dataset that closely mimics real-world ICU conditions. By incorporating key clinical features such as potassium and magnesium levels, heart rate, and medication type, the model captures the multifactorial nature of QT prolongation. However, the reliance on synthetic data also presents limitations. While the dataset ensures controlled variability, its applicability to diverse patient populations remains untested. Validation on real-world datasets is essential to confirm the model’s generalizability and clinical utility.

From a clinical perspective, this model offers several implications. First, it supports early identification of patients at risk of QT prolongation, allowing clinicians to modify antipsychotic regimens or implement preventive measures. Second, the model’s ability to operate in real time makes it a valuable addition to ICU monitoring systems, where timely interventions are critical. Lastly, the framework established in this study could be extended to other high-risk populations or additional cardiac risk factors, broadening its scope in predictive healthcare.

Despite its promising results, this study has some limitations. While the model achieves high performance on synthetic data, its deployment in clinical practice requires further validation on real-world patient data. Additionally, QT prolongation thresholds (e.g. QTc > 450 ms) used in this study may vary depending on institutional or regional guidelines, necessitating customization for local clinical practice.

The proposed LSTM-based model demonstrates strong potential for improving the management of QT prolongation in ICU patients. By combining advanced machine learning techniques with clinically relevant features, this study provides a foundation for developing personalized, data-driven approaches in critical care. Future work should focus on real-world validation and integration into ICU workflows, ensuring widespread clinical adoption and impact.

6. Conclusions

This study highlights the successful implementation of a robust LSTM-based deep learning model for predicting QT prolongation in ICU patients undergoing antipsychotic therapy. By leveraging clinically relevant features such as patient demographics, electrolyte levels, heart rate, and medication types, the model demonstrated excellent performance with high accuracy, precision, recall, and a strong AUC of 0.94. These results emphasize the potential of deep learning techniques in addressing critical cardiac risks in high-acuity settings. The model’s ability to generalize effectively, as evidenced by the minimal training-validation performance gap, underscores its applicability to real-world scenarios. Furthermore, the balanced precision and recall values highlight its reliability in minimizing both false negatives, which could lead to missed cardiac risks, and false positives, which could cause unnecessary interventions. While the results on synthetic data are promising, real-world validation remains an essential next step to ensure the model’s applicability across diverse ICU populations. Its integration into ICU monitoring systems could provide clinicians with a powerful tool for real-time risk stratification and personalized decision-making, potentially improving patient outcomes and safety [9]-[11].

In conclusion, this study demonstrates the feasibility of employing advanced machine learning techniques to predict QT prolongation, offering a foundation for future innovations in critical care. With further refinement and validation, this approach has the potential to transform the management of cardiac risks, paving the way for more personalized and proactive healthcare delivery.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Moss, A.J. (1986) Prolonged QT-Interval Syndromes. JAMA: The Journal of the American Medical Association, 256, 2985-2987.
https://doi.org/10.1001/jama.256.21.2985
[2] Stollings, J.L., Boncyk, C.S., Birdrow, C.I., Chen, W., Raman, R., Gupta, D.K., et al. (2024) Antipsychotics and the QTC Interval during Delirium in the Intensive Care Unit: A Secondary Analysis of a Randomized Clinical Trial. JAMA Network Open, 7, e2352034.
https://doi.org/10.1001/jamanetworkopen.2023.52034
[3] Monitillo, F. (2016) Ventricular Repolarization Measures for Arrhythmic Risk Stratification. World Journal of Cardiology, 8, 57-73.
https://doi.org/10.4330/wjc.v8.i1.57
[4] Sahoo, B.B., Jha, R., Singh, A. and Kumar, D. (2019) Long Short-Term Memory (LSTM) Recurrent Neural Network for Low-Flow Hydrological Time Series Forecasting. Acta Geophysica, 67, 1471-1481.
https://doi.org/10.1007/s11600-019-00330-1
[5] Cissoko, M.B.H. (2024) Adaptive Time-Aware LSTM for Predicting and Interpreting ICU Patient Trajectories from Irregular Data. Ph.D. Thesis, Université de Strasbourg.
[6] Rabkin, S.W. (2015) Nomenclature, Categorization and Usage of Formulae to Adjust QT Interval for Heart Rate. World Journal of Cardiology, 7, 315-325.
https://doi.org/10.4330/wjc.v7.i6.315
[7] Satheeswaran, V., Naga Chandrika, G., Mitra, A., Chowdhury, R., Kumar, P. and Glory, E. (2024) Deep Learning Based Classification of ECG Signals Using RNN and LSTM Mechanism. Journal of Electronics, Electromedical Engineering, and Medical Informatics, 6, 332-342.
[8] Santoro, F., Monitillo, F., Raimondo, P., Lopizzo, A., Brindicci, G., Gilio, M., et al. (2020) QTC Interval Prolongation and Life-Threatening Arrhythmias during Hospitalization in Patients with Coronavirus Disease 2019 (COVID-19): Results from a Multicenter Prospective Registry. Clinical Infectious Diseases, 73, e4031-e4038.
https://doi.org/10.1093/cid/ciaa1578
[9] Flohr, L., Beaudry, S., Johnson, K.T., West, N., Burns, C.M., Ansermino, J.M., et al. (2018) Clinician-Driven Design of Vitalpad—An Intelligent Monitoring and Communication Device to Improve Patient Safety in the Intensive Care Unit. IEEE Journal of Translational Engineering in Health and Medicine, 6, 1-14.
https://doi.org/10.1109/jtehm.2018.2812162
[10] Johnson, A.E.W., Ghassemi, M.M., Nemati, S., Niehaus, K.E., Clifton, D. and Clifford, G.D. (2016) Machine Learning and Decision Support in Critical Care. Proceedings of the IEEE, 104, 444-466.
https://doi.org/10.1109/jproc.2015.2501978
[11] Moorman, L.P. (2021) Principles for Real-World Implementation of Bedside Predictive Analytics Monitoring. Applied Clinical Informatics, 12, 888-896.
https://doi.org/10.1055/s-0041-1735183

Copyright © 2025 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.