Using a Machine Learning Model to Predict Anxiety in Systemic Lupus Erythematosus Patients
1. Introduction
Systemic Lupus Erythematosus (SLE) is a chronic autoimmune disease that can affect various organs of the body, such as the skin, joints, kidneys, heart and brain. Symptoms of SLE vary considerably from person to person, ranging from mild to severe, and can have a significant impact on patients’ quality of life [1].
Although lupus is relatively uncommon, it cannot be classified as a rare condition. Worldwide, the incidence of SLE is estimated at 5.14 cases per 100,000 person-years, representing around 0.40 million new diagnoses each year. Women are particularly affected, with an incidence of 8.82 cases per 100,000 person-years (around 0.34 million new cases annually), compared with 1.53 cases per 100,000 person-years in men (around 0.06 million new cases). The worldwide prevalence of SLE is estimated at 43.7 cases per 100,000 people, affecting around 3.41 million people, of whom 3.04 million are women. Incidence and prevalence also vary across regions, with the highest rates reported in Poland, the United States and Barbados [2].
Several complications can occur in SLE patients. They can be organic or psychiatric, such as depression and anxiety, and they have an important impact on quality of life [3]. In Morocco, the prevalence of anxiety in SLE patients is estimated at 55.4% (95% CI: 45.8% - 65%) [4]. To improve the quality of life of these patients, it is therefore essential to be able to predict and anticipate the development of anxiety. Such prediction would improve patient management and help prevent this serious complication.
From this perspective, machine learning is a promising approach for predicting anxiety in SLE patients. Machine learning has provided the medical field with advanced tools for predicting diseases and their complications [5]. Machine learning algorithms can analyze large quantities of patient data to identify patterns and predict the occurrence or evolution of a disease [6]. Such predictive models support the early detection of disease, the development of personalized treatment plans and the prevention of potential complications, and can thereby improve patient care. In the specific case of SLE, algorithms trained on patient data could predict the development of anxiety. To our knowledge, no published study has addressed the prediction of anxiety in SLE patients.
The objective of our study is to develop a machine learning algorithm able to predict anxiety in SLE patients.
2. Materials and Methods
A cross-sectional study was conducted in the internal medicine department of the Hassan II University Hospital in Fez in 2023.
All patients aged 18 and over, diagnosed with SLE according to the ACR criteria [7], and admitted to the internal medicine department of the Hassan II University Hospital in Fez between 2020 and 2023 were included in the study.
Patients with a history of depression or anxiety, patients with psychiatric disorders including schizophrenia or bipolar states, and patients with comprehension or communication difficulties were excluded.
2.1. Data Collection
All data were collected during face-to-face interviews using a questionnaire covering patients’ sociodemographic characteristics (age, gender, marital status, education level and employment status) and their medical history. Information on disease characteristics, such as duration, clinical manifestations, autoantibody status (anti-DNA, anti-nuclear) and types of treatment used, was obtained by interviewing patients and examining their medical records. Anxiety was measured using the Moroccan version of the Hospital Anxiety and Depression Scale (HADS) [8].
2.1.1. Anxiety Measure (HADS Scale)
The dependent variable is the level of anxiety measured by the Hospital Anxiety and Depression Scale (HADS).
The HADS was developed by Zigmond and Snaith in 1983 to screen non-psychiatric inpatients for anxiety disorders and depressive syndromes, and was later validated for outpatient use. It is a self-assessment scale comprising 14 items: seven relate to anxiety (total A) and seven to the depressive dimension (total D). Each item is rated from 0 to 3 according to the intensity of the symptom over the past week. The score for each subscale therefore ranges from 0 to 21, with higher scores corresponding to more severe symptoms. For each subscale (anxiety and depression), threshold values have been determined: a score between 0 and 7 is considered normal, while a score of 8 or more indicates a significant disorder [9] [10].
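The scoring rule described above can be sketched in a few lines. This is a minimal illustration, assuming that the seven anxiety items have already been scored from 0 to 3; the function names and the two example patients are hypothetical and are not taken from the study data.

```python
from typing import Sequence

ANXIETY_CUTOFF = 8  # threshold stated in the text: 8 or more indicates a significant disorder


def hads_subscale_score(item_scores: Sequence[int]) -> int:
    """Sum the seven item scores (each 0-3) of one HADS subscale."""
    if len(item_scores) != 7:
        raise ValueError("a HADS subscale has exactly 7 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2 or 3")
    return sum(item_scores)


def has_anxiety(anxiety_items: Sequence[int]) -> bool:
    """Binary label used as the prediction target: True when the anxiety total is >= 8."""
    return hads_subscale_score(anxiety_items) >= ANXIETY_CUTOFF


# Hypothetical responses for two patients (not taken from the study data).
patient_a = [1, 2, 1, 0, 2, 1, 2]  # total 9
patient_b = [0, 1, 1, 0, 1, 0, 1]  # total 4
print(hads_subscale_score(patient_a), has_anxiety(patient_a))
print(hads_subscale_score(patient_b), has_anxiety(patient_b))
```

Dichotomizing the subscale at 8 is what turns the questionnaire score into the binary outcome that the classifiers later predict.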
2.1.2. Explanatory Variables
The explanatory variables studied are essentially sociodemographic characteristics, clinical features of lupus (duration of disease, severity, etc.), and types of treatment used. Table 1 shows the description of these attributes and their values.
2.2. Data Pre-Processing
Microsoft Excel was used for data entry and organization. Python (version 3.6.4) was used for preprocessing and for developing the prediction algorithms. The main software package was scikit-learn, run in a Jupyter notebook.
Table 1. Characteristics of the sample.
| Variables | Classes | Total, n (%) | Anxiety: Yes, n (%) | Anxiety: No, n (%) | P-value |
| Total | | | 57 (55.9) | 45 (44.1) | |
| Sex | Men | 8 (07.2) | 3 (37.5) | 5 (62.5) | 0.300 |
| | Women | 94 (92.8) | 54 (57.4) | 40 (42.6) | |
| Age (years), mean ± SD | | 41.64 ± 13.75 | 45.65 ± 13.73 | 36.56 ± 12.11 | 0.010 |
| Marital status | Single | 27 (26.5) | 8 (26.6) | 19 (70.4) | 0.006 |
| | Divorced | 14 (13.7) | 9 (64.3) | 5 (35.7) | |
| | Married | 61 (59.8) | 40 (65.6) | 21 (34.4) | |
| Monthly income | <2000 DH | 70 (68.6) | 43 (61.4) | 27 (38.6) | 0.100 |
| | >2000 DH | 32 (31.4) | 14 (43.8) | 18 (56.3) | |
| Geographical origin (populated areas) | Rural | 32 (31.4) | 16 (50.0) | 16 (50.0) | 0.42 |
| | Urban | 70 (68.6) | 41 (58.6) | 29 (41.4) | |
| Number of previous medical conditions | 0 | 33 (32.4) | 14 (42.4) | 19 (57.6) | 0.037 |
| | 1 | 35 (34.3) | 17 (48.6) | 18 (51.4) | |
| | 3 | 24 (23.5) | 17 (70.8) | 7 (29.2) | |
| | 4 | 8 (7.8) | 7 (87.5) | 1 (12.5) | |
| | 5 | 2 (2.0) | 2 (100) | 0 (0.0) | |
| Dermatological manifestations | Yes | 30 (29.4) | 16 (53.3) | 14 (46.7) | 0.740 |
| | No | 70 (70.6) | 41 (56.9) | 31 (43.1) | |
| Rheumatological manifestations | No | 36 (35.29) | 18 (50.0) | 18 (50.0) | 0.380 |
| | Yes | 66 (64.71) | 39 (59.1) | 27 (40.9) | |
| Neuropsychological manifestations | No | 92 (90.2) | 50 (54.3) | 42 (45.7) | 0.510 |
| | Yes | 10 (09.8) | 7 (70.0) | 3 (30.0) | |
| Respiratory symptoms | No | 81 (79.4) | 41 (50.6) | 40 (49.4) | 0.035 |
| | Yes | 21 (20.6) | 16 (76.2) | 5 (23.8) | |
| Ophthalmic manifestations | No | 77 (75.5) | 42 (54.5) | 35 (45.5) | 0.630 |
| | Yes | 25 (24.5) | 15 (60.0) | 10 (40.0) | |
| Facial damage | No | 53 (52.0) | 28 (52.8) | 25 (47.2) | 0.520 |
| | Yes | 49 (48.0) | 29 (59.2) | 20 (40.8) | |
| Hematologic diseases | No | 74 (72.5) | 43 (58.1) | 31 (41.9) | 0.460 |
| | Yes | 28 (27.5) | 14 (50.0) | 14 (50.0) | |
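The text does not state which test produced the p-values in Table 1. As an illustration only, the sketch below assumes a Pearson chi-square test without continuity correction and applies it to the respiratory symptoms counts from the table; the function name is hypothetical. Under that assumption the result comes out close to the reported value of 0.035.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts.

    Returns (statistic, p_value). For 1 degree of freedom the survival
    function of the chi-square distribution is erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Respiratory symptoms in Table 1: rows = No / Yes, columns = anxiety Yes / No.
respiratory = [[41, 40], [16, 5]]
statistic, p_value = chi_square_2x2(respiratory)
print(f"chi2 = {statistic:.2f}, p = {p_value:.3f}")
```

For rows with very small cell counts, such as the Men row, an exact test would usually be preferred over the chi-square approximation.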
Quantitative data were standardized using the StandardScaler class from sklearn.preprocessing. Categorical variables were encoded using the LabelEncoder class from the same module.
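A minimal sketch of these two preprocessing steps follows. The column names and values are hypothetical and serve only to show what each transformer does; they are not the study data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical values for illustration only; not the study data.
ages = np.array([[25.0], [41.0], [36.0], [58.0]])              # one quantitative column
marital_status = ["Single", "Married", "Divorced", "Married"]  # one categorical column

# StandardScaler subtracts the column mean and divides by the standard deviation.
scaler = StandardScaler()
ages_scaled = scaler.fit_transform(ages)

# LabelEncoder maps each category to an integer code, in sorted order of the labels.
encoder = LabelEncoder()
marital_codes = encoder.fit_transform(marital_status)

print(ages_scaled.ravel().round(2))
print(list(encoder.classes_), marital_codes.tolist())
```

Integer codes imply an ordering between categories, which is harmless for tree-based models but can matter for SVM and logistic regression; one-hot encoding is a common alternative for nominal variables. The scikit-learn documentation describes LabelEncoder as intended for target labels.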
There were no missing values or outliers in our database.
In this study, two feature selection techniques were used: Recursive Feature Elimination with Cross-Validation (RFECV) and Random Forest Importance.
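The two selection techniques can be sketched as follows. The data are synthetic, and the base estimator for RFECV, the number of folds and the number of trees are assumptions, since the text does not report these settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the study data: 102 rows, 6 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(102, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=102) > 0).astype(int)

# RFECV: repeatedly drop the weakest feature and keep the subset with the best CV score.
# The estimator, cv=5 and the scoring metric are assumptions made for this sketch.
selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5, scoring="accuracy")
selector.fit(X, y)
print("RFECV keeps features:", np.flatnonzero(selector.support_).tolist())

# Random forest importance: rank features by mean impurity decrease.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
print("Features ranked by forest importance:", ranking.tolist())
```

The two methods need not agree: RFECV returns a subset tuned to one estimator, while forest importances give a ranking that still requires a cut-off.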
The scikit-learn train_test_split function was used to randomly divide the data into two subsets: a training set (80%) and a test set (20%).
2.3. Model Development
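The split can be sketched as below, using a synthetic array with the same size and class balance as the sample (57 anxious and 45 non-anxious patients). The random_state and stratify arguments are assumptions added for reproducibility and class balance; the text does not say whether they were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same size and class balance as the sample (57 yes, 45 no).
X = np.arange(102 * 3, dtype=float).reshape(102, 3)
y = np.array([1] * 57 + [0] * 45)

# test_size=0.2 follows the text; random_state and stratify are assumptions
# added here for reproducibility and to preserve the class ratio in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

With 102 patients, a 20% test fraction leaves 21 patients for testing, because scikit-learn rounds the test size up. Every reported metric therefore rests on a small number of predictions.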
2.3.1. Algorithms Used
In our analysis, we employed four well-established supervised machine learning algorithms. Support Vector Machine (SVM) is effective in high-dimensional spaces and handles both linear and non-linear classification by finding the optimal hyperplane that separates the classes. Logistic Regression is a statistical method for binary classification that predicts the probability of an outcome from the input features, which makes it interpretable and efficient for large datasets. Decision Tree is a versatile model that splits the data into subsets based on feature values; it is easy to visualize and interpret, and it captures non-linear relationships. Random Forest is an ensemble method that combines multiple decision trees to improve predictive accuracy and limit overfitting, producing more robust predictions than a single tree.
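As a rough sketch, the four classifiers can be fitted and compared with scikit-learn as shown below. The data are synthetic and the default hyperparameters are an assumption, since the settings used in the study are not reported.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the preprocessed patient features.
rng = np.random.default_rng(0)
X = rng.normal(size=(102, 5))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.8, size=102) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Default hyperparameters throughout; the paper does not report the settings used.
models = {
    "SVM": SVC(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}

test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {test_accuracy[name]:.2f}")
```

Keeping the models in one dictionary ensures that every algorithm is trained and scored on exactly the same split.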
2.3.2. Performance Evaluation
Several measures were used to evaluate the performance of the machine learning algorithms in this study, namely accuracy, precision, recall and F1-score. These are standard metrics for classification models and are defined as follows, where TP, FP and FN denote the numbers of true positives, false positives and false negatives: Accuracy = Number of correct predictions/Total number of predictions; Precision = TP/(TP + FP); Recall = TP/(TP + FN); F1-score = 2 × (Precision × Recall)/(Precision + Recall).
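These definitions translate directly into code. In the sketch below, the function name and the confusion-matrix counts are hypothetical; the counts are not reported in the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a 21-patient test set (not reported in the paper).
metrics = classification_metrics(tp=9, fp=6, fn=1, tn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

These counts were chosen because they give precision 0.60, recall 0.90, F1-score 0.72 and accuracy 0.67, the same values as the logistic regression row of Table 2. They illustrate how a model can reach high recall with modest precision when it predicts the positive class often.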
2.4. Ethical Considerations
Participants’ anonymity and confidentiality were guaranteed. All participants gave written informed consent, and the study was approved by the Ethics Committee of the Hassan II University Hospital in Fez.
3. Results
In this study, a significant positive correlation was found between anxiety and several variables, including age, age at the time of diagnosis, duration of symptom progression, marital status, number of pre-existing conditions, and respiratory manifestations. A strong negative correlation was observed between anxiety and educational level, sex, occupation, and monthly income, as shown in Figure 1.
Figure 1. Correlation analysis between anxiety and socioeconomic factors.
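The text does not specify which correlation coefficient underlies Figure 1. As a sketch under that caveat, the code below assumes the Pearson coefficient, which for a binary variable such as anxiety coincides with the point-biserial correlation. All values are hypothetical and are not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, not the study data: anxiety coded 1 = yes, 0 = no.
anxiety = [1, 1, 0, 1, 0, 0, 1, 0]
age = [52, 47, 30, 44, 35, 28, 60, 39]
income_over_2000 = [0, 0, 1, 0, 1, 1, 1, 0]

print(f"anxiety vs age: r = {pearson_r(anxiety, age):+.2f}")
print(f"anxiety vs income: r = {pearson_r(anxiety, income_over_2000):+.2f}")
```

A positive coefficient means the variable tends to be higher in anxious patients, and a negative one means it tends to be lower. For label-encoded categorical variables, the sign also depends on how the categories were coded.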
Table 2. Comparison of model performance metrics: Accuracy, Precision, Recall, and F1 Score.
| Method | Precision | Recall | F1-score | Accuracy |
| SVM | 0.54 | 0.70 | 0.61 | 0.57 |
| Logistic regression | 0.60 | 0.90 | 0.72 | 0.67 |
| Decision tree | 0.60 | 0.60 | 0.60 | 0.62 |
| Random forest | 0.56 | 0.50 | 0.53 | 0.57 |
The model performance evaluation revealed accuracy values ranging from 0.57 for SVM and random forest to 0.67 for logistic regression. Precision ranged from 0.54 for SVM to 0.60 for logistic regression and decision tree. Recall ranged from 0.50 for random forest to 0.90 for logistic regression, while the F1-score ranged from 0.53 for random forest to 0.72 for logistic regression, as shown in Table 2.
4. Discussion
The aim of this work is to predict anxiety in lupus patients using machine learning, which could have a significant impact on the management of this autoimmune disease. The results of our study revealed important information about the performance of the different machine learning models used.
Anxiety was present in 55.9% of patients studied, indicating that it is a common problem in people with lupus. This underlines the importance of detecting and predicting this condition in these patients, so that appropriate interventions can be put in place to help them.
The results of the model performance evaluation showed accuracy rates ranging from 0.57 for random forest to 0.67 for logistic regression. These results suggest that logistic regression was the best-performing model in this study and that machine learning models can be used to predict anxiety in lupus patients, although with moderate accuracy.
It is interesting to compare these results with similar studies on the prediction or detection of anxiety in the general population or in populations with other chronic illnesses. For example, a study on the prediction of anxiety in cancer survivors [11] reported accuracy rates ranging from 0.69 to 0.70 for different prediction models, which is close to the results of the current study. Other studies have found accuracy values between 0.72 and 0.85 [12] and between 0.71 and 0.97 [13].
Furthermore, the accuracy values in this study ranged from 0.57 to 0.67, indicating a certain stability in the results obtained. However, recall varied between 0.50 and 0.90 and F1-scores between 0.53 and 0.72. These values show some fluctuation in model performance, which may be attributed to the complexity of predicting anxiety in these patients.
The results of Priya et al. [14] show variable model performance, with Random Forest achieving 71.4% accuracy for anxiety, which is slightly higher than our best accuracy of 0.67, obtained with logistic regression. This may indicate that the characteristics of the data and the study setting have a significant impact on the performance of the models.
On the other hand, the results of Wei et al. [15] suggest that the Random Forest and Multilayer Perceptron models have similar net benefits, highlighting the need to test multiple approaches for optimal results. The performance of Random Forest in our study could be improved by better feature selection, as suggested by the research of Zhou et al. [16] who achieved a high F1 score with a Random Forest model in an emotional classification framework.
Finally, our results highlight the complexity of predicting anxiety, as shown by the analysis of Mahalingam et al. [17] where several algorithms were tested to predict anxiety, with different results depending on the algorithm used. This highlights the importance of choosing the right model according to the specificities of the data and the clinical characteristics of the populations studied.
In this study, data were collected in the internal medicine department of the University Hospital, which is recognized as a reference center for the management of lupus in the region. This suggests that the sample is representative of lupus patients in the region, which supports the reliability and generalizability of the results obtained.
In addition, a validated and reliable scale was used to measure anxiety in patients. This ensures that the measures collected are accurate and consistent, enabling a relevant assessment of the influence of lupus on patients’ anxiety.
Another strength of this study lies in the use of several measures to assess the performance of the models employed in predicting anxiety in lupus patients. This multi-measurement approach provides a comprehensive and rigorous view of model effectiveness, and strengthens the validity of the results obtained.
However, there are a number of limitations to this study. First, the sample size used for the study is small. A larger sample size would have provided more robust results.
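The effect of the small sample on the reported metrics can be illustrated with a confidence interval. The sketch below uses the Wilson score interval and assumes a test set of 21 patients (20% of 102, rounded up) with 14 correct predictions, which corresponds to an accuracy of about 0.67. These counts are an assumption for illustration and are not reported in the paper.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumption: a 21-patient test set (20% of 102, rounded up) with 14 correct
# predictions, which corresponds to an accuracy of about 0.67.
low, high = wilson_interval(14, 21)
print(f"accuracy 0.67, 95% CI roughly {low:.2f} to {high:.2f}")
```

Under these assumptions the interval spans roughly 0.45 to 0.83, so the differences in accuracy between the four models fall well within sampling uncertainty.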
It is important to highlight that the proposed model was not tested by external validation, which implies that the results obtained may be specific to the sample studied and may not be applicable to other lupus patient populations. External validation would have strengthened the reliability and validity of the study results.
To confirm the conclusions of the study, it would be necessary to conduct research with larger samples. Increasing the sample size would improve the reliability of the conclusions and predictions of anxiety in these patients.
5. Conclusion
This study highlights the importance of predicting and treating anxiety in lupus patients. The results suggest that logistic regression was the most effective of the predictive models tested, while acknowledging that there is room for improvement in this approach. Anxiety can have a significant impact on the daily lives of these patients, affecting their ability to manage their disease and maintain social interactions. By adopting such predictive models and implementing appropriate treatment strategies, healthcare professionals will be better prepared to offer targeted care and adequate support to these patients. This could result in a better quality of life and increased well-being for people with lupus, enabling them to lead more fulfilling lives despite the challenges posed by their condition.
Conflicts of Interest
The authors declare no conflicts of interest.