TITLE:
Comparative Study of Machine Learning Techniques for Early Detection of Heart Diseases
AUTHORS:
Chaibou Kadri, Moussa Idi Bachir, Sidi Zakari Ibrahim, Naroua Harouna, Mamadou Fougou Mamadou
KEYWORDS:
Cardiovascular Diseases, Artificial Intelligence, Early Detection, Machine Learning Techniques, Predictive Healthcare System
JOURNAL NAME:
Journal of Computer and Communications, Vol.13 No.11, November 25, 2025
ABSTRACT: Cardiovascular diseases (CVDs) are the leading cause of death worldwide, accounting for millions of deaths each year according to the World Health Organization (WHO). Early detection of these diseases is essential to reduce mortality, improve preventive care, and alleviate the burden on healthcare systems. However, traditional diagnostic approaches based on clinical assessment or risk scores have several limitations, particularly in sensitivity, generalizability, and the ability to capture complex interactions among risk factors. The rise of Artificial Intelligence (AI), and particularly Machine Learning (ML), offers new opportunities for developing more effective predictive systems. This study presents a comparative analysis of five supervised ML algorithms applied to the early detection of heart disease: Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Network (ANN). Three benchmark datasets were used: the primary Kaggle Heart Disease dataset (1,025 records, 14 clinical variables), the UCI Cleveland Heart Disease dataset (303 records), and the Framingham Heart Study dataset (4,020 records). Models were evaluated using multiple performance metrics: accuracy, F1-score, precision, recall, ROC-AUC, and the Matthews Correlation Coefficient (MCC). On the Kaggle dataset, the Random Forest classifier achieved the best overall performance (accuracy = 99.26%, F1 = 99.28%), followed by ANN and DT (accuracy = 98.78%). On the Cleveland dataset, RF also outperformed the other models (accuracy = 90.34%), while ANN and SVM reached 83.78% and 86.07%, respectively. On the Framingham dataset, RF maintained strong results (accuracy = 86.34%), confirming its robustness across heterogeneous data sources. These results highlight the importance of selecting an algorithm according to data characteristics, clinical objectives, and the reliability and sensitivity requirements of medical contexts, and they reinforce the potential of AI-based approaches for early and reliable cardiovascular risk prediction.
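As an illustration of the confusion-matrix metrics named in the abstract (accuracy, precision, recall, F1-score, and MCC), the following is a minimal Python sketch. The function name `binary_metrics` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the study's data or code. ROC-AUC is omitted because it requires predicted scores rather than hard labels.

```python
import math

def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics for binary labels (1 = disease, 0 = no disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    # conventionally set to 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Hypothetical toy labels, not drawn from any of the three datasets.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced medical data, MCC and recall are often more informative than accuracy alone, because a classifier can reach high accuracy while still missing many positive cases.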