TITLE:
A Hybrid Machine Learning Framework for Early Diabetes Prediction in Sierra Leone Using Feature Selection and Soft-Voting Ensemble
AUTHORS:
Aminata Bah
KEYWORDS:
Diabetes Prediction, Hybrid Ensemble, Soft Voting, Feature Selection, XGBoost, Sierra Leone
JOURNAL NAME:
Journal of Software Engineering and Applications, Vol.19 No.2, February 25, 2026
ABSTRACT: This paper proposes a hybrid machine learning framework for early diabetes prediction tailored to Sierra Leone, where locally representative datasets are scarce. The framework integrates Random Forest (RF), Logistic Regression (LR), and Extreme Gradient Boosting (XGBoost) into a probability-based soft-voting ensemble that prioritizes sensitivity (recall) for screening. Experiments were conducted under two conditions: 1) using all available features and 2) after feature selection based on RF importance, retaining six clinically meaningful predictors (Glucose, Diabetes Pedigree Function, Skin Thickness, Age, Body Mass Index, and Insulin). Evaluation employed Accuracy, Precision, Recall, F1-score, ROC-AUC, and confusion-matrix analysis with a screening-oriented decision threshold. Before feature selection, the hybrid model achieved a Recall of 0.8571 and an ROC-AUC of 0.8610, reducing false negatives compared with individual classifiers. After feature selection, performance remained competitive while improving interpretability and deployment feasibility. Benchmark validation on the Pima Indians Diabetes dataset further supported the robustness of the approach. The proposed hybrid framework provides a practical, sensitivity-focused decision-support tool for early diabetes screening in low-resource clinical environments.
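The probability-based soft voting and screening-oriented threshold described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes equal model weights and a lowered threshold of 0.4; neither value is stated in the abstract. The probabilities and labels are made-up illustrative values, not results from the paper, and the function names (`soft_vote`, `predict`, `recall`) are hypothetical.

```python
# Minimal sketch of soft voting with a screening-oriented threshold.
# All numbers below are illustrative assumptions, not data from the paper.

def soft_vote(prob_by_model, weights=None):
    """Return the weighted average of positive-class probabilities per sample."""
    names = list(prob_by_model)
    n_samples = len(prob_by_model[names[0]])
    if weights is None:
        # Assumption: equal weights for every base model.
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * prob_by_model[name][i] for name in names) / total
        for i in range(n_samples)
    ]

def predict(probs, threshold=0.5):
    """Label a sample positive (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]

def recall(y_true, y_pred):
    """Recall = TP / (TP + FN); returns 0.0 when there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical positive-class probabilities from the three base models.
probs = {
    "rf":  [0.30, 0.80, 0.45, 0.10],
    "lr":  [0.50, 0.70, 0.35, 0.20],
    "xgb": [0.46, 0.90, 0.52, 0.15],
}
y_true = [1, 1, 1, 0]  # hypothetical ground-truth labels

ensemble = soft_vote(probs)
default_pred = predict(ensemble, threshold=0.5)    # conventional cut-off
screening_pred = predict(ensemble, threshold=0.4)  # assumed screening cut-off

print("recall at 0.5:", recall(y_true, default_pred))
print("recall at 0.4:", recall(y_true, screening_pred))
```

In this toy data, lowering the threshold turns two borderline cases from missed positives into detected ones. That trade-off, fewer false negatives at the cost of possibly more false positives, is what a sensitivity-focused screening tool accepts.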