TITLE:
A Hybrid Ensemble Learning Approach Utilizing Light Gradient Boosting Machine and Category Boosting Model for Lifestyle-Based Prediction of Type-II Diabetes Mellitus
AUTHORS:
Mahadi Nagassou, Ronald Waweru Mwangi, Euna Nyarige
KEYWORDS:
Boosting Ensemble Learning, Category Boosting, Light Gradient Boosting Machine
JOURNAL NAME:
Journal of Data Analysis and Information Processing,
Vol. 11 No. 4,
November 27, 2023
ABSTRACT: Tree ensemble models have gained significant importance for
classification and prediction tasks. Boosting ensemble techniques are commonly
employed for forecasting Type-II diabetes mellitus. Light Gradient Boosting
Machine (LightGBM) is a widely used algorithm known for its leaf growth strategy,
loss reduction, and enhanced training precision. However, LightGBM is prone to
overfitting. In contrast, CatBoost utilizes balanced base predictors known as
decision tables, which mitigate overfitting risks and significantly improve testing time efficiency.
CatBoost’s algorithm structure counteracts gradient boosting biases and
incorporates an overfitting detector to stop training early. This study focuses
on developing a hybrid model that combines LightGBM and CatBoost to minimize
overfitting and improve accuracy by reducing variance. Bayesian
hyperparameter optimization is used to find the best hyperparameters for the
underlying learners. By fine-tuning the
regularization parameter values, the hybrid model effectively reduces variance
(overfitting). Comparative evaluation against LightGBM, CatBoost, XGBoost,
Decision Tree, Random Forest, AdaBoost, and GBM algorithms demonstrates that
the hybrid model has the best F1-score (99.37%), recall (99.25%), and accuracy
(99.37%). Consequently, the proposed framework holds promise for early diabetes
prediction in the healthcare industry and exhibits potential applicability to
other datasets sharing similarities with diabetes.