TITLE:
Credit Risk Modeling in Banking: A Comparative Analysis of Logistic Regression and Machine Learning Approaches
AUTHORS:
Usmanov Firdavs, Wei Wang
KEYWORDS:
Credit Risk Modeling, Probability of Default (PD), Logistic Regression, Machine Learning, Random Forest, Gradient Boosting, Credit Scoring, Banking Risk Management, Model Risk Management, Explainable Artificial Intelligence (XAI), Financial Analytics, Regulatory Compliance
JOURNAL NAME:
Journal of Computer and Communications, Vol.14 No.4, April 27, 2026
ABSTRACT: Credit risk assessment is a fundamental component of banking operations, directly influencing lending decisions, capital allocation, pricing strategies, and regulatory compliance. Traditionally, logistic regression has been the dominant methodology for probability of default (PD) estimation because of its statistical robustness, interpretability, and regulatory acceptance. However, the increasing availability of large-scale financial and behavioral datasets, combined with advances in computational power, has facilitated the adoption of machine learning techniques such as Random Forests, Gradient Boosting Machines, Support Vector Machines, and Neural Networks for credit risk prediction. This study is a structured, literature-based comparative synthesis of logistic regression and selected machine learning approaches in banking credit risk modeling. It does not rely on a single proprietary dataset; instead, it develops a consolidated benchmarking perspective from prior empirical studies, benchmark evidence reported in the literature, and established theoretical frameworks. The comparison covers key performance dimensions: predictive accuracy, discriminatory power, calibration stability, interpretability, computational efficiency, and regulatory compliance considerations. The paper further evaluates the implications of model complexity for explainability, governance, and model risk management under contemporary regulatory frameworks. The aim is to provide a structured framework that guides financial institutions in selecting modeling techniques according to operational objectives, regulatory constraints, and data characteristics.
The synthesized evidence indicates that while machine learning methods often achieve higher predictive performance than logistic regression, logistic regression retains advantages in transparency, stability, and ease of regulatory validation. A hybrid modeling strategy that combines the performance gains of machine learning with interpretability safeguards is therefore recommended for practical banking applications. The analysis primarily targets retail and small-to-medium enterprise (SME) lending portfolios, where PD modeling plays a central role under Internal Ratings-Based (IRB) regulatory frameworks and IFRS 9 expected credit loss (ECL) requirements.