TITLE:
Predicting Survey Response Rates Using XGBoost: A Case Study on Organizational Data
AUTHORS:
Aida Hakemi, Rezza Moeini, Maslin Masrom
KEYWORDS:
Survey Response Rate, XGBoost, Machine Learning, Feature Importance, Predictive Modeling
JOURNAL NAME:
Open Journal of Social Sciences, Vol.14 No.1, January 20, 2026
ABSTRACT: Accurate prediction of survey response rates is essential for optimizing survey design and ensuring high-quality data collection. Traditional methods often struggle to capture the complexity and multidimensionality of organizational datasets. This study applies the eXtreme Gradient Boosting (XGBoost) algorithm to predict response rates from organizational and demographic features: age, gender, job level, send hour, weekday, allowed response window, number of reminders, and total sent forms. The XGBoost model achieved an R² score of 0.85 and a Mean Squared Error (MSE) of 0.02, indicating strong predictive performance. Feature importance analysis showed that the total number of sent forms (46.6%) and the number of reminders (42.6%) were the most influential factors, while weekday (2.67%) and job level (2.55%) made smaller contributions. Scatter plots of actual versus predicted response rates showed minimal deviation, supporting the reliability of the model. These results highlight the potential of machine learning techniques, particularly XGBoost, for modeling survey response rates. Understanding feature importance allows researchers and organizations to adjust survey design elements, such as the number of invitations and reminders, to maximize participation.
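The abstract reports two evaluation metrics, R² and MSE. As a minimal sketch of how these are conventionally defined, the pure-Python functions below compute both from paired actual and predicted response rates. The function names and the four sample values are hypothetical and chosen only for illustration; they are not the study's data or code, and the standard definitions are assumed here because the abstract does not spell them out.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical response rates (fractions between 0 and 1), not from the paper.
actual_rates = [0.20, 0.40, 0.60, 0.80]
predicted_rates = [0.25, 0.35, 0.65, 0.75]

mse = mean_squared_error(actual_rates, predicted_rates)
r2 = r2_score(actual_rates, predicted_rates)
print(f"MSE = {mse:.4f}, R^2 = {r2:.2f}")
```

For these illustrative values every prediction is off by 0.05, so the MSE is about 0.0025 and R² is about 0.95. Under the same definitions, an R² of 0.85 would mean the model accounts for roughly 85% of the variance in the observed response rates.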