Research on Telecom Customer Churn Prediction Based on GA-XGBoost and SHAP

To address the prominent problem of customer churn in telecom enterprise management, a telecom customer churn prediction model integrating GA-XGBoost and SHAP is proposed. The ADASYN algorithm is first applied to balance the unbalanced sample set; the XGBoost algorithm is then used to construct the telecom customer churn prediction model, and its hyperparameters are optimized by a genetic algorithm to obtain GA-XGBoost. The experimental results show that, compared with traditional machine learning methods such as GBDT, decision tree, KNN, and the single XGBoost model, the improved XGBoost model performs better in recall, F1-score, and AUC. Finally, the GA-XGBoost model is combined with the SHAP framework to analyze and explain the important features affecting telecom customer churn, which better fits the actual situation of churn prediction in the telecom industry.


Introduction
With the booming development of information technology and mobile networks, competition in the telecommunication industry is becoming increasingly severe. It is known that the cost of developing a new subscriber is 5 - 6 times higher than the cost of retaining an existing customer [1], and subscribers' choices determine the development of the company. Therefore, the ability to successfully predict churn and effectively reduce it has become an important concern for the telecommunication industry, and it is necessary for telecommunication companies to build a model that can accurately predict the churn tendency of customers.

In recent years, scholars at home and abroad have applied data mining techniques to analyze and establish customer churn prediction models and have applied classification algorithms to the field of telecommunication customer churn, which is of great practical significance for telecommunication companies to retain valuable customers. In China, Qian et al. [2] improved the support vector machine by introducing a cost-sensitive function and used different penalty coefficients in the objective function to incorporate different misclassification costs into the modeling process for telecommunication customer churn prediction. The literature [3] introduced kernel principal component analysis (KPCA) to customer churn prediction and proposed a corresponding feature extraction algorithm. The literature [4] gave a pruned random forest approach and proposed an effective variance estimation based on the random forest similarity matrix, according to the degree of variance of the important factors affecting the stability of the combined classifier. Wang [5] took a real dataset of a provincial telecom operator as the research object, preprocessed the data, balanced it with the AdaCost algorithm, selected features with the Relief filtering method, and used a PSO-BP neural network for customer churn prediction, proving the feasibility of the algorithm for operator customer churn management. Abroad, Veronikha Effendy et al. [6] used combined sampling and weighted random forest to deal with imbalanced data in customer churn prediction. Ammar A. Q. Ahmed [7] used a hybrid firefly algorithm based approach for churn prediction on large telecommunication data, using the firefly algorithm to identify optimal solutions and combining it with simulated annealing for optimization. Awang et al. [8] proposed a regression-based churn prediction model that identifies customer churn through multiple regression analysis; this technique analyzes customer feature data and provides good performance. Sara Tavassoli et al. [9] proposed three hybrid ensemble classifiers based on bagging and boosting; the proposed method can be applied not only to customer churn prediction but also to any other binary classification task.
Although the above studies have contributed to telecommunication customer churn prediction, the complexity of machine learning models means they are rarely given a rational explanation: only the importance of a feature can be judged, and it cannot be explained how the feature affects the prediction results. To solve these problems, this paper establishes a telecom customer churn prediction model that integrates SHAP with an improved XGBoost algorithm, in which the hyperparameters of the XGBoost model are tuned by a genetic algorithm. SHAP can explain and analyze the various factors affecting telecom customer churn and provide corresponding information for telecom companies to adopt targeted retention policies.

Overall Process of Customer Churn Prediction Model Construction
The overall flowchart of telecom customer churn prediction model construction is shown in Figure 1. In this paper, we use a public customer churn dataset of a telecom company from the Kaggle website. We analyze and process the abnormal and missing values in the dataset, extract features, and use the ADASYN algorithm to deal with the class imbalance problem. We then build candidate churn prediction models with XGBoost, decision tree, K-nearest neighbor, GBDT, LightGBM, and extremely randomized trees (Extra Trees), and select the optimal model by comparing the corresponding evaluation indexes. The genetic algorithm is then used to intelligently optimize the hyperparameters of the optimal model to obtain the final model, and finally the SHAP framework is used to interpret and analyze this customer churn prediction model.
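As an illustration of this pipeline, a minimal sketch using pandas, imbalanced-learn, scikit-learn, xgboost, and lightgbm is given below. The file path and target column name are placeholders, not the paper's actual code, and the features are assumed to have already been encoded numerically as described later in the preprocessing section.

import pandas as pd
from imblearn.over_sampling import ADASYN
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier, ExtraTreesClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

df = pd.read_csv("telecom_churn.csv")            # placeholder path
X, y = df.drop(columns=["churn"]), df["churn"]   # placeholder target column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# ADASYN oversamples the minority (churn) class on the training split only.
X_res, y_res = ADASYN(random_state=42).fit_resample(X_train, y_train)

models = {
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "LightGBM": LGBMClassifier(),
    "DecisionTree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "GBDT": GradientBoostingClassifier(),
    "ExtraTrees": ExtraTreesClassifier(),
}
for name, model in models.items():
    model.fit(X_res, y_res)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.4f}")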

XGBoost Algorithm
XGBoost (Extreme Gradient Boosting) is an algorithm based on GBDT, proposed by Chen in 2016 [10]. It optimizes GBDT and can effectively handle the relationships in the data: it expands the loss function with a second-order Taylor formula to improve computational accuracy, adds regularization terms to simplify the model and avoid overfitting, and uses a block storage structure to support parallel computation [11]. The specific objective function is shown in Equation (2), where L(φ) is the loss function of the predicted and true values; Ω is the penalty on model complexity, i.e., the regularization term of the objective function, whose specific form is shown in Equation (3); γ and λ denote the regularization coefficients, and T denotes the number of leaf nodes of the k-th tree.
The loss function of XGBoost, together with the regularization term, is minimized as shown in Equation (4).
The objective function is expanded with the second-order Taylor formula and simplified as shown in Equation (5), where g_i and h_i are the first-order and second-order derivatives of the loss function, respectively. Taking the derivative with respect to the leaf weight ω and setting it to zero, the ω that minimizes the objective function is obtained, as shown in Equation (6).
Substituting this ω back yields the minimum value of the objective function; the smaller this value, the better the tree model. The corresponding minimum of the objective function is shown in Equation (7).
The gain score for splitting a node of the tree model is calculated as shown in Equation (8).
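Since the display equations referenced above were not preserved in this text, the standard XGBoost formulation they correspond to is reproduced here for reference; the notation follows the textbook presentation and may differ slightly from the paper's original typesetting.

\begin{align}
\mathcal{L}(\phi) &= \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k) && (2)\\
\Omega(f_k) &= \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^{2} && (3)\\
\mathcal{L}^{(t)} &= \sum_{i=1}^{n} l\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) && (4)\\
\mathcal{L}^{(t)} &\simeq \sum_{i=1}^{n}\Big[g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i)\Big] + \Omega(f_t) && (5)\\
\omega_j^{*} &= -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i + \lambda} && (6)\\
\mathcal{L}^{*} &= -\frac{1}{2}\sum_{j=1}^{T}\frac{\big(\sum_{i\in I_j} g_i\big)^{2}}{\sum_{i\in I_j} h_i + \lambda} + \gamma T && (7)\\
\mathrm{Gain} &= \frac{1}{2}\left[\frac{\big(\sum_{i\in I_L} g_i\big)^{2}}{\sum_{i\in I_L} h_i + \lambda} + \frac{\big(\sum_{i\in I_R} g_i\big)^{2}}{\sum_{i\in I_R} h_i + \lambda} - \frac{\big(\sum_{i\in I} g_i\big)^{2}}{\sum_{i\in I} h_i + \lambda}\right] - \gamma && (8)
\end{align}

where g_i and h_i are the first- and second-order derivatives of the loss with respect to the previous prediction, I_j is the set of samples falling on leaf j, and I_L, I_R are the sample sets of the left and right child nodes after a split.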

Genetic Algorithm
Genetic Algorithm (GA) is a stochastic global optimization and search method that obtains the optimal solution from an objective function and constraints, drawing on Darwin's theory of biological evolution and Mendel's genetic mechanism [12]. Following the laws of evolution in nature, the genetic algorithm simulates the replication, crossover, and mutation that occur in natural selection and heredity. Starting from an arbitrary initial population, random selection, crossover, and mutation operations produce a group of individuals better suited to the environment, so that the population evolves toward a better region of the search space; repeating this from one generation to the next finally converges to a group of individuals best adapted to the environment, yielding the optimal solution of the problem [13]. As a relatively mature algorithm, the genetic algorithm can be applied in fields such as function optimization, combinatorial optimization, and shop floor scheduling.
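As a concrete illustration of these operators, a minimal sketch of a GA loop over real-valued chromosomes is given below. The fitness function, population size, and operator rates are placeholders chosen for illustration; in practice the fitness would evaluate the model being optimized.

import random

POP_SIZE, CROSS_RATE, MUT_RATE = 20, 0.8, 0.1

def fitness(chrom):
    # Placeholder objective; replaced by a model-quality score in practice.
    return -sum((x - 0.5) ** 2 for x in chrom)

def select(pop):
    # Tournament selection: the fitter of two random individuals survives.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # Single-point crossover with probability CROSS_RATE.
    if random.random() < CROSS_RATE:
        point = random.randrange(1, len(p1))
        return p1[:point] + p2[point:]
    return p1[:]

def mutate(chrom):
    # Each gene is perturbed with probability MUT_RATE.
    return [g + random.gauss(0, 0.1) if random.random() < MUT_RATE else g
            for g in chrom]

population = [[random.random() for _ in range(3)] for _ in range(POP_SIZE)]
for generation in range(50):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]
best = max(population, key=fitness)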

GA-XGBoost Algorithm
The XGBoost model suffers from many parameters, slow convergence, and a strong dependence of prediction results on parameter settings, while traditional grid search for hyperparameter tuning suffers from low accuracy and long running time. Therefore, this paper proposes a GA-XGBoost model combining the genetic algorithm and the XGBoost model, using the global search capability of the genetic algorithm to select the tuning parameters for XGBoost, with AUC as the fitness function. The optimal hyperparameter combination of GA-XGBoost is the chromosome output by the genetic algorithm when the number of iterations meets the termination condition. For the telecom customer churn prediction model, three parameters, n_estimators, learning_rate, and max_depth, are encoded and initialized by the genetic algorithm. A new generation of the population is obtained by replicating, crossing, and mutating the parent population, and the worst individuals are replaced by the best ones according to the fitness values of the offspring population, from which the optimal parameters are obtained.

SHAP Framework
The interpretability of machine learning algorithms is currently a hot topic in artificial intelligence research, and the GA-XGBoost model, as a black-box model, is poorly interpretable because of the high complexity of the ensemble learning model. To solve this problem, the SHAP framework is introduced to interpret the results reliably. The SHAP framework has powerful visualization functions and can display the interpreted results of model predictions, so it is widely applied to interpret complex classification and regression models [16]. Meanwhile, the traditional feature importance ranking can only determine the importance of a feature and cannot explain how the feature affects the prediction results, whereas the greatest advantage of SHAP values is the ability to quantify the degree of influence of each feature and to reflect whether that influence is positive or negative.

Experimental Data and Its Preprocessing
The experimental data in this paper comes from a telecom company dataset on the Kaggle platform. There are 4025 customer samples in the training set, each of which includes 20 feature attributes, consisting of label information from several dimensions that affect customer churn and of whether the user eventually churned; the basic information of the dataset is shown in Table 1. The analysis reveals that the dataset is seriously unbalanced: there are 589 samples of churned customers and 3652 samples of non-churned customers, as shown in Figure 2.
For continuous features in the dataset, the values are standardized by removing the mean and scaling to unit variance; for discrete features, one-hot encoding is used when there is no ordinal relationship among the attribute values, and numerical (ordinal) mapping is used when there is.
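A minimal sketch of this preprocessing with scikit-learn is shown below. The column names are placeholders chosen to resemble the Kaggle churn dataset and are assumptions, not the paper's exact feature list; df is the raw DataFrame loaded earlier.

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

numeric_cols = ["total_day_minutes", "total_day_calls"]   # placeholder continuous features
nominal_cols = ["state", "international_plan"]            # placeholder nominal features

# Binary/ordinal attribute mapped directly to numbers.
df["voice_mail_plan"] = df["voice_mail_plan"].map({"no": 0, "yes": 1})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),                       # zero mean, unit variance
    ("cat", OneHotEncoder(handle_unknown="ignore"), nominal_cols),  # one-hot encoding
], remainder="passthrough")                                         # keep remaining columns as-is

X_processed = preprocess.fit_transform(df.drop(columns=["churn"]))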

Evaluation Index
In the field of telecom customer churn, the metrics for assessing churn prediction are not limited to accuracy; metrics that focus on the positive (churn) class are also selected. Accuracy, Recall, Precision, and F1-score are all derived from the confusion matrix shown in Table 2.
The evaluation metrics used in the telecom customer churn prediction problem are accuracy, precision, recall, F1-score, and AUC. The specific formulas for these metrics are given below in this order.
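Since the display formulas were not preserved in this text, the standard definitions of these confusion-matrix-based metrics are reproduced here for reference.

\begin{align}
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}\\
\mathrm{Precision} &= \frac{TP}{TP + FP}\\
\mathrm{Recall} &= \frac{TP}{TP + FN}\\
\mathrm{F1\text{-}score} &= \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}\\
\mathrm{AUC} &= \int_{0}^{1} \mathrm{TPR}\; d(\mathrm{FPR})
\end{align}

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, and the AUC is the area under the ROC curve traced by the true positive rate against the false positive rate.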

Experimental Results and Analysis
The paper selects six classification models, namely XGBoost, LightGBM, DecisionTree, KNN, GBDT, and ExtraTrees, for comparison, with Accuracy, AUC, Recall, Precision, and F1-score as the evaluation indexes; the ROC comparison results of the classification models are shown in Figure 3.
After the experimental comparison, XGBoost reaches 0.9477 in AUC, significantly higher than the other models, and is therefore selected as the optimal model. The iterative results of the genetic algorithm for hyperparameter tuning are shown in Figure 4. The final approximate optimal solution is n_estimators = 55, learning_rate = 0.18, max_depth = 4.
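A minimal sketch of such a GA-driven search, with the cross-validated AUC of XGBoost as the fitness function, is given below. The search ranges, population size, number of generations, and operators are illustrative assumptions rather than the paper's exact configuration, and X_res, y_res are the resampled training data from the pipeline sketch above.

import random
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

BOUNDS = [(10, 200), (0.01, 0.3), (2, 10)]   # n_estimators, learning_rate, max_depth

def decode(chrom):
    n_est, lr, depth = chrom
    return dict(n_estimators=int(round(n_est)),
                learning_rate=float(lr),
                max_depth=int(round(depth)))

def fitness(chrom):
    # Mean cross-validated AUC on the resampled training data is the fitness value.
    model = XGBClassifier(eval_metric="logloss", **decode(chrom))
    return cross_val_score(model, X_res, y_res, cv=5, scoring="roc_auc").mean()

def random_chrom():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def mutate(chrom, rate=0.2):
    return [random.uniform(lo, hi) if random.random() < rate else g
            for g, (lo, hi) in zip(chrom, BOUNDS)]

population = [random_chrom() for _ in range(10)]
for generation in range(20):
    # For clarity, fitness is re-evaluated each generation rather than cached.
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:5]                                  # keep the fitter half
    children = [mutate([random.choice(genes) for genes in zip(*random.sample(parents, 2))])
                for _ in range(5)]                        # uniform crossover + mutation
    population = parents + children

best_params = decode(max(population, key=fitness))
print(best_params)   # the paper reports n_estimators=55, learning_rate=0.18, max_depth=4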
The XGBoost model with hyperparameters tuned by the genetic algorithm is defined as GA-XGBoost. The confusion matrix of the GA-XGBoost model is shown in Figure 5, and the ROC comparison between GA-XGBoost and XGBoost is shown in Figure 6, from which it can be seen that GA-XGBoost improves AUC by about 1% compared with the XGBoost model, with AUC values essentially above 0.90. As can be seen from Table 3, GA-XGBoost also outperforms the comparison models in recall and F1-score.
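A brief sketch of how this comparison could be reproduced with scikit-learn, continuing from the tuning sketch above (variable names such as best_params, X_res, y_res, X_test, y_test are assumptions carried over from the earlier sketches), is shown below.

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, RocCurveDisplay
from xgboost import XGBClassifier

best_model = XGBClassifier(eval_metric="logloss", **best_params).fit(X_res, y_res)  # GA-XGBoost
base_model = XGBClassifier(eval_metric="logloss").fit(X_res, y_res)                 # default XGBoost

print(confusion_matrix(y_test, best_model.predict(X_test)))   # cf. Figure 5
ax = plt.gca()
RocCurveDisplay.from_estimator(best_model, X_test, y_test, name="GA-XGBoost", ax=ax)
RocCurveDisplay.from_estimator(base_model, X_test, y_test, name="XGBoost", ax=ax)    # cf. Figure 6
plt.show()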

Explanatory Analysis of the Model Based on SHAP
SHAP feature importance, i.e., the degree to which each feature contributes to the overall predictive power of the model, provides a more direct representation of the influence of a feature on the model. This chapter focuses on the explanatory analysis of the model's churn predictions based on the SHAP framework [17]. In the SHAP feature analysis (force) plot, base_value is the mean output of the telecom customer churn prediction model; the red range indicates that a feature contributes positively to telecom customer churn, and the blue range indicates that a feature contributes negatively. Figure 8 shows that the model output for the first sample in this test set is −3.43, which means it is more likely to be non-churning; it is predicted to be a non-churning customer for reasons such as a longer total duration of calls in a day, more calls in a day, and still using the voicemail plan.
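A minimal sketch of producing such SHAP plots with the shap library is given below. It assumes the trained best_model and a pandas DataFrame X_test of encoded features from the earlier sketches; these names are assumptions, not the paper's code.

import shap

explainer = shap.TreeExplainer(best_model)      # tree explainer for the GA-XGBoost model
shap_values = explainer.shap_values(X_test)     # per-sample, per-feature contributions

# Global view: mean absolute SHAP value per feature (cf. the feature importance plot).
shap.summary_plot(shap_values, X_test)

# Local view: force plot for the first test sample around base_value (cf. Figure 8).
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0], matplotlib=True)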
In the SHAP scatter plot in Figure 9, total_day_minutes reflects the process of customer churn, showing a V-shaped "up-down-up" trend: when the total number of minutes of calls per day is in the range of 150 - 250 minutes, the model outputs a low churn probability. In this paper, we believe that these customers are more willing to continue using the telecom company's package because their usage time is relatively stable, while customers with more than 250 minutes a day may choose a more cost-effective package because of their higher daily usage, i.e., they are more likely to churn.
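The scatter (dependence) plot in Figure 9 can be reproduced in the same way, continuing from the previous sketch; total_day_minutes is assumed to be a column of X_test.

# SHAP dependence plot: how total_day_minutes drives the churn prediction.
shap.dependence_plot("total_day_minutes", shap_values, X_test)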

Conclusions
To address the problem of telecom customer churn prediction, a model combining the genetic algorithm and the XGBoost algorithm is proposed in this paper, and the SHAP framework is used to supplement the model's interpretability. The factors affecting telecom customer churn are analyzed with the proposed method, and the main factors are identified as the daily call duration, the number of calls per day, and whether the customer subscribes to a voicemail plan. The analysis finds that customers with longer and more frequent daytime calls are easier to retain; this group may depend on telecom services for their work, and telecom companies can offer daytime call packages with appropriate discounts to retain more of these valuable users.
Although the GA-XGBoost algorithm proposed in this paper achieves better results on the evaluation indexes than traditional machine learning algorithms, there are still shortcomings: the use of the genetic algorithm for parameter optimization leads to a long running time when searching for the optimal parameters. The next step will be to combine feature selection methods to filter the model's features and remove redundant ones in order to reduce the prediction time of the telecom customer churn model; this will be the direction of future research.