Customer Churn Prediction Using AdaBoost Classifier and BP Neural Network Techniques in the E-Commerce Industry
American Journal of Industrial and Business Management

In customer relationship management, it is important for e-commerce businesses both to attract new customers and to retain existing ones. Research on customer churn prediction using AI technology is now a major part of e-commerce management. This paper proposes a churn prediction model that combines k-means clustering with the AdaBoost classifier algorithm, in which k-means segments customers into three categories. Important customer groups can also be identified from customer behavior and temporal data. Customer churn prediction was then carried out using AdaBoost classification and BP neural network techniques. The results show that clustering customers before prediction can improve prediction accuracy. In addition, a comparative analysis suggests that the AdaBoost model has better prediction accuracy than the BP neural network model. The results of this paper can help B2C e-commerce companies develop customer retention measures and marketing strategies.


Introduction
Customer relationship management (CRM) is crucial in marketing, as it is the core of enterprise information management (Kotler & Keller, 2016). In the past decade, enterprises have focused on CRM, especially on the challenge of customer churn (Daoud et al., 2018). Customers are key to the success of enterprises. Companies can retain their customers through advertising or product optimization and thereby improve their market competitiveness and economic benefits (Bi, 2019), but retention measures that are not targeted at specific customers can raise costs and waste resources (Jahromi et al., 2014). Chung et al. (2016) found that acquiring new customers costs five to six times as much as retaining existing ones. It is therefore essential for enterprises to predict customer churn and analyze its causes in order to win back lost customers.
There have been numerous studies on customer churn in various industries, including telecommunications (Bock & De, 2021; Kozak et al., 2021; Alboukaey et al., 2020; Verbeke et al., 2012; Coussement et al., 2017), finance (Devriendt et al., 2021; Dumitrescu et al., 2022; Velez et al., 2020) and e-commerce (Gattermann-Itschert & Thonemann, 2021; De et al., 2021; Gordini & Veglio, 2017; O'Brien et al., 2020). Zhao et al. (2021) analyzed the causes and influencing factors of customer churn in the telecommunications industry with data mining algorithms and found that logistic regression can accurately identify the causes of churn, while customer lifetime value modeling was the best method for managing churn. Devriendt et al. (2021) adopted a new logistic regression tree algorithm to study customer churn in the financial industry; the method produced accurate predictions and can help the financial and credit industry improve risk efficiency. Schaeffer & Sanchez (2020) predicted customer churn in the B2B industry using the support vector machine (SVM) and found that SVM helps enterprises identify churned and non-churned customers in a timely manner while saving time and money. These studies adopted the Recency, Frequency, Monetary (RFM) model and various machine learning algorithms for churn prediction, such as logistic regression (LR), decision tree (DT), random forest (RF), SVM and k-nearest neighbors (KNN). Such studies can help companies reduce customer churn and devise effective marketing strategies.
However, churn prediction research in the aforementioned literature has mainly focused on the telecommunications and financial industries, and the few studies addressing customer churn in e-commerce discussed only B2B settings. Technology has transformed online shopping, with e-commerce sites and portable mobile devices gaining traction, and online shopping is diverse and convenient for B2C customers. Customers' shopping behavior data usually include the time of shopping, purchase readiness, purchase intention and customer satisfaction (Kotler & Keller, 2008). Chen et al. (2005) stated that customer demographics can be obtained directly from enterprises' data warehouses, whereas the longitudinal behavior data of B2C customers are dynamic and vary with the time of shopping. For example, e-commerce websites usually provide services and features such as product collection, shopping cart, evaluation management, shopping reward points, time of delivery, time of receipt, payment methods, invoice management and product specifications. Such data are usually stored separately in transactional databases and are characterized by longitudinal temporality and multi-dimensionality. These variables may yield better churn predictions, but existing literature on e-commerce churn prediction often ignores longitudinal behavior and temporal data (Eichinger et al., 2006; Orsenigo & Vercellis, 2010). In addition, the consumption behaviors of customers in the telecommunications or financial industry differ from those of online B2C customers: the former are contractual customers, whereas the latter are non-contractual. From a business management perspective, the consumption behavior of contractual customers is relatively straightforward, because churn can be ascertained directly from consumption behavior variables.
On the other hand, B2C enterprises find it difficult to identify and predict customer churn. Research on B2C churn prediction models remains incomplete and lags behind. Hence, research on customer churn prediction in B2C contexts is of great significance.
With the wide application of big data and machine learning, businesses can readily extract information from consumption behavior data for analysis and predictive modeling. Using real data from an e-commerce website and analyzing shopping behavior variables, this study segments customers with k-means clustering, filters features with a random forest, and builds a B2C customer churn prediction model with the AdaBoost classification algorithm. To verify the advantages of AdaBoost, its results are compared with those of a BP neural network model.
The second part of this paper reviews existing literature, and the third part describes the research methods and introduces the basic principles of the AdaBoost and BP neural network algorithms. The fourth part presents the empirical research, including data preparation, data preprocessing, customer segmentation, variable screening and prediction evaluation metrics. The final part summarizes the findings and discusses future research directions.

Literature Review
Customer churn refers to a situation where customers stop using specific products or services in favor of a competitor's products or services (Amin et al., 2017). Studies on customer churn cover three aspects: churn prediction and identification, churn cause analysis and customer retention strategies. Churn prediction has become increasingly complex because customers in different industries have different consumption characteristics, which makes it difficult for enterprises to determine whether they are losing customers. Previous research showed that it is hard to obtain accurate predictions with the RFM model and the temporal threshold method. Schmittlein et al. (1987) first adopted the Pareto/NBD model to predict customer behavior, and Fader et al. (2005) later proposed the BG/NBD model. With the wide application of big data and data mining, research on customer churn prediction has progressed. There are currently three types of algorithms for customer churn prediction: traditional statistics-based prediction, machine learning-based prediction and classification-based prediction.
The main traditional statistics-based prediction methods are the logit model, linear discriminant analysis and quadratic discriminant analysis. Jahromi et al. (2014) adopted the logit model to study customer churn on a B2B e-commerce platform in Australia and compared it with decision tree and Boosting models. They found that the logit model can be used to predict customer churn, but its results were not as accurate as those of the other predictive models.
Moreover, Nie et al. (2011) developed a logit model on a bank's credit card customer data to identify potential churn and compared its predictions with those of a decision tree model; the logit model's predictions were more accurate.
Machine learning predictive models include decision tree (DT), support vector machine (SVM), artificial neural network (ANN) and other algorithms. De et al. (2018) conducted a comparative analysis of the decision tree algorithm on various datasets and found that decision trees are deficient in handling linear relationships among variables. Neslin et al. (2006) argued that the decision tree algorithm can serve as the base model for churn prediction. Zhang & Zhang (2015) conducted churn prediction for the short message services of telecommunication companies and found that a C5.0 decision tree model had high accuracy. Farquad et al. (2014) performed churn prediction for bank credit card customers and proposed a hybrid method to extract rules from SVM. Gordini & Veglio (2017) conducted churn prediction for B2B e-commerce customers and found that SVM performed better on unbalanced and nonlinear data. Tian et al. (2007) adopted a 2-layer neural network to extract variables from telecommunication customer data and proposed a churn prediction model based on an artificial neural network, whose predictions were more accurate than those of decision tree and naive Bayes classifiers.
Ensemble classification and prediction methods combine several base models, turning weak classifiers into a strong one through integration. Common ensemble methods are Boosting, Gradient Boosting, AdaBoost and XGBoost. Depending on the base models and ensemble rules, ensemble methods have been built on linear discriminant methods (Xie & Li, 2008), decision trees (Abbasimehr et al., 2014), support vector machines (Vafeiadis et al., 2015) and neural networks (Gordini & Veglio, 2014). Wu & Meng (2016) conducted churn prediction of e-commerce customers and improved classification accuracy by reducing the dataset size and combining the classifier with the AdaBoost algorithm, achieving high prediction accuracy. Ji et al. (2021) studied a telecommunication customer dataset with temporal characteristics, adopted an XGBoost hybrid algorithm to filter churn features, and obtained good prediction performance. Ahmed & Maheswari (2019) performed churn prediction for customers in the telecommunications industry and proposed a predictive model integrating heuristic algorithms. Ying et al. (2010) conducted churn prediction of bank customers with an integration of LDA and Boosting methods, which produced accurate predictions. Zhang et al. (2014) found that an integrated model of CART and adaptive Boosting had high prediction accuracy.
Most of the aforementioned literature concerns the telecommunications and financial industries, and the predictive models produced inconsistent results.
Therefore, it is necessary to develop a targeted churn prediction model for B2C that takes into account features of the B2C environment such as type of product, product collection, adding products to the shopping cart, product preferences and time of shopping. This paper evaluates the churn prediction performance of the AdaBoost ensemble classifier in the B2C environment and compares the AdaBoost model with a BP neural network model.

Research Method
Customer churn prediction is a classification problem with two classes: churned and non-churned customers. Because customers can shop at any time in a B2C context, factors such as time of shopping and behavioral tendencies in each period may be critical in identifying churn. This study uses e-commerce data that contain multiple shopping behavior variables and reflect the time attributes of shopping, which makes them well suited to this research. The data were first preprocessed, and k-means clustering was then adopted to segment customers. AdaBoost and BP neural network models were then built to ascertain and compare the prediction accuracy of the two models. The basic research process is shown in Figure 1. The principles of the AdaBoost and BP neural network algorithms are discussed below.

AdaBoost
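AdaBoost is an iterative algorithm first proposed by Yoav Freund and Robert Schapire (Freund & Schapire, 1996). Its core idea is to train a sequence of weak classifiers on the same training set, reweighting the samples after each round, and then combine the weak classifiers into a stronger classifier. The training set sample is:
$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$$
The sample weight distribution of the training set for the k-th weak learner is:
$$D(k) = (w_{k1}, w_{k2}, \ldots, w_{km}), \qquad w_{1i} = \frac{1}{m}, \quad i = 1, 2, \ldots, m$$
Churned and non-churned customers form a binary classification with outputs $y_i \in \{-1, 1\}$, and the weighted error rate of the k-th weak classifier $G_k(x)$ on the training set is:
$$e_k = P(G_k(x_i) \neq y_i) = \sum_{i=1}^{m} w_{ki} \, I(G_k(x_i) \neq y_i)$$
The weight coefficient of the k-th weak classifier $G_k(x)$ is:
$$\alpha_k = \frac{1}{2} \ln \frac{1 - e_k}{e_k}$$
Suppose that the sample weight distribution for the k-th weak classifier is $D(k) = (w_{k1}, w_{k2}, \ldots, w_{km})$; then the sample weights for the (k + 1)-th weak classifier are:
$$w_{k+1,i} = \frac{w_{ki}}{Z_k} \exp\left(-\alpha_k y_i G_k(x_i)\right), \qquad Z_k = \sum_{i=1}^{m} w_{ki} \exp\left(-\alpha_k y_i G_k(x_i)\right)$$
The AdaBoost classifier uses weighted voting, and the final strong classifier after K rounds is:
$$f(x) = \operatorname{sign}\left(\sum_{k=1}^{K} \alpha_k G_k(x)\right)$$
AdaBoost is an advanced ensemble algorithm with a high detection rate and is relatively resistant to overfitting. In each iteration, a weak classifier is first trained on the weighted sample set. Because each sample has many attributes, searching a large number of features for the optimal weak classifier is computationally intensive.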
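The update rules above can be traced in a small from-scratch implementation. The following Python sketch is not the authors' code: it assumes NumPy, uses one-level decision stumps as the weak classifiers $G_k(x)$ (the paper does not name its base learner), and the function names `fit_stump`, `adaboost_fit` and `adaboost_predict` are illustrative only.

```python
import numpy as np


def stump_predict(X, feature, threshold, polarity):
    """Decision stump G(x): +1 on one side of the threshold, -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)


def fit_stump(X, y, w):
    """Search all features and thresholds for the stump with the lowest weighted error e_k."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                err = w[stump_predict(X, j, thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost_fit(X, y, n_rounds=20):
    """Train AdaBoost on labels coded as -1 (non-churned) / +1 (churned)."""
    m = len(y)
    w = np.full(m, 1.0 / m)                    # D(1): w_1i = 1/m
    learners = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # keep alpha_k finite for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)  # alpha_k = 1/2 ln((1 - e_k) / e_k)
        pred = stump_predict(X, j, thr, pol)
        w = w * np.exp(-alpha * y * pred)      # w_{k+1,i} before normalisation
        w /= w.sum()                           # divide by Z_k
        learners.append((alpha, j, thr, pol))
    return learners


def adaboost_predict(X, learners):
    """Strong classifier f(x) = sign(sum_k alpha_k G_k(x)); a zero score maps to +1."""
    score = sum(a * stump_predict(X, j, thr, pol) for a, j, thr, pol in learners)
    return np.where(score >= 0, 1, -1)
```

Misclassified samples have $y_i G_k(x_i) = -1$, so their weights are multiplied by $e^{\alpha_k} > 1$, which is what forces the next stump to focus on them.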

BP Neural Network
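The sigmoid transfer function (Minka, 2004) used in the BP neural network is a nonlinear transformation function, defined as:
$$f(x) = \frac{1}{1 + e^{-x}}$$
Its domain is the set of real numbers and its range is (0, 1). Taking the three-layer perceptron as an example, let $x_i$ be the inputs, $y_j$ the hidden-layer outputs, $o_k$ the network outputs, $d_k$ the expected outputs, $V_{ij}$ the input-to-hidden weights and $W_{jk}$ the hidden-to-output weights. When the network output differs from the expected output, an output error E occurs, defined as:
$$E = \frac{1}{2} \sum_{k=1}^{l} (d_k - o_k)^2$$
Expanding this error definition back to the hidden layer gives:
$$E = \frac{1}{2} \sum_{k=1}^{l} \left[d_k - f\left(\sum_{j=1}^{m} W_{jk} y_j\right)\right]^2 = \frac{1}{2} \sum_{k=1}^{l} \left[d_k - f\left(\sum_{j=1}^{m} W_{jk} f\left(\sum_{i=1}^{n} V_{ij} x_i\right)\right)\right]^2$$
The network error is therefore a function of the weights $W_{jk}$ and $V_{ij}$ of each layer, so E can be changed by adjusting the weights. To keep reducing the error, each weight adjustment should be proportional to the negative gradient of the error:
$$\Delta W_{jk} = -\eta \frac{\partial E}{\partial W_{jk}}, \qquad \Delta V_{ij} = -\eta \frac{\partial E}{\partial V_{ij}}$$
where η is the learning rate. Because the sigmoid function and its derivative are continuous, these gradients are easy to compute. Training is fast, the computational cost of classification depends only on the number of features, and results are easy to interpret for both continuous and categorical variables.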
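To make the gradient-descent rule concrete, here is a minimal NumPy sketch of a three-layer perceptron trained by back-propagation with the sigmoid activation and the squared error E. It is an illustration, not the authors' implementation: bias terms are omitted to match the weights $V_{ij}$ and $W_{jk}$ in the equations (a constant input column can play that role), E is summed over all training samples, and the hidden-layer size, learning rate and epoch count are arbitrary assumptions.

```python
import numpy as np


def sigmoid(z):
    """f(z) = 1 / (1 + e^(-z)); its derivative is f(z) * (1 - f(z))."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, V, W):
    Y = sigmoid(X @ V)   # hidden-layer outputs y_j (V: input -> hidden weights V_ij)
    O = sigmoid(Y @ W)   # network outputs o_k (W: hidden -> output weights W_jk)
    return Y, O


def error(X, D, V, W):
    """E = 1/2 * sum_k (d_k - o_k)^2, summed over all samples."""
    _, O = forward(X, V, W)
    return 0.5 * np.sum((D - O) ** 2)


def gradients(X, D, V, W):
    """Back-propagate the output error to obtain dE/dV and dE/dW."""
    Y, O = forward(X, V, W)
    delta_o = (O - D) * O * (1 - O)          # dE/d(net input of the output units)
    delta_h = (delta_o @ W.T) * Y * (1 - Y)  # error propagated back to the hidden units
    return X.T @ delta_h, Y.T @ delta_o


def train(X, D, n_hidden=4, eta=0.5, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    W = rng.normal(scale=0.5, size=(n_hidden, D.shape[1]))
    for _ in range(epochs):
        dV, dW = gradients(X, D, V, W)
        V -= eta * dV   # Delta V_ij = -eta * dE/dV_ij
        W -= eta * dW   # Delta W_jk = -eta * dE/dW_jk
    return V, W
```

The factor `O * (1 - O)` is the sigmoid derivative evaluated at the output units; reusing the forward-pass activations this way is what keeps back-propagation cheap.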

Experimental Study
The datasets published on the Alibaba Cloud Tianchi platform (

Data Preprocessing
The first step of data preprocessing is to convert the timestamp of each raw data record into a date in "year/month/day" format and a time in "hour/minute/second" format.
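A minimal sketch of this conversion in Python, assuming the raw timestamps are Unix epoch seconds and that local time is China Standard Time (UTC+8); the paper states neither, and the helper name `split_timestamp` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are Unix epoch seconds, interpreted in UTC+8 (China Standard Time).
CST = timezone(timedelta(hours=8))


def split_timestamp(ts):
    """Return the date as 'year/month/day' and the time of day as 'hour:minute:second'."""
    dt = datetime.fromtimestamp(ts, tz=CST)
    return dt.strftime("%Y/%m/%d"), dt.strftime("%H:%M:%S")
```

For example, `split_timestamp(0)` returns `("1970/01/01", "08:00:00")`, because the Unix epoch is 08:00 in UTC+8. Fixing the time zone explicitly avoids results that depend on the machine running the preprocessing.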
Throughout the day, customers' shopping may generate behavioral data such as "Product collection", "Add to shopping cart", "Favorites" and "Click". These behavioral data are rarely mentioned in existing literature, even though such data can play an important role in churn prediction (Cao, 2008
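Behavioral records like these are typically aggregated per customer before modeling. The pandas sketch below counts each behavior type per period of day, which is one plausible way to obtain features named like "Night Buy" or "PM PV". The column names (`user_id`, `behavior`, `hour`), the behavior codes (`pv`, `cart`, `fav`, `buy`) and the period boundaries are all assumptions, since the paper does not specify them.

```python
import pandas as pd

# Hypothetical display names for assumed behavior codes.
LABELS = {"pv": "PV", "cart": "Cart", "fav": "Fav", "buy": "Buy"}


def period_of_day(hour):
    """Hypothetical period boundaries; the paper does not define them."""
    if 6 <= hour < 12:
        return "AM"
    if 12 <= hour < 18:
        return "PM"
    return "Night"


def behavior_features(log):
    """Count each (period, behavior) pair per customer; unobserved pairs become 0."""
    periods = log["hour"].map(period_of_day).rename("period")
    counts = pd.crosstab(log["user_id"], [periods, log["behavior"]])
    # Flatten the (period, behavior) column pairs into names such as "Night Buy".
    counts.columns = [f"{p} {LABELS.get(b, b)}" for p, b in counts.columns]
    return counts
```

Each row of the result is one customer and each column one period/behavior count, which is the tabular shape that the clustering and classification steps expect.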

Customer Segmentation
Many previous studies did not segment customers before churn prediction in a B2C environment, which may lead to low prediction accuracy and precision. This paper argues that segmenting customers can improve the accuracy of churn prediction. Based on the shopping behavior of different customer groups, companies can identify key customers, general customers and churn-prone customers and devise targeted, effective marketing strategies. Relevant studies (Pham et al., 2004; Chen et al., 2001) showed that k-means clustering is simple, easy to implement and widely adopted, so this paper uses k-means clustering for customer segmentation with the aforementioned 17 clustering variables. The number of clusters K must be fixed in advance for a given sample dataset; k-means then groups the samples so that samples within a cluster are as close as possible and clusters are as far apart as possible. We tested K = 2 through K = 8 in turn.
When K = 3, the distance between clusters was the longest. Thus, the number of clusters was set to 3, i.e., the customers were segmented into three categories: Cluster I,
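The search over K can be sketched with scikit-learn as follows. The paper selects K by the largest separation between clusters but does not give the exact criterion, so this sketch substitutes the silhouette coefficient, which rewards both compact clusters and well-separated clusters; the standardization step and the helper name `choose_k` are likewise assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k(X, k_range=range(2, 9), seed=0):
    """Fit k-means for each K and return the K with the highest silhouette score."""
    Xs = StandardScaler().fit_transform(X)  # assumption: put variables on a common scale
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Xs)
        scores[k] = silhouette_score(Xs, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Standardizing matters here because k-means uses Euclidean distance, so an unscaled count variable with a large range would otherwise dominate the clustering.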

Feature Selection
The basic method of churn prediction is to feed variables into the model as features. An excessive number of features leads to data redundancy and may degrade the model's prediction performance (Verbeke et al., 2012). When all 17 variables were included as features, prediction accuracy decreased, and prediction could even fail. Therefore, the 17 features were first filtered to select those suitable for prediction. Random forest is an effective feature selection algorithm with high classification accuracy and good generalization (Breiman, 2001). The key step in feature selection is choosing the optimal number of features (M).
The out-of-bag (OOB) error (Breiman, 1996) was used first: the number of features was determined by the minimum OOB error rate, and feature importance was then computed. As Table 1 shows, the OOB error rate is lowest, at 0.081, when the number of features is 4. Feature importance was measured with the Gini index: the greater a feature's Gini importance (mean decrease in Gini impurity), the more important the feature (Goldstein et al., 2011). As shown in Table 2, "Night Buy", "PM Buy", "Night PV" and "PM PV" were selected as features for churn prediction. The data in
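A scikit-learn sketch of one reading of this procedure follows. It ranks features by the random forest's impurity-based importances (`feature_importances_`, which use Gini impurity under the default criterion), then refits on the top M features for each M and keeps the M with the lowest OOB error (`1 - oob_score_`). The paper computes the OOB curve before the importances, so this ordering is an assumption, as are the tree count and the helper name `select_features`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_features(X, y, names, n_trees=200, seed=0):
    """Rank features by Gini importance, then keep the top-M subset with the lowest OOB error."""
    ranker = RandomForestClassifier(n_estimators=n_trees, random_state=seed).fit(X, y)
    order = np.argsort(ranker.feature_importances_)[::-1]  # most important first
    oob_error = {}
    for m in range(1, X.shape[1] + 1):
        forest = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed
        ).fit(X[:, order[:m]], y)
        oob_error[m] = 1.0 - forest.oob_score_  # OOB error rate using the top-m features
    best_m = min(oob_error, key=oob_error.get)
    return [names[i] for i in order[:best_m]], oob_error
```

The OOB error is estimated on the samples each tree did not see during bootstrapping, so it acts as a built-in validation score without a separate hold-out set.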

Evaluation Metrics
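Three main metrics were used to evaluate the predictive models' performance: Accuracy, Recall and Precision. In addition, the receiver operating characteristic (ROC) curve was drawn, and each model was evaluated by the area under the curve (AUC) (Fan & Ke, 2010). The formulae of the three metrics are as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives in the confusion matrix.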
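These formulas, together with AUC, can be computed directly from predictions. The NumPy sketch below treats churned customers as the positive class (label 1), which is an assumption, and computes AUC as the probability that a randomly chosen churned customer receives a higher score than a randomly chosen non-churned one (ties count one half), which equals the area under the ROC curve. The function names are hypothetical.

```python
import numpy as np


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for the given positive label."""
    t = np.asarray(y_true) == positive
    p = np.asarray(y_pred) == positive
    return int(np.sum(t & p)), int(np.sum(~t & ~p)), int(np.sum(~t & p)), int(np.sum(t & ~p))


def metrics(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative)."""
    t = np.asarray(y_true) == positive
    s = np.asarray(scores, dtype=float)
    diff = s[t][:, None] - s[~t][None, :]  # all positive/negative score pairs
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)
```

For instance, with true labels `[1, 1, 0, 0, 1, 0]` and predictions `[1, 0, 0, 1, 1, 0]` there are 2 TP, 2 TN, 1 FP and 1 FN, giving an accuracy of 4/6 and a recall and precision of 2/3 each.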

Results and Discussion
Data were input into the AdaBoost and BP neural network models for prediction, and training iterations were repeated until convergence. With 10-fold cross-validation, the data were divided into 10 parts, of which 9 were used as the training set and 1 as the test set in each round. The average over the 10 rounds was used as the final evaluation result of each model. The confusion matrices were obtained by applying the AdaBoost and BP neural network models to the test set data; the confusion matrices before and after segmentation are shown in Tables 3-6. Figure 2 [...] program in this paper.
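The 10-fold protocol can be reproduced in outline with scikit-learn. In this sketch `AdaBoostClassifier` (decision stumps by default) stands in for the AdaBoost model, and `MLPClassifier` with logistic (sigmoid) hidden units stands in for the BP neural network; note that `MLPClassifier` minimizes log-loss with the Adam optimizer by default rather than the squared error of plain gradient descent. Stratified folds, the hyperparameters and the helper name `compare_models` are assumptions.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=0):
    """Mean 10-fold test scores for an AdaBoost model and a sigmoid MLP stand-in."""
    models = {
        "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=seed),
        "BP (MLP)": make_pipeline(
            StandardScaler(),  # scaling is fitted inside each training fold only
            MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                          max_iter=2000, random_state=seed),
        ),
    }
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scoring = ["accuracy", "recall", "precision", "roc_auc"]
    summary = {}
    for name, model in models.items():
        res = cross_validate(model, X, y, cv=cv, scoring=scoring)
        summary[name] = {s: float(res[f"test_{s}"].mean()) for s in scoring}
    return summary
```

Stratification keeps the churn/non-churn ratio roughly equal across folds, which matters when churned customers are a minority, and wrapping the scaler in a pipeline prevents test-fold statistics from leaking into training.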

Customer Segmentation Analysis
Customers were segmented to identify important and general customers. Enterprises formulate marketing strategies and improve their products according to the importance of each customer group, matching products with customer preferences to retain customers and improve marketing performance. According to the results in Section 4.2, the churn rate of Cluster I customers reached 90.2%, but its non-churn rate of 9.8% was higher than those of Cluster II and III customers. Cluster III had the fewest customers, at 524. The proportions and non-churn rates of the three customer groups indicate that Cluster I customers may be important to the enterprise. Hence, the company should further analyze Cluster I customers to develop personalized marketing plans and prevent these customers from churning. This result also shows that segmenting customers before prediction makes the analysis more targeted. These insights can be applied to data analysis, customer classification and predictive modeling for B2C e-commerce enterprises.
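Per-cluster churn and non-churn rates of this kind are simple row proportions. A pandas sketch, assuming a hypothetical table with one row per customer, a `cluster` label and a 0/1 `churned` flag (the helper name `churn_rates_by_cluster` is also hypothetical):

```python
import pandas as pd


def churn_rates_by_cluster(customers):
    """Row-normalised churn table per cluster plus cluster size (assumed schema)."""
    table = pd.crosstab(customers["cluster"], customers["churned"], normalize="index")
    table = table.rename(columns={0: "non_churn_rate", 1: "churn_rate"})
    table["size"] = customers["cluster"].value_counts()  # aligned on cluster labels
    return table
```

Because every customer is either churned or not, the two rates in each row sum to 1, as with the 90.2% and 9.8% reported for Cluster I.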

Performance of Predictive Model
A comparative experiment on the AdaBoost and BP neural network models was conducted. The Accuracy, Recall and Precision of each category were computed from the confusion matrices to evaluate performance on the three categories in the dataset. Table 5 and Table 6 [...] Figure 3. These experimental results indicate that the AdaBoost model has better generalization and prediction performance. Thus, the AdaBoost model is recommended over the BP neural network model for predicting customer churn in B2C e-commerce.

Customer Management
In B2C e-commerce marketing, customer churn prediction aims to improve enterprises' customer relationship management and increase profits. A model that accurately predicts customer churn can have a positive impact on a company's management and finances, and the results of this study can help strengthen enterprises' customer relationship management. Accurately clustering customers and identifying important customers helps enterprises maintain their core customer base. Separating churned from non-churned customers allows companies to carry out targeted marketing activities and formulate effective strategies. However, misclassification can occur: non-churned customers might be classified as churned, or churned customers as non-churned. According to Coussement (2014) and Viaene & Dedene (2005), misidentifying churned customers harms enterprises' customer retention measures and profits. Accurately classifying and predicting customer churn is therefore a long-term undertaking for enterprises, and it is particularly important in the highly competitive B2C market.
Based on this research, the AdaBoost model is recommended for B2C e-commerce enterprises to accurately identify churned customers and develop effective customer retention measures to reduce administrative costs (Jahromi et al., 2014).

Conclusion
This paper used a set of B2C e-commerce data to conduct customer churn prediction. After data cleaning, the data samples were discretized and 17 variables were compiled. Customers were segmented into three categories with k-means clustering before churn prediction was performed. The Accuracy, Recall, Precision and AUC of the three categories were computed to evaluate the performance of the predictive models. Both the AdaBoost ensemble classification model and the BP neural network can effectively predict churn from customer data with a large number of features. Furthermore, adding customer behavior data and temporal data to the RFM model better reflects the shopping behavior of B2C customers in prediction. The results support the value of this method for customer churn prediction and marketing decision-making, and show the promise of the AdaBoost model as an effective early warning model for customer churn management in the B2C industry. However, the results may be limited by the dataset used and by the generality of the predictive models. This paper did not study the persistence of the predictive models, that is, their prediction performance over a given number of periods after the estimation period. If a model's persistence is limited, it cannot be used for long in the B2C environment, and new models may need to be established to predict customer churn over extended periods.

Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.