Hybrid Data Mining Models for Predicting Customer Churn

The term “customer churn” is used in the information and communication technology (ICT) industry to denote customers who are about to leave for a competitor or end their subscription. Predicting this behavior is important in a competitive market, and managing it is essential. In this paper, three hybrid models are investigated to develop an accurate and efficient churn prediction model. The three models are based on two phases: a clustering phase and a prediction phase. In the first phase, customer data is filtered. The second phase predicts customer behavior. The first model uses the k-means algorithm for data filtering and a Multilayer Perceptron Artificial Neural Network (MLP-ANN) for prediction. The second model uses hierarchical clustering with MLP-ANN. The third uses self-organizing maps (SOM) with MLP-ANN. The three models are developed on real data, and their accuracy and churn rate values are then calculated and compared. The comparison with other models shows that the three hybrid models outperformed common single models.


Introduction
Customers are the most important asset of any organization, since they are its main source of profit. Although attracting new customers is important for a company's growth, businesses widely agree that retaining existing ones is more important, for several reasons. First, it is easier to sell to an existing customer than to attract and sell to a new one. Second, building customer loyalty is a target for all industries, and loyal customers are less sensitive to price changes. Statistics show that it costs five to six times more to gain a new customer than to keep an existing one [1].
In the information and communication technology (ICT) industry, the customer life cycle contains five main stages: acquisition, build-up, peak, decline, and churn [2]. The term customer churn is used in the telecommunication industry to describe customers who change their supplier or provider to a new one offering the same service [3], [4]. Therefore, a churn management process is necessary for any company that wants to compete in the marketplace and keep its profit high. The goal of the churn management process is to retain customers who are about to churn before they do so, and this is done through churn prediction [5]. Churn prediction helps a company assess its current situation and set future plans for a specific, focused group, or design targeted marketing campaigns [6]. In fact, churn prediction is an important element in making accurate and effective decisions [7].
In the literature, many researchers have proposed different machine learning approaches for predicting customer churn in the telecommunication business. Most of these approaches apply a single classifier algorithm. Examples of these works can be found in [1], [8]-[12]. However, churn prediction is a challenging problem because of its imbalanced class distribution: the number of churners is usually much smaller than the number of non-churners. This problem makes most conventional single machine learning approaches inappropriate for classifying the data [13]. For this reason, other researchers have proposed hybrid machine learning approaches that combine two or more algorithms, where one of the algorithms is used for data preprocessing before the classification task is performed [13]-[15]. Some authors proposed clustering the data into a number of clusters and then eliminating some small clusters as a way of filtering out unrepresentative data [15]-[17].
In this paper, we propose three hybrid data mining models for predicting customer churn in a telecommunication company. All three models are composed of a clustering algorithm and a classification algorithm. First, the clustering algorithm is applied to filter outliers and unrepresentative behaviors out of the data. The filtered data produced by the clustering algorithm become the input to the classification algorithm, which uses them in its learning process to build the final churn prediction model. For clustering, we use k-means clustering, hierarchical clustering, and self-organizing maps (SOM), while for classification we use the Multilayer Perceptron Artificial Neural Network (MLP-ANN).
The rest of this paper is organized as follows. Section 2 covers the preliminaries of the algorithms applied in the models proposed in this work for churn prediction. In Section 3, the proposed models are presented. A description of the dataset used is given in Section 4. The criteria used to evaluate the proposed models are listed in Section 5. Finally, the experiments and results are discussed in Section 6.

k-Means Clustering Algorithm
k-means is a clustering algorithm that groups objects, based on their attributes, into k groups, where k is a positive integer [18]. The grouping is based on minimizing a convergence criterion, usually the sum of squared distances between the data points and the centroid of their cluster. The algorithm works as follows. First, we choose the number of clusters k that we want to create. Then we select the initial centers of these clusters (called centroids) by taking random data points. Next, we compute the distance between each data point and each centroid; the Euclidean distance or the Manhattan distance is usually used. Each data point is assigned to its closest centroid. New centroids are then computed: if the Manhattan distance is used, each new centroid is the component-wise median of its cluster; otherwise it is the mean [19]. The process is repeated until the minimization criterion is met or the assignment of data points to clusters no longer changes.
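The steps above can be sketched in a few lines of Python. This is only an illustrative sketch of the Euclidean-distance, mean-centroid variant, not the implementation used in the experiments; the function names, the `seed` and `max_iter` parameters, and the toy one-feature data are all assumptions introduced here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) after plain k-means on a list of points."""
    rng = random.Random(seed)
    # Initial centroids: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the procedure has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return centroids, labels


# Toy data (made up): two well-separated groups of one-feature records.
toy_points = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
centroids, labels = k_means(toy_points, k=2)
```

For the Manhattan-distance variant described in the text, the distance function would sum absolute differences and `mean_point` would be replaced by a component-wise median.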

Hierarchical Clustering
A hierarchical clustering algorithm builds a cluster hierarchy, or tree of clusters. In its common form, it starts by treating every point as its own cluster, finds the most similar pair of clusters, and merges them into a parent cluster [20]. There are two types of hierarchical clustering techniques: top-down and bottom-up. Bottom-up is an agglomerative method that starts with single-point clusters (singletons) and then recursively merges two or more of the most appropriate clusters. The process is repeated until k clusters remain. The second technique, called divisive or top-down, starts with one big cluster and recursively divides it into smaller clusters. The algorithm stops when k clusters are obtained.
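A minimal sketch of the bottom-up (agglomerative) procedure is given below. The text does not state which similarity measure between clusters is used, so single linkage with Euclidean distance is assumed here purely for illustration; the function names and the toy points are hypothetical.

```python
import math


def single_link_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between any pair of members."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # every point starts as a singleton
    while len(clusters) > k:
        best = None
        # Search all cluster pairs for the closest one.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i
        del clusters[j]
    return clusters


# Toy data (made up): two tight pairs and one isolated point.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters = agglomerative(toy_points, k=3)
```

The isolated point ends up in a cluster of its own, which is the kind of small cluster that the filtering step of the proposed models would discard.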

Self-Organizing Maps
The SOM clustering algorithm, introduced by Teuvo Kohonen, is an unsupervised-learning neural network that comprises a one-, two-, or three-dimensional lattice of units connected by weighted links to a layer of dummy input units. The lattice must be trained, and the training strategy rests on the idea that a unit or cluster of units will fire (generate output) when a certain kind of input is presented [21].
Training SOMs is based on vector quantization. We have a set of input vectors x1, x2, ... Since each x is a vector, we can visualize the input vectors as lying within an input space of the same dimension as the vectors. The idea of vector quantization is to divide the entire space into regions. SOMs are based on competitive, or winner-takes-all, learning. The idea of competitive learning is that when a pattern represented by the input vector x is applied to the input units, one of the responding units wins the competition because its weight vector has the smallest Euclidean distance from x. The weight vector of the winner is then moved closer to x. This is repeated for all patterns in the set [21].
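The winner-takes-all update described above can be sketched as follows. This covers only the competitive step from the text: the neighbourhood update and the decaying learning rate of a full SOM are deliberately left out, and the function names, the learning rate of 0.5, and the toy vectors are assumptions made for the example.

```python
import math


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda u: math.dist(weights[u], x))


def competitive_step(weights, x, learning_rate):
    """Move the winning unit's weights a fraction of the way toward x."""
    winner = best_matching_unit(weights, x)
    weights[winner] = [
        w + learning_rate * (xi - w) for w, xi in zip(weights[winner], x)
    ]
    return winner


def train_epoch(weights, patterns, learning_rate):
    """Present every pattern once and return the winner for each."""
    return [competitive_step(weights, x, learning_rate) for x in patterns]


# Toy setup (made up): two units and two input patterns in a 2-D input space.
units = [[0.0, 0.0], [1.0, 1.0]]
patterns = [[0.9, 0.8], [0.1, 0.2]]
winners = train_epoch(units, patterns, learning_rate=0.5)
```

Each unit is pulled halfway toward the pattern it won, so after one pass the two weight vectors sit closer to the two regions of the input space they represent.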

Multilayer Perceptron Neural Networks
Artificial Neural Networks are mathematical models inspired by biological nervous systems. Many tasks that humans perform naturally and quickly, such as recognition, prove to be very complicated for a computer when traditional programming methods are used. By applying neural network techniques, a model can learn from examples and create an internal, complex structure of rules to classify different inputs, as in predicting customer churn. Neural networks are useful for pattern recognition and data classification through a learning process. They simulate biological systems, where learning involves adjustments to the synaptic connections (weights) between neurons. They map a set of input nodes to a set of output nodes. The number of inputs and outputs is variable, and the network itself is composed of an arbitrary number of nodes with an arbitrary topology. Artificial Neural Networks have been widely used to solve data mining problems because of their adaptability to unknown situations, robustness, fault tolerance, autonomous learning, and generalization. In this work, we use the Multilayer Perceptron Neural Network (MLP-ANN), which is one of the most commonly applied neural network models in the literature. We train our MLP-ANN classifier by means of the well-known back-propagation learning algorithm [22], [23].
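To make the weight-adjustment idea concrete, the sketch below performs back-propagation steps for a tiny MLP with one hidden layer. It is not the network configuration used in this work: the sigmoid activations, squared-error loss, 2-2-1 topology, learning rate, and starting weights are all assumptions chosen only to keep the example small.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return (hidden activations, output); each weight row ends with a bias."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1])
        for row in w_hidden
    ]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
    return hidden, output


def backprop_step(x, target, w_hidden, w_out, lr):
    """One gradient step on 0.5 * (y - target)^2; returns the loss before it."""
    hidden, y = forward(x, w_hidden, w_out)
    delta_out = (y - target) * y * (1.0 - y)
    # Hidden deltas must use the output weights from before the update.
    delta_hidden = [
        delta_out * w_out[j] * h * (1.0 - h) for j, h in enumerate(hidden)
    ]
    for j, h in enumerate(hidden):
        w_out[j] -= lr * delta_out * h
    w_out[-1] -= lr * delta_out  # output bias
    for j, row in enumerate(w_hidden):
        for i, xi in enumerate(x):
            row[i] -= lr * delta_hidden[j] * xi
        row[-1] -= lr * delta_hidden[j]  # hidden bias
    return 0.5 * (y - target) ** 2


# Arbitrary starting weights (made up): 2 inputs, 2 hidden units, 1 output.
w_hidden = [[0.5, -0.4, 0.1], [-0.3, 0.8, -0.2]]
w_out = [0.7, -0.6, 0.05]
x, target = [1.0, 0.0], 1.0  # a single toy training example
losses = [backprop_step(x, target, w_hidden, w_out, lr=0.5) for _ in range(5)]
```

On this single example the recorded loss shrinks from step to step, which is the behaviour the learning process relies on; a real classifier would iterate over the whole training set.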

Proposed Models
The churn prediction model proposed in this study is based mainly on two processes. In the first process, a clustering algorithm is applied to the collected customer data in order to separate customers into a number of groups that represent different behaviors. The two largest clusters, which represent churners and non-churners, are merged again and used as the input to the next process. The small clusters, on the other hand, are discarded, as they represent outliers and unrepresentative behaviors. In this research, three different clustering approaches are applied and compared: k-means, hierarchical clustering, and SOM. In the second process, MLP-ANN is applied to the clustered and filtered data obtained from the first process in order to learn how to classify customer behavior into two classes: churner and non-churner. Figure 1 shows the two-step model applied in this work. The performance of the final MLP-ANN classifier is assessed using different evaluation criteria on new testing data that was not presented during the classifier's learning phase.
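The filtering step of the first process can be sketched as below: given one cluster label per customer record, only the records in the two largest clusters are kept. The function name, the `n_keep` parameter, and the toy records are hypothetical, and ties between equally sized clusters are resolved arbitrarily in this sketch.

```python
from collections import Counter


def keep_largest_clusters(records, cluster_labels, n_keep=2):
    """Keep only the records whose cluster is among the n_keep largest."""
    sizes = Counter(cluster_labels)
    kept = {label for label, _ in sizes.most_common(n_keep)}
    return [r for r, lab in zip(records, cluster_labels) if lab in kept]


# Toy example (made up): six records spread over three clusters.
records = ["a", "b", "c", "d", "e", "f"]
cluster_labels = [0, 0, 0, 1, 1, 2]
filtered = keep_largest_clusters(records, cluster_labels)
```

Here the single record in cluster 2 is dropped, and the remaining five records would be passed to the classifier as training data.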

Dataset
The data used in this research was provided by Jordanian Telecommunication Company, one of the major cellular telecommunication companies in Jordan. The dataset contains 11 attributes for 5000 randomly selected customers subscribed to a prepaid service over a period of three months. The attributes cover statistics on outgoing and incoming calls. The data includes an indicator for each customer showing whether the customer churned (left the company) or is still active. The total number of churners is 381 (7.6% of all customers).

Evaluation Methods
In order to evaluate the developed hybrid models, we measure the prediction accuracy and churn rate based on the confusion matrix shown in Table 1. Accuracy and churn rate are calculated by Equation (1) and Equation (2), respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

$$\text{Churn Rate} = \frac{TP}{TP + FN} \tag{2}$$

Accuracy measures the rate of correctly classified instances of both classes, and churn rate measures the rate of predicted churn in actual churn. The goal is to achieve high accuracy and churn rate values; high values indicate a well-established model.
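As a small worked example, the two measures can be computed from lists of actual and predicted labels. The sketch assumes that churn is the positive class (label 1), that accuracy is (TP + TN) divided by all instances, and that churn rate is TP / (TP + FN), following the verbal definitions above; the function names and the eight toy labels are made up.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the churn class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all instances that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def churn_rate(tp, fn):
    """Share of actual churners that were predicted as churners."""
    return tp / (tp + fn)


# Toy labels (made up): 1 = churner, 0 = non-churner.
actual = [1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]
tp, fp, fn, tn = confusion_counts(actual, predicted)
acc = accuracy(tp, fp, fn, tn)
rate = churn_rate(tp, fn)
```

Reporting both measures matters for imbalanced data: with 381 churners among 5000 customers, a classifier that labels everyone as a non-churner would reach about 92.4% accuracy while its churn rate would be zero.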

Experiments and Results
First, the dataset described earlier is split equally into training and testing parts. Then a clustering algorithm is applied to the training data in order to filter out outliers and unrepresentative data. In the next stage, an MLP neural network model is developed using the filtered data and evaluated on the held-out testing data.
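The equal split can be sketched as follows. The text does not say how the split was drawn, so a seeded random shuffle is assumed here; the function name and the `seed` parameter are hypothetical.

```python
import random


def split_half(records, seed=0):
    """Shuffle a copy of the records and cut it into two equal halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Toy example (made up): ten record identifiers.
train_part, test_part = split_half(range(10))
```

Only the training half would then be clustered and filtered, while the testing half is left untouched for evaluation.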
In our experiments, we tried and compared three different clustering algorithms: k-means, hierarchical clustering, and SOM. The k-means algorithm and hierarchical clustering are each applied several times with a different value of k each time, from 4 up to 10 clusters. For SOM, different map sizes are applied: (2 × 2), (2 × 3), (3 × 3), and (4 × 4). The starting learning rate was set to 0.20, with variance distance normalization.
In order to develop the classification model, the MLP neural network is trained using the filtered data obtained by each of the clustering methods in the first stage and tested using the testing data. The parameters of the MLP are tuned as listed in Table 2. The three approaches shown in Figure 2 are evaluated and then compared with the C4.5 decision tree algorithm and with the baseline MLP model that uses no clustering technique. The results are shown in Table 3. Empirically, the best results in terms of churn rate were obtained by setting k to 7 for the k-means algorithm, k to 6 for hierarchical clustering, and the map size to 3 × 3 for SOM.
In general, all the hybrid approaches showed some improvement in accuracy and churn rate compared to the common approaches, C4.5 and MLP-ANN. The first hybrid model, which combines k-means clustering with the MLP neural network, outperformed the other models in terms of prediction accuracy, while SOM clustering with MLP provided the best churn rate.

Conclusion
One of the goals of data mining is to improve the way we understand existing data and predict future behavior. This paper presented three hybrid models, investigated with the aim of developing an accurate and efficient churn prediction model that helps telecommunication companies analyze and predict the future behavior of their customers. The three models are based on two phases: a clustering phase and a prediction phase. In the first phase, customer data is filtered. The second phase predicts customer behavior. The first model uses the k-means algorithm for data filtering and a Multilayer Perceptron Artificial Neural Network (MLP-ANN) for prediction. The second model uses hierarchical clustering with MLP-ANN. The third uses self-organizing maps (SOM) with MLP-ANN. The three models are developed and validated using real data provided by Jordanian Telecommunication Company, and their accuracy and churn rate values are calculated and compared. The comparison with the other models shows that the three hybrid models outperformed the single models.

Figure 1. Proposed model for predicting customer churn.

Table 3. Comparison between the three developed models and previous work.