Machine Learning Approaches to Predict Default of Credit Card Clients
R. L. Liu

This paper compares traditional machine learning models, i.e., Support Vector Machine, k-Nearest Neighbors, Decision Tree and Random Forest, with Feedforward Neural Network and Long Short-Term Memory (LSTM). We observe that the two neural networks achieve higher accuracies than the traditional models. This paper also investigates whether dropout can improve the accuracy of neural networks. We observe that for Feedforward Neural Network, applying dropout leads to better performance in certain cases but worse performance in others. The influence of dropout on LSTM models is small. Therefore, using dropout does not guarantee higher accuracy.


Introduction
Neural networks can explore the relationship between input features and their corresponding labels, so they are suitable for complex machine learning problems. On the other hand, other machine learning models such as linear regression or Support Vector Machine (SVM) [1] can solve simpler problems more efficiently. Therefore, after analyzing a specific problem, one should ask "is a neural network really necessary in this case?" Moreover, there are different models within the category of "neural network". Feedforward Neural Network uses all neurons in one layer at the same time to calculate the neurons in the next layer. Apart from their differences in weights, neurons are "parallel" in this process. In contrast, Recurrent Neural Network is a useful model for sequential datasets. It allows previous inputs to influence the processing of future inputs. The difference in accuracy between these two networks should be compared [2]-[8].
We discuss previous research in Section 2, model description in Section 3, dataset description and experiment results in Section 4, conclusion in Section 5 and potential future work in Section 6.

Related Works
Yeh [2] randomly divided 25,000 payment records into a training set and a testing set, and then chose six data mining methods: Logistic Regression, Discriminant Analysis (Fisher's rule), Naïve Bayes, kNN, Decision Tree and (Feedforward) Neural Network. The error rate of each model on the testing set was recorded.
Accuracies were computed as 1 − (error rate). kNN returned the highest accuracy of 0.84. Feedforward Neural Network and Decision Tree both returned the second highest accuracy of 0.83. Discriminant Analysis returned the lowest accuracy of 0.74. From this paper, one can observe that a neural network is not guaranteed to perform better than simpler models: one of the traditional models, kNN, achieved higher accuracy than the neural network. However, the dropout technique was not applied to the neural network model. Also, Long Short-Term Memory [3], a model that is widely applied now, was not considered. In neural networks, there is an "epochs" parameter that determines how many times each sample is fed into the model, but this parameter was not reported in Yeh [2]. To have clearer comparisons between neural networks and traditional models, a study that includes these factors is needed.

k-Nearest Neighbors
k-Nearest Neighbors (kNN) [4] stores all training samples (including their features and labels) in a space defined by its distance metric, without further processing or calculation. When the model receives an object to be predicted, it places the new object into that space (also according to the metric). The model then makes a prediction by looking at the k nearest neighbors of the new object. Usually, the prediction is the label that occurs most often among those k samples.
This model determines a sample's label based on nearby samples with known labels, so it does not "get trained" but only memorizes. To truly train a boundary that separates two categories and can be used for future predictions, Support Vector Machine is a classical model to choose [5].
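The majority vote described above can be sketched in a few lines. This is only an illustration: the Euclidean distance, the function name `knn_predict` and the toy samples are assumptions made for the example and are not taken from the experiments in this paper.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training samples."""
    # Euclidean distance is an assumption; any other metric could be substituted here.
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    # Sort by distance only, so that ties never compare labels.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two features per sample, label 1 = default, 0 = no default.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_y = [0, 0, 0, 1, 1]

print(knn_predict(train_X, train_y, (0.15, 0.15), k=3))
print(knn_predict(train_X, train_y, (0.95, 1.0), k=3))
```

Note that nothing is fitted before prediction: the training samples are simply stored and searched, which is the sense in which the model "memorizes" rather than "gets trained".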

Neural Network & Dropout
A Neural Network consists of an input layer, some hidden layers and an output layer. As shown in Figure 1, the input layer takes the features in the dataset as input.
Then, these neurons together are used to compute each neuron in the next layer according to the weights of their connections (each connection between two neurons has its own weight). Each layer also has an activation function. This function determines the value a neuron passes to the next layer according to the value it receives from the previous layer. The final layer is the output layer.
However, one problem of the Neural Network is that when the number of layers and neurons is large, there could be many connections between neurons.
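The layer-by-layer computation and the growth in the number of connections can be illustrated with a minimal sketch. The layer sizes, the 23 input features, the random weights and all function names below are assumptions chosen for the example; the sketch applies "sigmoid" to every layer for simplicity and performs no training.

```python
import math
import random

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Create small random weights (one row per output neuron) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense_forward(inputs, weights, biases):
    """Each output neuron sums all weighted inputs, adds its bias and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]

def forward(features, layers):
    """Pass the features through every layer in order."""
    values = features
    for weights, biases in layers:
        values = dense_forward(values, weights, biases)
    return values

def count_connections(layer_sizes):
    """Number of weighted connections between consecutive fully connected layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

rng = random.Random(0)
layer_sizes = [23, 32, 32, 2]  # assumed: 23 input features, two hidden layers, 2 outputs
layers = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]
features = [rng.random() for _ in range(layer_sizes[0])]

output = forward(features, layers)
print(output)
print(count_connections(layer_sizes))  # 23*32 + 32*32 + 32*2 = 1824
```

Under these assumed sizes, shrinking the hidden layers from 32 to 8 neurons reduces the count from 1824 to 264 connections, which shows how quickly the number of weights grows with layer width.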

Recurrent Neural Network & Long Short-Term Memory
Recurrent Neural Network (RNN) can reflect the sequential relationship among inputs. The hidden layer used to process previous inputs is passed on to the next hidden layers, which are used to process future inputs. Therefore, by training the hidden layers, previous inputs can affect how the model processes future inputs.
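A minimal sketch of this recurrence, assuming a plain RNN with a tanh activation, is given below. The weights, the sequence values and the function names are hypothetical and serve only to show that an early input still changes the state after later, identical inputs.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: the new hidden state mixes the current input with the previous state."""
    h_new = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        z += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_new.append(math.tanh(z))
    return h_new

def rnn_forward(sequence, W_x, W_h, b):
    """Process a whole sequence, starting from a zero hidden state."""
    h = [0.0] * len(b)
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h

# Hypothetical weights: 1 input feature per time step, 2 hidden units.
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.0]

# The two sequences differ only in their first time step.
seq_a = [[1.0], [0.0], [0.0]]
seq_b = [[0.0], [0.0], [0.0]]

print(rnn_forward(seq_a, W_x, W_h, b))
print(rnn_forward(seq_b, W_x, W_h, b))
```

Although the last two inputs of both sequences are identical, the final hidden states differ, because the first input is carried forward through the hidden state.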

Dataset & Experiments
This dataset is provided by I-Cheng Yeh [2], from Department of Information
In this dataset, 77.88% of samples are negative. While this paper still focuses on accuracies, f1-scores are also measured as references to ensure that models are not blindly predicting samples to be negative. If a model has a strong tendency to make negative predictions, its recall will be low, so it will return a low f1-score (Tables 1-3).
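The reason for reporting f1-scores can be made concrete with a small worked example. The sketch below builds 10,000 hypothetical labels with the same 77.88% negative share and evaluates a model that always predicts negative; the function names and the convention of returning an f1-score of 0 when there are no true positives are assumptions of the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Share of samples whose prediction matches the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; defined as 0 when there are no true positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels with the same class balance as the dataset: 77.88% negative.
y_true = [0] * 7788 + [1] * 2212
always_negative = [0] * len(y_true)

print(accuracy(y_true, always_negative))  # 0.7788
print(f1_score(y_true, always_negative))  # 0.0
```

A model that blindly predicts negative therefore reaches an accuracy of 0.7788 while its f1-score is 0, so accuracy alone cannot reveal this behavior.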

F1-Score
When the kernel is "RBF" and C = 1, the accuracy, 0.804, is the highest among all results. The corresponding f1-score is 0.4520. The f1-scores of the "RBF" kernel are generally higher than those of the "Poly" kernel.
Random Forest is better than Decision Tree since it reduces overfitting, and both accuracies and f1-scores reflect this. In this experiment, when MSL is 10 or 20, the accuracy is 0.8000, slightly higher than the accuracy of the previous Deci-

Feedforward Neural Network
In this paper, the "relu", "softmax" and "sigmoid" activation functions are compared. Each model has two hidden layers that use the same activation function. The output layer has 2 neurons and uses "softmax" as its activation function, so that the output is a probability distribution.
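For reference, the three activation functions can be written as follows. This is a plain-Python sketch using the standard definitions, not the implementation used in the experiments; subtracting the maximum inside `softmax` is a common numerical-stability choice added here as an assumption.

```python
import math

def relu(z):
    """Pass positive values through unchanged and clip negative values to zero."""
    return max(0.0, z)

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Turn a list of real numbers into a probability distribution."""
    m = max(zs)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pre-activation values of the 2 output neurons.
probs = softmax([2.0, 0.5])
print(probs, sum(probs))
print(relu(-3.0), relu(1.5), sigmoid(0.0))
```

Unlike "relu" and "sigmoid", which act on each neuron separately, "softmax" acts on a whole layer at once, which is why its outputs sum to 1 and can be read as class probabilities in the 2-neuron output layer.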

Feedforward Neural Network without Dropout
In Table 4, the numbers in the leftmost column represent the number of neurons in the corresponding layers (i.e., "8→8" means a Dense layer with 8 neurons, followed by another Dense layer with 8 neurons). The "Epochs" parameter varies from 1 to 400 for each model, and the reported value is the number of epochs that returns the highest accuracy for that model. These highest accuracies and their corresponding f1-scores and "Epochs" are compared in the following table.
According to the accuracies, the "sigmoid" activation function outperforms "softmax" and "relu". In 4 out of 5 cases, "sigmoid" has the highest accuracy. The highest accuracy, 0.8227, also occurs with "sigmoid", when there are 32 neurons in both layers and the training samples are fed into the model 117 times.
However, the f1-scores of the "sigmoid" models are lower than those of "softmax" and "relu", so for a heavily imbalanced dataset, the other two activation functions are better choices.
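The selection of the reported "Epochs" value can be sketched as follows. The accuracy history is made up for illustration and the helper name `best_epoch` is hypothetical; in the experiments the history would contain one accuracy per epoch from 1 to 400.

```python
def best_epoch(accuracy_by_epoch):
    """Return (epoch, accuracy) for the highest accuracy; epochs are numbered from 1."""
    best_index = max(range(len(accuracy_by_epoch)), key=lambda i: accuracy_by_epoch[i])
    return best_index + 1, accuracy_by_epoch[best_index]

# Made-up accuracies after epochs 1..5, for illustration only.
history = [0.7788, 0.8012, 0.8105, 0.8093, 0.8101]

epoch, acc = best_epoch(history)
print(epoch, acc)  # 3 0.8105
```

If several epochs share the same highest accuracy, this sketch reports the earliest one.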

Feedforward Neural Network with Dropout: (Using Sigmoid)
In this experiment, a dropout layer is placed between the second-to-last layer and the output layer. Accuracies and f1-scores with dropout are compared with those without dropout (Tables 5-7).
Accuracy & f1-score table for Dense(8)→Dense(8)→Dense(2), first two layers using "sigmoid" activation
Accuracy & f1-score table for Dense(16)→Dense(16)→Dense(2), first two layers using "sigmoid" activation
Accuracy & f1-score table for Dense(32)→Dense(32)→Dense(2), first two layers using "sigmoid" activation
When each layer has only 8 neurons, dropout initially decreases accuracy and increases the f1-score, but when the dropout rate reaches 0.3, the f1-score decreases as well. Dense(16)→Dense(16)→Dense(2) shows better performance after applying dropout: a higher dropout rate increases both accuracy and f1-score. When each layer has 32 neurons and dropout is added, the model still achieves high accuracies (higher than 0.82) and high f1-scores (higher than 0.45), both of which are better than those of the other two models.
Therefore, dropout in Feedforward Neural Network is useful only when there is a larger number of neurons in each layer. The reason might be that, if the number of neurons is already small, such as 8 neurons per layer, dropping neurons and their connections could deprive the model of necessary information.
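The effect of dropping neurons in a small layer can be illustrated with a sketch of "inverted" dropout, in which the surviving activations are rescaled by 1 / (1 − rate) during training and nothing is dropped at prediction time. This paper does not state which dropout variant was used, so the variant, the function name and the example activations are assumptions.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        # At prediction time every neuron is kept and no rescaling is needed.
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(42)
hidden = [0.5] * 8  # hypothetical outputs of a Dense(8) layer

for rate in (0.1, 0.3):
    dropped = dropout(hidden, rate, rng)
    kept = sum(1 for a in dropped if a != 0.0)
    print(rate, kept, dropped)
```

With only 8 neurons, a rate of 0.3 removes on average between two and three of them in every training step, which is consistent with the explanation above that a small layer may lose information it needs.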

LSTM without Dropout
In the following models, all layers use "sigmoid" as activation function since it returns high accuracies in Feedforward Neural Network. A feedforward Dense layer with 2 neurons is also added after the output layer of LSTM. The "Adam" optimization algorithm is used during training. "Epochs" also ranges from 1 to 400, depending on when each model returns its highest accuracy (Table 8).
According to the results, the LSTM models have lower accuracies than Feedforward Neural Network. Using the "sigmoid" activation function, 3 Feedforward Neural Network models have accuracies higher than 0.82, but none of these five LSTM models reaches an accuracy higher than that. Also, there are no observable improvements in f1-scores when using LSTM models.
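For readers unfamiliar with the layer used in these models, a single LSTM step can be sketched with the standard gate equations. Everything below is an illustrative assumption: the parameter layout, the function names and the weights are hypothetical, and the `activation` argument defaults to tanh as in the standard formulation. Passing `sigmoid` instead is one possible reading of the "sigmoid" setting described above, but this paper does not specify how that setting maps onto the cell.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, U, b, x, h):
    """Compute W·x + U·h + b for one gate, returning one value per hidden unit."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]

def lstm_step(x, h_prev, c_prev, params, activation=math.tanh):
    """One LSTM step: gates decide what to forget, what to store and what to output."""
    f = [sigmoid(z) for z in affine(*params["forget"], x, h_prev)]
    i = [sigmoid(z) for z in affine(*params["input"], x, h_prev)]
    o = [sigmoid(z) for z in affine(*params["output"], x, h_prev)]
    g = [activation(z) for z in affine(*params["candidate"], x, h_prev)]
    # New cell state: keep part of the old state and add part of the candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # New hidden state: a gated view of the activated cell state.
    h = [o_k * activation(c_k) for o_k, c_k in zip(o, c)]
    return h, c

# Hypothetical parameters: 1 input feature, 1 hidden unit, all weights and biases zero.
zero_gate = ([[0.0]], [[0.0]], [0.0])  # (W, U, b)
params = {name: zero_gate for name in ("forget", "input", "output", "candidate")}

h, c = lstm_step([1.0], [0.0], [1.0], params)
print(h, c)
```

With all weights at zero, every gate equals 0.5 and the candidate equals 0, so the cell state is simply halved from 1.0 to 0.5, which makes the role of the forget gate easy to check by hand.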

Conclusions
The best accuracy among the traditional machine learning models is 0.8040, achieved by SVM. The highest accuracy of a neural network is 0.8246, obtained with Dense(32)→Dense(32)→Dense(2), dropout rate = 0.1 and "sigmoid" as activation function. For LSTM, LSTM(8)→LSTM(8)→Dense(2) with dropout rate = 0.3 and "sigmoid" as activation function achieves an accuracy of 0.8233, which is also better than the best traditional model. Looking at f1-scores, many of the neural networks' f1-scores are around 0.44, while Random Forest and SVM using the "rbf" kernel can reach 0.45. However, the difference in accuracy is more significant. Therefore, unlike in the research of Yeh [2] discussed in Section 2, neural networks outperform traditional models, except in situations where the research strongly focuses on positive predictions (a False Negative is dangerous and a high f1-score is required).
For Feedforward Neural Network, dropout sometimes improves performance. The accuracies and f1-scores for Dense(16)→Dense(16) and Dense(32)→Dense(32) are generally improved. Therefore, when using a feedforward neural network, dropout can be helpful when the number of neurons per layer is not small.
For LSTM, using dropout does not make a significant difference. The accuracies and f1-scores are all close to the results without dropout. Still, dropout can be applied if one tries to avoid False Negatives and focuses on the f1-score. Unlike in the Feedforward Neural Network models, using dropout on LSTM prevents a sudden decrease in f1-scores.
A noticeable point is that the LSTM models perform worse than Feedforward Neural Network. Generally, people would prefer LSTM and consider it a more advanced neural network architecture, but the experiments in this paper show that LSTM gets similar f1-scores and even lower accuracies compared to Feedforward Neural Network. Future work is needed to explain this unexpected result and to give a clear criterion for when to use LSTM.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.