Machine learning has made tremendous progress over the last decade. It is widely used to make predictions that inform high-value decisions. Many experts in economics use machine learning models as important aids, and many companies use neural networks for bankruptcy prediction as a guide to preventing potential failure. However, although neural networks can process a very large number of attributes, they frequently overfit as more data are taken in. By using K-Nearest Neighbor and Random Forest, we can obtain better results from different perspectives. This paper identifies the better algorithm for bankruptcy prediction by comparing the results of the two methods.
Since the discovery of the very first machine learning algorithms, the field has developed at an unprecedented pace. New models keep emerging from obscurity. Businesses today struggle to determine which models give better predictions. As a consequence, there has been growing interest in data mining models, which are capable of predicting results from huge amounts of data.
K Nearest Neighbor (KNN) is an algorithm that classifies data points based on a similarity measure (Saravanan, 2010). First introduced in the 1970s, KNN is commonly used in statistical estimation and pattern recognition (Jóźwik, 1983). The NN rule was first called the “minimum distance classifier” or “proximity algorithm” by Sebestyen, an expert in the field of neurocomputing. Essentially, it is based on a fundamental principle called Ockham’s razor, a heuristic that guides scientists in building theoretical models. Conceptually simple, KNN can solve complex tasks with a relatively small amount of data.
Random Forest (RF) is another classification method, based on decision trees. The first related algorithm, the “random subspace method”, was created by Tin Kam Ho (Ho, 1995); it reduces the correlation between estimators by training them on random samples. Leo Breiman, a statistics professor at the University of California, Berkeley, later extended this algorithm. He combined several methods and constructed a collective decision tree (a tree-like model that graphs possible consequences) (Breiman, 2001). Instead of growing one decision tree, random forest generates an ensemble of distinct trees and outputs the most popular class by voting. This makes random forest one of the most accurate algorithms. However, this accuracy comes with the risk of overfitting, which occurs when the model predicts well on the training dataset but performs unsatisfactorily on some noisy classification tasks.
Neural Networks (NN) are computing systems inspired by animal brains whose performance improves progressively as the model is trained. They are mainly used in fields such as image recognition and social networks (Nielsen, 2015). A neural network consists of a collection of artificial neurons, with connections that transmit signals among them. A typical model has several layers that perform different transformations; the signal travels from the input layer to the output layer, passing through many hidden layers. Neural networks can fit any function regardless of its linearity, but, as this paper will discuss, they are often used in cases where simpler solutions would perform better.
The economy is a major part of our society, and we rely heavily on the businesses that form it. However, major problems such as over-speculation, the belief that the value of a company will always rise, harm the entire system. Thus, a way of modeling a company’s performance and predicting its likelihood of bankruptcy is necessary, since it would allow people to notice problems and decide whether to invest in or leave the company.
Several models have already been established to predict bankruptcy, but many of them rely on large amounts of data and still perform poorly. In other cases, these models use complicated algorithms for simple calculations. For example, neural networks, which have become increasingly popular over the past decade, achieve high accuracy, especially with many hidden layers and the help of dropout (randomly ignoring some nodes in the hidden layers). However, on relatively simple tasks, overfitting occurs frequently precisely because they fit the training data so closely. Therefore, this paper tests K Nearest Neighbor along with Random Forest and Neural Networks to find an algorithm that avoids this problem.
In this algorithm, let x be the point to be labeled, and let y be the single closest point (k = 1) or the set of closest points (k = n) to x. When there is a vast amount of data, x and y will very likely share the same label. For example, if a biased coin (one for which one outcome is more likely than the other) is tossed one million times and lands heads nine hundred thousand times, the next toss is very likely to be heads. KNN applies a similar principle.
$$C(x) = Y_{(1)}$$
The classifier C assigns x the label of its closest neighbor Y; the number of neighbors consulted depends on the value of k (in this case k = 1). As the size of the data set grows, the error rate is guaranteed to be less than twice the Bayes error rate (the minimum error rate achievable given the data’s distribution).
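The classification rule described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the Euclidean distance, the function name `knn_classify`, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, x, k=1):
    """Classify x by majority vote among its k nearest training points."""
    # Euclidean distance is an assumption; the text only requires
    # some similarity measure.
    distances = [
        (math.dist(point, x), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # With k = 1 this reduces to copying the label of the closest neighbor.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two features per company, label 1 = bankrupt.
points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
labels = [0, 0, 1, 1, 1]

pred_1nn = knn_classify(points, labels, (0.15, 0.15), k=1)
pred_3nn = knn_classify(points, labels, (0.7, 0.7), k=3)
```

Sorting all distances keeps the sketch short; for larger data sets a spatial index or a partial sort would be the more usual choice.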
$$P^* \le P \le P^* \left( 2 - \frac{C}{C - 1} P^* \right)$$
The formula above gives the tight error bound. P* is the Bayes error rate, C is the number of classes, and P is the error rate of the nearest neighbor method. For example, when there is a large amount of data, Y is the nearest neighbor of X, and X needs to be classified, X is very likely to receive the same class as Y. The error rate of this classification is at least the Bayes error rate and at most twice that rate, as the formula shows.
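As a worked illustration of the bound, the sketch below evaluates both sides for a two-class problem (bankrupt or not). The Bayes error rate of 0.05 is a hypothetical value chosen only for the arithmetic, and the function name `nn_error_bounds` is likewise an assumption.

```python
def nn_error_bounds(bayes_error, num_classes):
    """Return (lower, upper) bounds on the asymptotic 1-NN error rate P."""
    if num_classes < 2:
        raise ValueError("the bound needs at least two classes")
    factor = num_classes / (num_classes - 1)  # C / (C - 1)
    upper = bayes_error * (2 - factor * bayes_error)
    return bayes_error, upper


# Hypothetical Bayes error rate P* = 0.05 with C = 2 classes.
lower, upper = nn_error_bounds(0.05, 2)
```

With these assumed values the upper bound is 0.05 × (2 − 2 × 0.05) = 0.095, which stays below twice the Bayes error rate (0.10).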
Random Forest can obtain very accurate and precise results by growing a large number of decision trees.
Training sets are selected at random from the feature vectors X and their labels Y.
Then the tree bagging method (bootstrap aggregating) repeatedly draws a random sample with replacement.
For example, when sampling B times, tree bagging produces B training samples from X and Y, called X_b and Y_b. It then fits a decision tree f_b to each pair X_b and Y_b. After training, the prediction for an unseen sample x′ is the mean of the individual trees' predictions:
$$\hat{f} = \frac{1}{B} \sum_{b=1}^{B} f_b(x')$$
For classification, the majority vote of the trees is the result. This method strengthens the performance of the model by reducing variance without increasing bias. On the other hand, when training with small and noisy datasets, tree bagging helps the model perform better by decreasing the correlation between the trees.
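The bagging procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's code: each "tree" is a depth-1 decision stump on a single feature, the data are hypothetical, and all function names are invented for the example. Because the task is classification, the sketch aggregates by majority vote rather than by the mean.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier (a depth-1 decision tree)."""
    best = None
    for threshold in sorted(set(xs)):
        for left_label, right_label in ((0, 1), (1, 0)):
            preds = [left_label if x <= threshold else right_label for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagged_stumps(xs, ys, num_trees, rng):
    """Train num_trees stumps, each on its own bootstrap sample (X_b, Y_b)."""
    n = len(xs)
    trees = []
    for _ in range(num_trees):
        # Bootstrap step: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_majority(trees, x):
    """Classify x by majority vote over the individual trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional data: small values are class 0, large are class 1.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
trees = bagged_stumps(xs, ys, num_trees=25, rng=random.Random(0))
low_pred = predict_majority(trees, 0.15)
high_pred = predict_majority(trees, 0.85)
```

An individual stump trained on an unlucky bootstrap sample can misplace its threshold, but the vote across 25 stumps smooths such errors out, which is the variance reduction the text refers to.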
Random Forest differs in the learning process for each individual tree. It uses a modified learning process called feature bagging. Unlike tree bagging, feature bagging gives each decision tree a random subset of the features. Doing so deliberately prevents a few strong predictor features from being selected by most of the trees, which would otherwise make the trees correlated.
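Feature bagging, as described above, amounts to drawing a random subset of feature indices for each tree. The sketch below assumes a hypothetical data set with six features and a subset size of two; both numbers and the function name are illustrative choices. Many implementations draw the subset at each split rather than once per tree; the per-tree version here follows the description in the text.

```python
import random


def sample_feature_subset(num_features, subset_size, rng):
    """Pick a random subset of feature indices, without replacement."""
    return sorted(rng.sample(range(num_features), subset_size))


rng = random.Random(42)
num_features = 6  # hypothetical number of attributes
subset_size = 2   # roughly sqrt(num_features), a common heuristic

# One feature subset per tree, for a hypothetical forest of five trees.
subsets = [
    sample_feature_subset(num_features, subset_size, rng) for _ in range(5)
]
```

Each tree would then be trained only on the columns listed in its subset, so a single dominant feature cannot appear in every tree.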
A basic neural network consists of three parts: an input layer, one or more hidden layers, and an output layer.
We can view the model as a function $f(x) : x \to y$:
$$f(x) = K\left( \sum_i W_i \, G_i(x) \right)$$
K in this equation is the activation function, which is usually a predefined function such as the hyperbolic tangent. The activation function determines how the output value responds as the values from the input layer change.
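The formula can be made concrete with a small forward pass. In this sketch the weights and inputs are hypothetical, the activation K is taken to be the hyperbolic tangent mentioned in the text, and no bias term is included because the formula does not show one.

```python
import math


def neuron_output(weights, inputs, activation=math.tanh):
    """Compute K(sum_i W_i * G_i(x)) for a single neuron."""
    weighted_sum = sum(w * g for w, g in zip(weights, inputs))
    return activation(weighted_sum)


def layer_output(weight_rows, inputs, activation=math.tanh):
    """Apply one neuron per row of weights to the same inputs."""
    return [neuron_output(row, inputs, activation) for row in weight_rows]


# Hypothetical network: two inputs, two hidden neurons, one output neuron.
hidden = layer_output([[0.5, -0.5], [1.0, 1.0]], [1.0, 1.0])
output = neuron_output([1.0, -1.0], hidden)
```

Here the outputs of the hidden layer play the role of the G_i(x) terms for the output neuron, which is how the formula composes across layers.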
Each layer has its own weights, which change as the model passes through the hidden layers until it reaches the output layer. The components of each layer are completely independent of each other.
In related work, researchers from the University of Toronto, including Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov, introduced the dropout method to neural networks. Normally, neural networks contain many hidden layers in order to express complicated relationships. However, when the training data are noisy, such models do not perform as well on test data as they do on the training data. Dropout is a method designed mainly to reduce the amount of computation and decrease the chance of overfitting. It has been shown that dropout can effectively help a large model achieve these goals by randomly ignoring units in its layers during training (Srivastava et al., 2014).
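The idea of randomly ignoring units can be sketched as a mask applied to a layer's activations. This example uses the "inverted dropout" variant, which rescales the surviving units during training; that variant, the function name, and the input values are assumptions for illustration and are not taken from the paper.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` (inverted dropout)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training:
        # At test time every unit is kept and nothing is rescaled.
        return list(activations)
    keep_prob = 1.0 - rate
    # Surviving units are scaled by 1 / keep_prob so that the expected
    # value of each activation stays the same.
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]


rng = random.Random(0)
layer = [1.0] * 1000                      # hypothetical activations
dropped = dropout(layer, rate=0.5, rng=rng)
num_zeroed = sum(1 for v in dropped if v == 0.0)
```

With a rate of 0.5, roughly half of the units are zeroed on each training pass, so the network cannot rely on any single unit.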
Bankruptcy prediction has become very popular during the last decade. Myoung-Jong Kim and Ingoo Han studied the same topic with different models. They applied three different approaches to the same database for bankruptcy prediction, focusing especially on how well quantitative data mining can reproduce experts’ precision. The first approach establishes quantitative models for data mining. This method specifically uses classifiers made up of a set of weights on the economic variables, such as discriminant analysis and neural networks. The second approach automatically derives bankruptcy prediction rules from a large population of cases. The last approach uses subjective models for data mining. This represents experts deciding subjectively, because they weight factors differently and take subjective matters into account. According to Kim and Han’s work, neural networks are capable of extracting experts’ decision rules from their quantitative bankruptcy decisions (Kim & Han, 2003). In this paper, we aim at a better bankruptcy prediction through a quantitative approach. Different models, such as KNN and RF, are compared with the neural network model used in past papers.
| Data Set | Data Size | Training set percentage | Test set percentage | Validation set percentage |
|---|---|---|---|---|
| Bankruptcy Rates | 250 | 80% | 10% | 10% |

| Technique | Data Size | Rules Extracted | Overall Accuracy |
|---|---|---|---|
| Genetic Algorithms | 232 | 11 | 0.940 |
| Inductive Learning | 232 | 16 | 0.897 |
| Neural Networks | 232 | 12 | 0.903 |
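The 80%/10%/10% split listed for the 250-record data set can be sketched as a shuffle followed by slicing. The shuffling, the fixed seed, and the function name are assumptions for the example; the paper does not state how the records were assigned to each set.

```python
import random


def split_indices(num_records, train_frac, test_frac, rng):
    """Shuffle record indices and split them into train/test/validation."""
    indices = list(range(num_records))
    rng.shuffle(indices)
    n_train = int(round(num_records * train_frac))
    n_test = int(round(num_records * test_frac))
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains
    return train, test, validation


# 250 records split 80% / 10% / 10%, as in the data set table.
train_idx, test_idx, val_idx = split_indices(250, 0.8, 0.1, random.Random(0))
```

This yields 200 training records, 25 test records, and 25 validation records, with no record appearing in more than one set.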
| K | Truncate | Accuracy |
|---|---|---|
| 3 | 50 | 0.995 |
| 3 | 150 | 0.98 |
| 5 | 50 | 0.985 |
| 5 | 150 | 0.97 |
| 7 | 50 | 0.98 |
| 7 | 150 | 0.94 |

| Truncate | Accuracy |
|---|---|
| 50 | 0.975 |
| 150 | 0.99 |

| Truncate | Accuracy | Loss |
|---|---|---|
| 50 | 0.984 | 0.1015 |
| 150 | 0.98 | 0.0344 |

| Dropout Rate | Truncate | Accuracy | Loss |
|---|---|---|---|
| 0.5 | 50 | 0.955 | 0.1332 |
| 0.5 | 150 | 0.98 | 0.0312 |
| 0.3 | 50 | 0.995 | 0.0606 |
| 0.3 | 150 | 0.97 | 0.0366 |
Overall, the three algorithms used (KNN, RF, and NN) achieve higher accuracy than the three approaches used by Myoung-Jong Kim and Ingoo Han. Kim and Han focused on comparing different approaches to reproducing an expert’s decision but neglected the problem of overfitting. As shown in Figures 5-7, KNN performs best when K is 3 with a truncate of 50, because the dataset is relatively small. Specifically, a smaller K value provides a more precise prediction. Similarly, RF achieves high accuracy.
However, more research is needed to improve these models. For instance, autoencoders can reduce the dimensionality of the dataset, which further reduces the amount of computation. They are especially useful for computation-heavy tasks, and they have the potential to be used widely in unsupervised learning and many other fields yet to be studied.
Zhang, W. H. (2017). Machine Learning Approaches to Predicting Company Bankruptcy. Journal of Financial Risk Management, 6, 364-374. https://doi.org/10.4236/jfrm.2017.64026