The Prediction Model of Financial Crisis Based on the Combination of Principal Component Analysis and Support Vector Machine

This paper studies financial crisis among listed companies in China's manufacturing industry. It selects 181 companies in financial crisis and 181 normal companies as research samples and uses their financial indexes from three years before the financial crisis occurs. First, principal component analysis is used to extract useful information from the training data. Second, a financial crisis prediction model is constructed with a Support Vector Machine; its accuracy is 78.73% on the training data and 79.79% on the test data. Third, the model is compared with other prediction models. The results show that this model uses the fewest input variables and achieves the highest prediction accuracy, so it can provide useful information to investors, creditors, financial regulators, and others.


Introduction
With the rapid development of economic globalization, the environment that enterprises face is becoming more complicated, and a financial crisis often pushes an enterprise into financial distress or even bankruptcy. Therefore, financial risk management has become more important, and financial crisis warning has become a research focus. Accurate prediction and analysis of financial crises are not only an objective requirement of market competition but also a necessary condition for the sustainable development of enterprises. A financial crisis does not occur abruptly; it evolves gradually, and therefore it can be predicted. Moreover, correct prediction of enterprises' financial crises is very important for government departments, because it helps them monitor the quality of enterprises and the risk of the securities market, and safeguard the interests of investors and creditors.
This paper has five parts. The second part reviews the literature on financial crisis warning; the third introduces the prediction model of financial crisis based on the combination of principal component analysis and support vector machine; the fourth describes the modeling and discusses the results; and the fifth gives the conclusion of this paper.

Related Works
As early as the 1930s, researchers studied prediction models of financial crisis. [1] did pioneering research on the univariate model. He selected 19 companies as samples, compared the financial indexes of healthy companies with those of companies in financial crisis, and found that the two ratios with the strongest discriminant ability were net profit to stockholders' equity and stockholders' equity to liabilities. [2] proposed a more mature univariate model. He randomly selected 79 successful and 79 unsuccessful companies to predict financial crisis with the univariate model, and found that the ratio of cash flow to total liabilities had the strongest predictive ability, followed by the asset-liability ratio. Because the univariate model uses a single ratio to predict financial crisis, it is simple and practical. However, the production and operating activities of enterprises are influenced by many factors, so different financial indexes may give contradictory predictions. The univariate model has therefore been replaced by multivariate models.
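A univariate model of this kind reduces to choosing one cut-off on a single ratio. The sketch below, a hypothetical helper rather than the procedure of [1] or [2], picks the cut-off that misclassifies the fewest companies in a sample. It assumes that a low value signals distress (true for cash flow to total liabilities); for a ratio such as the asset-liability ratio the comparison would be reversed.

```python
def best_cutoff(ratios, failed):
    """Return (cutoff, errors) for the single-ratio rule 'ratio <= cutoff -> crisis'
    that misclassifies the fewest companies in the sample."""
    best = None
    # -inf means "predict no company in crisis"; every observed value is also tried
    for cutoff in [float("-inf")] + sorted(set(ratios)):
        errors = sum((r <= cutoff) != f for r, f in zip(ratios, failed))
        if best is None or errors < best[1]:
            best = (cutoff, errors)
    return best


# Illustrative (made-up) cash-flow-to-total-liabilities ratios
print(best_cutoff([-0.20, 0.05, 0.12, 0.35, 0.42, 0.60],
                  [True, True, True, False, False, False]))  # (0.12, 0)
```

In practice the cut-off would be chosen on one sample and then checked on a separate holdout sample, since the in-sample error count is optimistic.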
[3] used a Multivariate Discriminant Model (MDM) to study warning of corporate financial crisis. He selected 33 bankrupt and 33 non-bankrupt companies to build the Z-Score model. The multivariate model is easy to apply, but it requires the independent variables to follow a multivariate normal distribution and the covariance matrices of the two sample groups to be equal. Actual sample data usually do not meet these requirements, so the application range of the multivariate model is limited.
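For reference, the Z-Score model is usually quoted with the coefficients of Altman's 1968 model for listed manufacturing firms, with all ratios expressed as decimals. The sketch below is illustrative only: the function names are hypothetical, and the zones (below 1.81 distress, above 2.99 safe) are the commonly cited cut-offs, not values estimated in this paper.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Classic Altman (1968) Z-score: a fixed linear discriminant of five ratios."""
    x1 = working_capital / total_assets        # liquidity
    x2 = retained_earnings / total_assets      # cumulative profitability
    x3 = ebit / total_assets                   # operating profitability
    x4 = market_value_equity / total_liabilities  # market leverage
    x5 = sales / total_assets                  # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def z_zone(z):
    # Commonly cited zones: below 1.81 distress, above 2.99 safe, grey in between
    if z < 1.81:
        return "distress"
    if z > 2.99:
        return "safe"
    return "grey"
```

For a firm with total assets of 100, working capital 10, retained earnings 20, EBIT 10, market value of equity 60, sales 100 and total liabilities 50, the score is 0.12 + 0.28 + 0.33 + 0.72 + 1.0 = 2.45, which falls in the grey zone.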
[4] built a financial crisis prediction model with Logistic Regression and found that company size, capital structure, performance, and current liquidity have significant predictive ability. The Logistic Regression Model (LRM) is based on a cumulative probability function (the logistic function); it does not require the independent variables to follow a multivariate normal distribution, and so it avoids the restrictive statistical assumptions of linear discriminant functions. However, LRM is sensitive to multicollinearity: when the explanatory variables are highly correlated, a small change in the sample can cause a large change in the estimated coefficients, which weakens the prediction performance of LRM.
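The LRM scores a company with the logistic function of a linear combination of its indexes, P(crisis | x) = 1 / (1 + exp(-(b0 + b·x))). The sketch below only evaluates that probability; the coefficients are hypothetical inputs, since estimating them by maximum likelihood is outside the scope of this illustration.

```python
import math


def crisis_probability(x, coef, intercept):
    """P(crisis | x) = 1 / (1 + exp(-(intercept + coef . x)))."""
    z = intercept + sum(b * v for b, v in zip(coef, x))
    # Branch on the sign of z so math.exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

A company is then classified as being in crisis when the probability exceeds a chosen threshold, typically 0.5.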
With the development of Artificial Intelligence (AI), algorithms capable of learning and reasoning have been applied to financial crisis prediction, with satisfactory results. [5] introduced the Neural Network Algorithm (NNA) into financial crisis prediction, used Altman's five financial ratios as inputs, and showed that the discrimination accuracy of NNA was higher than that of MDM. However, the neural network method is built on the empirical risk minimization principle, and its training may become trapped in a local minimum, so the global optimum may not be found. In addition, neural networks are prone to overfitting the training data.
The Support Vector Machine (SVM) was proposed after the neural network method. It is based on structural risk minimization and overcomes two disadvantages of neural networks: the slow convergence of gradient descent and the risk of being trapped in a local minimum. SVM has shown good performance in applications. [6] used SVM to predict financial crisis and found that it outperformed neural networks, multivariate discriminant analysis, and logistic regression.
Research on financial crisis started later in China, and foreign methods have been used to construct Chinese warning models. [7] first introduced analysis indexes and warning models of business failure in China and used statistical methods for quantitative analysis. [8] used 62 companies from the Compustat PC Plus accounting database for 1977 to 1990 to construct the F-score model. [9] selected 27 ST companies and 27 non-ST companies in 1998, used their financial report data from 1995 to 1997, and performed univariate analysis and multivariate linear discriminant analysis. [10] used Fisher linear discrimination, multivariate linear regression, and logistic regression to construct three financial crisis prediction models and examined their prediction accuracy before the financial crisis occurred. [11] selected cross-sectional financial indexes of 120 listed companies as modeling samples and 60 companies of the same year as test samples, and constructed a financial crisis warning model with a BP (back-propagation) neural network. [12] proposed a new non-linear combination prediction method based on the BP neural network; to remove the influence of the two-year span, he constructed the warning model with data from two years in advance, and his result was similar to that of Shue Yang.
These studies share three characteristics. First, when researchers use SVM to predict financial crisis, they focus on the choice of kernel function and financial indexes; few use Principal Component Analysis (PCA) to extract useful information from the financial indexes before feeding it into the SVM. Second, the study periods are dated and the samples are small and drawn from many industries, which may distort the interpretation of prediction accuracy and discriminant results. Third, Chinese researchers compare Special Treatment (ST) companies¹ with normal companies using the annual reports from two years before the companies were specially treated, which may exaggerate the prediction ability.
This paper combines principal component analysis (PCA) and Support Vector Machine (SVM) to predict financial crisis. PCA extracts several factors that play important roles in financial crisis, these factors are used to train the SVM model, and the trained model is tested on 94 samples. The final number of factors is five, the kernel function is polynomial, and the prediction accuracy of the combined model is 79.79% three years before a company becomes an ST company.

[13] proposed SVM theory for pattern recognition. The core of this theory is to find a hyperplane that separates the sample data so that samples of different classes lie on different sides of it. [14] proposed a method of reconstructing the kernel space using inequality constraints, which partly solved the linearly inseparable problem and opened the way to the application of SVM. Grace, Boser, Vapnik et al. (1990) did further research on SVM technologies and achieved some breakthroughs. [15] proposed statistical learning theory, which solved the linearly inseparable problem well and formally laid the foundation of SVM theory.
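For reference, the separating-hyperplane idea is usually written as the standard soft-margin SVM problem below. This is the textbook form, not necessarily the exact formulation solved by the software the authors used. Here x_i are the input vectors (in this paper, principal component scores), y_i ∈ {−1, +1} labels an ST or a normal company (the label coding is an assumption), φ is the feature map induced by the kernel K, ξ_i are slack variables, and C is the penalty coefficient tuned in the modeling section. The α_i are the dual variables, and samples with α_i > 0 are the support vectors.

```latex
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,n

f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\Bigr),
\qquad 0 \le \alpha_i \le C,\;\; K(x_i, x) = \phi(x_i)^{\top}\phi(x)
```

A larger C penalizes training errors more heavily and narrows the margin, which is why C is tuned together with the kernel.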

The Warning Model Based on PCA and SVM
Although several papers have used SVM to predict financial crisis, we have not found any that combines PCA and SVM. Because the input variables are linearly dependent to some degree, using the original variables directly may reduce classification effectiveness. We therefore use PCA to extract mutually uncorrelated components and input these components into the SVM. This flow is shown in Figure 1.
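The flow of Figure 1 can be sketched with scikit-learn as below. This is an illustrative assumption: the paper does not say which software was used, and `build_pca_svm` is a hypothetical helper. The sketch standardizes the indexes before PCA (equivalent to PCA on the correlation matrix), keeps five components, and fits an SVM with a polynomial kernel and C = 100, the settings the paper ultimately reports; the polynomial degree, gamma and coef0 stay at scikit-learn defaults because the paper does not report them.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pca_svm(n_components=5, kernel="poly", C=100.0):
    """Hypothetical helper: standardise -> PCA -> SVM, following the flow of Figure 1."""
    return make_pipeline(
        # Put all indexes on one scale so PCA works on their correlation structure
        StandardScaler(),
        # Replace the correlated indexes with uncorrelated component scores
        PCA(n_components=n_components),
        # Classify ST vs normal companies from the component scores
        SVC(kernel=kernel, C=C),
    )


# Usage (X_train: rows of 23 indexes, y_train: 1 = ST, 0 = normal):
# model = build_pca_svm().fit(X_train, y_train)
# test_accuracy = model.score(X_test, y_test)
```

Wrapping the steps in one pipeline means the scaler and PCA are fitted on the training data only and merely applied to the test data, so no information from the test period leaks into the components.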
The financial crisis prediction model combining PCA and SVM is built on this flow. The algorithm is as follows.
1) According to the index architecture of financial crisis warning, we process the training data, set the PCA parameters, and obtain the PCA model.

¹ ST companies are those with losses in two consecutive years or whose net asset value per share is below the par value of the stock.

Selection of Samples and Warning Index
Since China introduced the ST regulation, most researchers have used ST status as the criterion of financial crisis. We selected ST listed manufacturing companies on the Shenzhen and Shanghai stock markets from 2001 to 2011, excluding B-share companies, ST companies with other abnormal conditions, and companies whose data were incomplete or unreasonable. This yielded 181 ST companies in total. To match the ST companies, we selected 181 healthy companies from the same industry and fiscal year with similar asset size. Thus 362 listed companies were selected: 268 companies from 2001 to 2007 are used as training samples, and 94 companies from 2008 to 2011 are used as test samples.
Because the China Securities Regulatory Commission (CSRC) judges whether a listed company's financial condition is abnormal based on its operating performance in the annual reports of two consecutive years, using the annual report from two years before a company is specially treated may exaggerate the prediction ability of the model. Therefore, this paper uses the data of ST companies from three years before they were specially treated to judge whether these companies will experience financial crisis.
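The sample design can be sketched as follows. The `Company` record and the helper names are hypothetical, the `year` field is assumed to be the year of ST designation, and the greedy one-to-one matching (each ST firm takes the unused healthy firm of the same industry and year with the closest total assets) is an assumption, since the paper does not describe its matching algorithm. Under one-to-one matching within the same fiscal year, the 268 training companies correspond to 134 matched pairs and the 94 test companies to 47 pairs, 181 in all.

```python
from dataclasses import dataclass


@dataclass
class Company:
    code: str
    industry: str
    year: int            # assumed: year of ST designation (matched year for healthy firms)
    total_assets: float
    is_st: bool


def match_controls(st_firms, healthy_firms):
    """Greedy one-to-one matching on industry, year and closest total assets."""
    used = set()
    pairs = []
    for st in st_firms:
        candidates = [h for h in healthy_firms
                      if h.code not in used and h.industry == st.industry and h.year == st.year]
        if not candidates:
            continue  # no admissible control: the ST firm is left out
        best = min(candidates, key=lambda h: abs(h.total_assets - st.total_assets))
        used.add(best.code)
        pairs.append((st, best))
    return pairs


def data_year(st_year, lag=3):
    # Use the annual report three years before the ST designation
    return st_year - lag


def split_by_period(pairs, last_train_year=2007):
    """Chronological split: pairs up to last_train_year train, later pairs test."""
    train = [p for p in pairs if p[0].year <= last_train_year]
    test = [p for p in pairs if p[0].year > last_train_year]
    return train, test
```

Splitting by period rather than at random keeps every test company later in time than every training company, which is closer to how a warning model would be used.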
The selection of financial warning indexes has a significant impact on the model: well-chosen indexes not only classify better but also simplify the prediction model. Many researchers use financial indexes in empirical research on financial crisis warning. Although the indexes differ between researchers, they basically cover solvency, profitability, operation ability, and so on. Based on important literature and previous research on the index architecture of financial crisis warning, as well as the availability and suitability of data, this paper selects 23 financial indexes from five aspects: solvency, profitability, operation ability, growth ability, and cash acquisition ability. These indexes are listed in Table 1.

Principal Component Analysis
Whether the sample data are suitable for PCA can be examined with the KMO measure and Bartlett's test of sphericity. The KMO statistic (0.704) is greater than 0.5, the Bartlett's sphericity statistic is 4941.569, and its significance level (0.000) is below 0.05. The correlation matrix therefore differs significantly from an identity matrix, that is, the indexes are substantially correlated, and the sample data are suitable for PCA.
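Bartlett's statistic is χ² = −(n − 1 − (2p + 5)/6) · ln|R|, with p(p − 1)/2 degrees of freedom, where R is the p × p correlation matrix and n the number of observations. The pure-Python sketch below computes it; the helper names are hypothetical, KMO is not covered, and the paper does not state which n it used, so the sketch cannot reproduce 4941.569 without the original data. With p = 23 indexes the test has 253 degrees of freedom, and a p-value could be obtained with `scipy.stats.chi2.sf(statistic, dof)`.

```python
import math


def log_abs_det(matrix):
    """ln|det(matrix)| by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot_row] = a[pivot_row], a[col]
        pivot = a[col][col]
        if pivot == 0.0:
            raise ValueError("matrix is singular")
        total += math.log(abs(pivot))   # det = product of pivots (up to sign)
        for r in range(col + 1, n):
            factor = a[r][col] / pivot
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for H0: corr is the identity."""
    p = len(corr)
    statistic = -(n_obs - 1 - (2 * p + 5) / 6) * log_abs_det(corr)
    dof = p * (p - 1) // 2
    return statistic, dof
```

An identity correlation matrix gives ln|R| = 0 and a statistic of 0, so large values indicate that the indexes share enough common variance for PCA to be worthwhile.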
According to Table 2, which lists the eigenvalues and contribution rates of the principal components, the first seven eigenvalues are greater than 1, and their cumulative contribution rate reaches 72.466%. Following the criterion of retaining components whose eigenvalues are greater than 1, these seven principal components are used to replace the original financial indexes.
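The eigenvalue-greater-than-1 (Kaiser) rule and the contribution rates of Table 2 can be sketched as below; `kaiser_selection` is a hypothetical helper. For PCA on the correlation matrix of the 23 indexes the eigenvalues sum to 23, so each component's contribution rate is its eigenvalue divided by 23. The eigenvalues in the usage line are made up for illustration.

```python
def kaiser_selection(eigenvalues, threshold=1.0):
    """Return (number retained, contribution rates in %, cumulative rates in %)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    contribution = [100.0 * ev / total for ev in ordered]
    cumulative = []
    running = 0.0
    for rate in contribution:
        running += rate
        cumulative.append(running)
    # Strictly greater than the threshold, as in the criterion described above
    retained = sum(1 for ev in ordered if ev > threshold)
    return retained, contribution, cumulative


# Made-up eigenvalues: only 2.0 exceeds 1, so one component is retained
print(kaiser_selection([1.0, 0.5, 2.0, 0.5]))
```

The rule is a heuristic; the modeling section below also searches over the number of components directly.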

Constructing SVM Model
To build the SVM model, several parameters must be specified, including the type of kernel function and the penalty coefficient C. Under the default setting (RBF kernel, C = 10), the training and test results are shown in Table 3. There are four commonly used types of kernel function. To construct a better prediction model, we try different combinations of kernel function and parameters, compute the accuracies on the training and test data, and select the most accurate combination as the optimum. The results are displayed in Table 4.
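The paper does not list the four kernel functions; presumably they are the linear, polynomial, RBF and sigmoid kernels built into LIBSVM and scikit-learn. The sketch writes them out in pure Python. The parameter names gamma, coef0 and degree follow the LIBSVM convention, and the default values used here are illustrative assumptions, not the authors' settings.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    # K(x, z) = x . z
    return dot(x, z)


def polynomial_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    # K(x, z) = (gamma * x . z + coef0) ** degree
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    # K(x, z) = tanh(gamma * x . z + coef0)
    return math.tanh(gamma * dot(x, z) + coef0)
```

For example, `polynomial_kernel([1.0, 2.0], [3.0, 4.0], coef0=1.0, degree=2)` evaluates (11 + 1)² = 144.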
According to Table 4, the polynomial kernel with penalty coefficient C = 100 gives the highest accuracy on both the training data and the test data.
Because there are four kernel functions and up to 23 principal components to choose from, we try every combination of kernel function and number of components to find the most accurate one. The results are shown in Figure 2.
Figure 2 shows that the polynomial kernel achieves higher classification accuracy than the other three kernels in most cases, and that its accuracy is highest, at 79.79%, when five components are used.
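The search over kernels and component numbers can be sketched with scikit-learn as below; the helper name is hypothetical and C is fixed at 100, the value the paper reports. Following the procedure described above, each combination is scored on the test data. Scoring on a separate validation split, or by cross-validation on the training data, would keep the test set untouched for the final accuracy estimate.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def search_kernel_and_components(X_train, y_train, X_eval, y_eval,
                                 kernels=("linear", "poly", "rbf", "sigmoid"),
                                 max_components=23, C=100.0):
    """Fit one standardise -> PCA -> SVM model per (kernel, k) and score it."""
    results = {}
    for kernel in kernels:
        for k in range(1, max_components + 1):
            model = make_pipeline(StandardScaler(),
                                  PCA(n_components=k),
                                  SVC(kernel=kernel, C=C))
            model.fit(X_train, y_train)
            # Accuracy on the evaluation set (the test set, as in the paper)
            results[(kernel, k)] = model.score(X_eval, y_eval)
    best = max(results, key=results.get)
    return best, results
```

With 23 indexes and four kernels this fits 92 models, which is inexpensive for a sample of a few hundred companies.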
To interpret each factor explicitly, we rotate the principal components so that each factor has large loadings on a small set of variables. The resulting component matrix is shown in Table 5, which gives the loading of every factor on every input variable.
Inspection of Table 5 shows that f1 is composed of net return on assets, earnings per share, growth rate of net assets, net assets per share, total assets growth rate, return on equity, net profit margin on sales, main business profitability, asset-liability ratio, and net profit growth rate; f2 is composed of cash recovery for total assets, operating cash flow per share, cash current liability ratio, and sales cash ratio; f3 is composed of quick ratio, current ratio, and cash ratio; f4 is composed of total assets turnover, current asset turnover, inventory turnover ratio, and accounts receivable turnover; and f5 is composed of main business income growth rate and main business profit growth rate.
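The paper does not name the rotation method; varimax is the most common choice and is assumed here. The sketch implements raw varimax (without Kaiser normalization) by Kaiser's pairwise rotations, then assigns each variable to the factor with the largest absolute loading, which is how a grouping like the one above can be read off a rotated matrix. All function names are hypothetical.

```python
import math


def varimax_pair(x, y):
    """Rotate two loading columns by the angle that maximises the raw varimax criterion."""
    n = len(x)
    u = [a * a - b * b for a, b in zip(x, y)]
    v = [2 * a * b for a, b in zip(x, y)]
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = 2 * sum(ui * vi for ui, vi in zip(u, v))
    phi = 0.25 * math.atan2(D - 2 * A * B / n, C - (A * A - B * B) / n)
    c, s = math.cos(phi), math.sin(phi)
    new_x = [a * c + b * s for a, b in zip(x, y)]
    new_y = [-a * s + b * c for a, b in zip(x, y)]
    return new_x, new_y, phi


def varimax(loadings, sweeps=50, tol=1e-10):
    """loadings: rows = variables, columns = components. Returns rotated rows."""
    cols = [list(col) for col in zip(*loadings)]
    m = len(cols)
    for _ in range(sweeps):
        largest = 0.0
        for j in range(m - 1):
            for k in range(j + 1, m):
                cols[j], cols[k], phi = varimax_pair(cols[j], cols[k])
                largest = max(largest, abs(phi))
        if largest < tol:  # no pair needed a noticeable rotation: converged
            break
    return [list(row) for row in zip(*cols)]


def assign_variables(names, rotated):
    """Map each variable to the factor (1-based) with its largest absolute loading."""
    return {name: max(range(len(row)), key=lambda j: abs(row[j])) + 1
            for name, row in zip(names, rotated)}
```

Rotation redistributes variance among the retained components without changing their total, so it aids interpretation but does not alter what the SVM can learn from the five components together.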
Using the SVM model to compute the importance of each factor, we find that f5 ranks first, f3 second, f2 third, f1 fourth, and f4 fifth. This shows that the development trend is predominant, that the ability to grow profit is very important, and that the ability to raise cash quickly matters to some degree.
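The paper does not say how factor importance was computed from the SVM. One model-agnostic option, swapped in here as an assumption, is permutation importance: shuffle one factor's column and measure the mean drop in accuracy. The sketch takes any `predict` function (for example a fitted SVM's prediction for one row); scikit-learn also offers a ready-made `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each factor's column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only column j; the other factors keep their original values
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Ranking the factors by these mean drops gives an ordering comparable in spirit to the one reported above, though not necessarily identical to the authors' method.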

Model Comparison
We also train an SVM model without PCA on the training data and use it to classify the training and test samples. The results of the SVM models with and without PCA are compared in Table 6.
Table 6 shows that the SVM model without PCA overfits the training data and its accuracy on ST companies is very low. With PCA, the accuracies on the training data and the test data are very similar, so the results are well balanced overall.
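Overall accuracy can hide a low accuracy on ST companies, which is why the class-wise figures in Table 6 matter. The hypothetical helper below reports overall accuracy together with the accuracy within the ST class and within the normal class; the labels in the usage note are made up. As a check on the reported figures, 79.79% on 94 test companies corresponds to 75 correct classifications, and 78.73% on 268 training companies to 211.

```python
def class_accuracies(y_true, y_pred, positive="ST"):
    """Overall accuracy, accuracy on the ST class, and accuracy on the normal class."""
    pairs = list(zip(y_true, y_pred))
    overall = sum(t == p for t, p in pairs) / len(pairs)
    st = [(t, p) for t, p in pairs if t == positive]
    normal = [(t, p) for t, p in pairs if t != positive]
    st_acc = sum(t == p for t, p in st) / len(st) if st else float("nan")
    normal_acc = sum(t == p for t, p in normal) / len(normal) if normal else float("nan")
    return overall, st_acc, normal_acc
```

For example, a model that labels almost every company as normal can reach 67% overall on three ST and three normal companies while recognising only one ST company in three.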
We compare the new prediction model with the Artificial Neural Network model, the Bayesian model, and the Logistic Regression model; the comparison is shown in Table 7. Table 7 shows that PCA improves the accuracy of all models, and that the improvement is largest for SVM. Therefore, the combination of PCA and SVM gives better results.

Figure 1. Financial crisis prediction model combining PCA and SVM.

Figure 2. Classification accuracy of the SVM model with different component numbers and kernel functions.

Table 1. Original financial indexes of the financial crisis warning model.

Table 2. Eigenvalues and contribution rates of the principal components.

Table 3. Training result and test result.

Table 4. Classification results of the four types of kernel function.

Table 5. Principal component matrix.

Table 6. Accuracy comparison between SVM models with and without PCA.

Table 7. Prediction result comparison among different models.