The P2P Risk Assessment Model Based on the Improved AdaBoost-SVM Algorithm

An improved AdaBoost-SVM algorithm is used to classify peer-to-peer (P2P) online lending platforms as safe or risky. Because the SVM algorithm handles rare samples poorly and trains slowly, rule-based sampling is used to reduce classification noise. P2P risks are then identified by combining multiple learning machines. The results show that the IAdaBoost algorithm improves the classification accuracy for risky platforms, and the classification error can be kept within 5%.


Introduction
In recent years, the development of China's Internet finance business has forced the traditional financial industry to reform rapidly. As global integration intensifies, modern finance has taken on a complex form. The complexity of the financial system makes risk spread ever faster, and the scope of the impact between platforms keeps growing.
Peer-to-peer (P2P) lending is an important form of Internet finance, and its risk contagion and measurement are also of concern. Credit risk, the main problem faced by the P2P market, is largely associated with the fuzziness of risk factors. Measuring credit risk is an inherent risk management requirement of both the P2P and the banking markets, and it is also an important basis for effectively preventing financial risk. Chinese scholars studying online lending (P2P) have focused on platform operation modes and development trends, as well as on risk control and risk management in the P2P industry. The new perspective of "platform risk" has expanded research in the P2P domain (Ye, Li, & Xu, 2016). Wang mainly analyzes risk regulation and prevention for P2P online lending platforms, together with policy considerations (Wang, 2016). Liu analyzes the risk characteristics of China's P2P industry from the three perspectives of lenders, investors and platforms, and constructs an improved debtor risk assessment model (Liu, 2013). In studying P2P risk assessment, Luo Chunyu builds quantitative methods and constructs an investor composition analysis model, a borrower credit risk analysis model and a multi-information-source loan assessment model to support investor decision-making (Luo, 2012).
Foreign research on P2P online lending platforms mostly analyzes the behavior of borrowers in transactions and platform development trends, that is, the credit characteristics and loan success factors of transaction participants. Building on this research, we mainly analyze risk problems, the dislocation of the network and the lack of supervision. Unlike Britain and the United States, China does not yet have a complete and transparent credit system, and in those countries online lending (P2P) has already been brought within the scope of regulation. Compared with these foreign markets, China's online lending (P2P) still needs to improve.
Given this difference between China and other countries, P2P credit risk measurement and evaluation depend on data screening and model building. Commonly used machine learning models include the perceptron, k-nearest neighbors, decision trees, logistic regression, support vector machines (SVM), AdaBoost, hidden Markov models and conditional random fields. Applying machine learning to P2P risk assessment can effectively improve evaluation and classification models. Training a traditional support vector machine is, in essence, a convex quadratic programming problem. For P2P risk measurement and assessment, we collect P2P platform indicator data and divide online lending platforms by risk level, so as to screen out problem platforms.
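For reference, the convex quadratic program mentioned above can be written in the standard soft-margin dual form. This is the textbook formulation, not notation taken from the paper: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, dual variables $\lambda_i$ (named so as not to clash with the AdaBoost weights $\alpha_t$), kernel $K$ and penalty parameter $C$.

```latex
\max_{\lambda}\; \sum_{i=1}^{N} \lambda_i
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j K(x_i, x_j)
\quad \text{s.t.} \quad \sum_{i=1}^{N} \lambda_i y_i = 0, \qquad 0 \le \lambda_i \le C,\; i = 1,\dots,N

f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \lambda_i y_i K(x_i, x) + b \right)
```

Because the kernel matrix is positive semidefinite, the objective is concave and the problem is convex; it has one variable per training sample and an $N \times N$ kernel matrix, which is why training each base classifier on the full sample set is slow.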
A single SVM places high demands on the sample set. Combined (ensemble) learning instead generates multiple base classifiers by splitting the learning task and assembles them according to a certain strategy. The result of the combined classifier depends on the individual base classifiers; consequently, combining the characteristics of various base classifiers can effectively reduce classification error.
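One common combination strategy is a weighted vote, which is what AdaBoost uses. The minimal sketch below assumes labels coded as +1/-1 (the paper codes closed platforms as 1 and normal platforms as 0, so 0 would first be mapped to -1); the function name `combine` is hypothetical.

```python
def combine(alphas, base_predictions):
    """Weighted vote H(x) = sign(sum_t alpha_t * h_t(x)) for a single sample.

    alphas: importance of each base classifier.
    base_predictions: the label (+1 or -1) each base classifier assigns to the sample.
    Ties (score exactly 0) are resolved to +1 in this sketch.
    """
    score = sum(a * h for a, h in zip(alphas, base_predictions))
    return 1 if score >= 0 else -1


# One strong classifier can outvote two weaker ones that disagree with it.
print(combine([0.5, 0.2, 0.2], [1, -1, -1]))
```

The design choice here is that each base classifier's vote is scaled by its importance, so an accurate classifier can outweigh several weak ones.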
The boosting algorithm is a commonly used statistical learning method.

Statement of the Problem
Because P2P platforms carry considerable risk, we focus on how to build a model that measures P2P risk. The following sections therefore analyze the sources of P2P risk and the credit evaluation index system, and address P2P risk assessment so that investors can avoid poor-quality P2P platforms.

P2P Risk Assessment and Credit Index System
P2P online lending platforms face risk from many sources, including the risk of the platform itself and contagion risk between platforms. Under the influence of these many risk factors, the development and growth of P2P platforms will be seriously constrained.
There is currently no recognized standard or qualification for rating online lending platforms; each rating agency considers its own dimensions and standards, so the ratings cannot truly reflect a platform's level. Table 1 shows the evaluation index systems used by third-party rating agencies for online lending platforms. For example, the 360 Big Data Research Institute focuses on the background strength of a platform; Dagong International's rating reports focus more on examining debtor solvency; the evaluation system of the Institute of Finance at the Academy of Social Sciences focuses on a platform's level of risk control; and the comprehensive evaluation system of Network Loan Home sets general indicators that are not aimed specifically at security (Yu, 2015).
The evaluation of online lending (P2P) platforms can reasonably draw on the credit rating methods of commercial banks. Table 2 compares the rating systems for commercial banks. Commercial bank credit rating mainly relies on the CAMELS rating system of the US Federal Financial Institutions Examination Council, the risk rating system for joint-stock commercial banks issued by the China Banking Regulatory Commission, and the mature commercial bank rating systems of representative rating agencies such as Moody's. The rating systems adopted by regulators and by international and domestic authorities can be summarized into six factors, such as capital adequacy, asset quality and management level, as shown in Table 2. Summarizing the rating systems of commercial banks, especially those of the regulatory authorities, is of great significance for this research (Li, Liu, & Chen, 2015).
Table 1. Evaluation index system adopted by third-party rating agencies for online lending platforms.

SVM Algorithm Improvement
The SVM described below serves as the base classifier for P2P online lending platforms. The advantage of this method is that few classifiers are needed and the algorithm is simple (Ju, Wang, & Yao, 2012). However, the approach has some drawbacks: (1) each base classifier must be trained on all samples, so training is slow.
In this paper, AdaBoost is applied to SVM classification: the sample set of each classifier is drawn from the original data set, and the improved AdaBoost-SVM classifier is obtained through multiple iterations.
(1) Initialization: assign the same weight, 1/N, to each sample;
(2) Adjust the distribution;
(4) Calculate the prediction error rate;
(5) Calculate the importance of the base classifier;
(6) Calculate the new weight vector.
In addition, the IAdaBoost algorithm builds on the idea of AdaBoost: to prevent the base classifiers from ignoring the rare class, the initial weight of each sample is set according to the sample size of its class, which yields a classifier that is balanced across classes (Chew, Crisp, & Bogner, 2000; Wang & Le, 2005).
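The formulas for the steps above did not survive extraction. As a point of reference only, the standard discrete AdaBoost quantities are shown below, numbered to match the steps, with $y_i \in \{-1, +1\}$ and $h_t$ the base SVM at iteration $t$. The last line is one plausible reading of the IAdaBoost class-size initialization and is an assumption, not the paper's formula: $N_{c(i)}$ is the number of samples in the class of sample $i$ and $K$ is the number of classes, so each class receives the same total initial weight $1/K$.

```latex
\begin{aligned}
&\text{(1)}\quad w_{1,i} = \frac{1}{N}, \qquad i = 1,\dots,N \\
&\text{(4)}\quad \varepsilon_t = \sum_{i=1}^{N} w_{t,i}\,\mathbb{1}\!\left[h_t(x_i) \neq y_i\right] \\
&\text{(5)}\quad \alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t} \\
&\text{(6)}\quad w_{t+1,i} = \frac{w_{t,i}\exp\!\left(-\alpha_t y_i h_t(x_i)\right)}{Z_t},
  \qquad Z_t = \sum_{j=1}^{N} w_{t,j}\exp\!\left(-\alpha_t y_j h_t(x_j)\right) \\
&\text{IAdaBoost initialization (assumed)}\quad w_{1,i} = \frac{1}{K\,N_{c(i)}}
\end{aligned}
```

With the assumed initialization, a closed platform in a rare class of size $N_{c}$ starts with a larger weight than a normal platform in the majority class, which is the effect the text describes.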

Outcome of Practice
The empirical data come from the Network Loan Home platform (http://www.wdzj.com/), covering September 21, 2016 to February 21, 2017, a total of 6 months of P2P online lending platform data. The results of IAdaBoost-SVM, SVM and AdaBoost-SVM are compared. The kernel parameter σ of the base classifiers in AdaBoost is fixed at 6, the penalty parameter C is 100, the dimension is 507, and both the IAdaBoost and AdaBoost algorithms run for 5 iterations (Li, Liu, & Chen, 2015).
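The text does not name the kernel. Assuming σ is the width of a Gaussian (RBF) kernel, K(x, z) = exp(-||x - z||^2 / (2σ^2)), the sketch below shows how σ = 6 translates into the `gamma` parameterization, K(x, z) = exp(-gamma * ||x - z||^2), used by libraries such as scikit-learn's `SVC`. The function names are hypothetical.

```python
import math


def rbf_gamma_from_sigma(sigma):
    """Convert a Gaussian kernel width sigma into gamma = 1 / (2 * sigma^2).

    Assumes the kernel is parameterized as exp(-||x - z||^2 / (2 * sigma^2)).
    """
    return 1.0 / (2.0 * sigma ** 2)


def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-rbf_gamma_from_sigma(sigma) * sq_dist)


SIGMA = 6.0  # kernel parameter reported in the text
C = 100.0    # penalty parameter reported in the text
gamma = rbf_gamma_from_sigma(SIGMA)  # 1 / 72, roughly 0.0139
print(gamma)
```

If the paper instead uses the parameterization exp(-||x - z||^2 / σ^2), gamma would be 1/36; the conversion function is the only place that would change.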

Correct Classification Chart
The classification results in Figure 1 and Figure 2 show that, for small sample sizes, the improved IAdaBoost algorithm achieves higher classification accuracy than AdaBoost and SVM. When the sample set exceeds 1300 samples, the combined learners perform even better, and in some cases the test-set accuracy reaches about 90%. This supports accurate P2P risk measurement and risk forecasting.
The figures also show that the improvement from the IAdaBoost algorithm is most pronounced on rare-class data.

Predictive Effect
As Figure 3 and Figure 4 show, under normalized conditions the error rate stays within 5% during the learning process. The final base classifiers and their weights are shown in Table 4: Boost 1, Boost 2 and Boost 5 have higher weights, each above 20%, while the rest have lower weights.
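The paper does not say how the weight ratios in Table 4 are computed. A natural reading, assumed here, is that each base classifier's share is its importance α_t divided by the sum of all α_t (which presumes every α_t is positive, i.e. every base error rate is below 0.5). The α values in the example are made up for illustration and are not the Table 4 values; the function names are hypothetical.

```python
def weight_ratios(alphas):
    """Share of each base classifier in the final vote: alpha_t / sum of all alphas."""
    total = sum(alphas)
    return [a / total for a in alphas]


def heavy_classifiers(alphas, threshold=0.20):
    """Labels ("Boost 1", "Boost 2", ...) of base classifiers whose share exceeds threshold."""
    return [
        f"Boost {t}"
        for t, ratio in enumerate(weight_ratios(alphas), start=1)
        if ratio > threshold
    ]


# Illustrative alphas for five boosting rounds (not the paper's data).
print(heavy_classifiers([3.0, 1.0, 1.0, 0.5, 2.5]))
```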

Conclusion
The IAdaBoost algorithm proposed in this paper not only reduces the number of training samples, narrows the training range and handles unbalanced sample classes, but also removes some noisy data and selects reliable sample points for training. In addition, the initialization of the improved algorithm raises the weights of rare samples, which helps classify them correctly. Applied to risk assessment of P2P online lending platforms, it can effectively screen out problem platforms and so support risk management.
The AdaBoost-SVM model also has shortcomings. The sample set and training data should be more detailed, and the sampling method still has room for improvement. In addition, the initial classification weights of the algorithm could be preprocessed to speed up the model's risk calculation.
Boosting is widely used and effective. In classification problems, it improves classifier performance by changing the weights of the training samples, learning multiple classifiers and combining them linearly. Applied to SVM, it can strengthen the separation and division of the sample set. It changes the probability distribution of the training data and calls a weak learning algorithm on a series of training data distributions to learn a series of classifiers.
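To make "calling a weak learner on a weighted distribution" concrete, here is a minimal sketch of a base classifier that accepts AdaBoost sample weights. It deliberately swaps in a linear SVM trained by full-batch subgradient descent on the weighted hinge loss, rather than the kernel SVM with parameter σ used in the paper, so that it runs with the standard library alone. The function names, learning rate, regularization strength and toy data are illustrative assumptions.

```python
def train_weighted_linear_svm(X, y, weights, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM by full-batch subgradient descent on the weighted hinge loss.

    Minimizes lam/2 * ||w||^2 + sum_i weights[i] * max(0, 1 - y[i] * (w . X[i] + b)),
    where y[i] is -1 or +1 and weights is an AdaBoost distribution (sums to 1).
    Linear kernel only: an illustrative stand-in for the paper's kernel SVM.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, weights):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # hinge loss is active for this sample
                for j in range(n_features):
                    grad_w[j] -= ci * yi * xi[j]
                grad_b -= ci * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Return +1 or -1 for one feature vector x."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data with a uniform weight distribution.
X = [[-2, -1], [-1, -2], [-1.5, -1.5], [2, 1], [1, 2], [1.5, 1.5]]
y = [-1, -1, -1, 1, 1, 1]
model = train_weighted_linear_svm(X, y, [1 / 6] * 6)
print([predict(model, x) for x in X])
```

Samples with larger AdaBoost weights contribute proportionally more to the hinge-loss subgradient, which is how reweighting steers each new base classifier toward previously misclassified platforms.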
(2) Rare classes are handled poorly. In view of these problems, sampling-based training is selected. Training on a subset of the samples effectively avoids having each base classifier repeatedly learn the entire sample set. Its advantages are as follows: (1) each base classifier relearns only part of the training samples, so training speed can be effectively improved; (2) sampled training covers most of the sample data, which prevents the classifier from ignoring the rare class. Therefore, P2P platform classification also uses a similar sampled training method to avoid the class imbalance in the training set caused by special platform data.
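The paper does not give its exact sampling rule. The sketch below assumes per-class (stratified) random sampling: each base classifier sees a fixed fraction of every class, and a small class keeps at least `min_per_class` samples, so a rare class such as closed platforms is never dropped from a training subset. The function name and parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, min_per_class=1, seed=0):
    """Return sorted indices of a subsample drawing `fraction` of each class.

    Every class keeps at least `min_per_class` samples (or all of them if the
    class is smaller than that), so rare classes stay represented.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for idxs in by_class.values():
        k = max(min_per_class, round(fraction * len(idxs)))
        k = min(k, len(idxs))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


# 95 normal platforms (0) and 5 closed platforms (1).
labels = [0] * 95 + [1] * 5
subset = stratified_subsample(labels, fraction=0.2, min_per_class=3)
print(len(subset), sum(labels[i] for i in subset))
```

Drawing a different seed for each boosting round gives each base classifier a different subset while keeping the class balance under control.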
Here N is the number of training samples. In the AdaBoost algorithm, the accuracy of a base classifier is closely related to its error rate. The initial sample weights are equal, and in each subsequent iteration AdaBoost adjusts the weight of each sample, calculates the classifier's error rate on the training set and corrects the probability distribution of the training set.
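One iteration of this reweighting can be sketched as follows, assuming standard discrete AdaBoost with labels in {-1, +1}; the paper's exact update formulas are not recoverable from the text, and the function name `adaboost_step` is hypothetical. It computes the weighted error rate, the classifier's importance and the renormalized distribution.

```python
import math


def adaboost_step(weights, y_true, y_pred, eps=1e-10):
    """One discrete-AdaBoost reweighting step for labels in {-1, +1}.

    Returns (error, alpha, new_weights): the weighted error rate of the base
    classifier, its importance alpha = 0.5 * ln((1 - error) / error), and the
    updated weights divided by the normalization factor Z_t.
    """
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    error = min(max(error, eps), 1 - eps)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - error) / error)
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(raw)  # normalization factor Z_t
    return error, alpha, [r / z for r in raw]


# Four equally weighted samples; the base classifier misclassifies the last one.
error, alpha, new_weights = adaboost_step([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
print(error, alpha, new_weights)
```

After the update, the misclassified samples carry exactly half of the total weight, which forces the next base classifier to pay attention to them.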
For the three algorithms, we compare the correct classification of problem platforms (closed platforms are labeled 1 and normal platforms 0).
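The evaluation setup described in the text (the first five months for training, the last month for testing, closed platforms coded 1 and normal platforms 0) can be sketched as below. The record layout `(month, features, status)`, with months numbered 1 to 6, and the function names are hypothetical; the sample records are made up.

```python
def split_by_month(records, test_month):
    """Split (month, features, status) records into training and test sets.

    Months before `test_month` form the training set and `test_month` itself
    forms the test set. Status "closed" is coded 1 and "normal" 0.
    """
    code = {"closed": 1, "normal": 0}
    train, test = [], []
    for month, features, status in records:
        row = (features, code[status])
        if month < test_month:
            train.append(row)
        elif month == test_month:
            test.append(row)
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of platforms whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


records = [
    (1, [0.1], "normal"),
    (3, [0.9], "closed"),
    (5, [0.2], "normal"),
    (6, [0.8], "closed"),
    (6, [0.3], "normal"),
]
train, test = split_by_month(records, test_month=6)
print(len(train), test)
```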

Figure 3. Scatter plot and error rate of sample set A and test set A under the normalized AdaBoost algorithm. Source: Network Loan Home platform (http://www.wdzj.com/).

Figure 4. Scatter plot and error rate of sample set B and test set B under the normalized IAdaBoost algorithm. Source: Network Loan Home platform (http://www.wdzj.com/).

Table 2. Comparison of the rating systems of commercial banks.
Table 3 is the data classification table for the sample set. All data sets are divided equally. Six months of data alone is clearly not enough for training, but through sampling we re-established a sufficient sample set. The training set consists of the first five months of data and the test set of the last month. These sets are used to test SVM, AdaBoost-SVM and IAdaBoost-SVM.

Table 4. Base classifiers and their weight ratios.