
Constructing a financial early-warning model is important for listed companies. Using the financial data of 2819 listed enterprises as a sample, this paper applies the lasso method to screen model indicators, builds models with a variety of classical classification methods and machine learning methods, and analyzes their discriminating effect. The results show that the lasso method effectively reduces the multicollinearity between variables while reducing dimensionality, and that the machine learning methods generally classify better than the classical classification methods.

Financial risk warning is the process of predicting the likelihood that a business will fail financially and sending warning signals; it uses a variety of mathematical models to make decisions based on a company's financial statements. The market gives special treatment to listed companies with abnormal financial or other conditions; these are referred to as ST companies, and all other listed companies as non-ST companies. Efron et al. (2004) proposed least angle regression to solve the computational problem of the lasso, which promoted its adoption in the academic world. Hernandez et al. (2009) proposed using the lasso to select variables and estimate parameters. Li et al. (2015) used logistic regression to construct a corporate financial risk prediction model and to analyze the probability of corporate bankruptcy.

The selection of variables and indicators affects the final model. After reviewing the relevant literature, this paper uses the lasso regression algorithm to screen the variables, combines classical methods for cross-sectional data with machine learning methods to build financial early-warning models, and compares the prediction effect of the models using three indicators. The article is structured as follows: 1) it first introduces the basic theory of the methods and models used; 2) the lasso method is used to screen variables on real economic data, and different methods are then used to build and compare models; 3) finally, it concludes that the lasso method performs well in dimensionality reduction and that machine learning classification is generally superior to classical classification methods.

Assume that the independent-variable data matrix $X = \{x_{ij}\}$ is an $n \times p$ matrix. Ordinary least squares regression seeks the coefficients $\beta$ that minimize the residual sum of squares. As a method of variable selection, lasso regression adds a penalty term that constrains the size of the coefficients, which reduces structural risk and helps prevent overfitting.

Under the penalty constraint $\sum_{j=1}^{p} |\beta_j| \le s$, the lasso coefficient estimates are defined by:

$$(\hat{\alpha}^{(lasso)}, \hat{\beta}^{(lasso)}) = \arg\min_{(\alpha, \beta)} \sum_{i=1}^{n} \Big( y_i - \alpha - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s \tag{1}$$

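Because of the absolute value in the constraint, lasso regression shrinks some coefficients exactly to zero and thereby removes the corresponding variables. Mallows' $C_p$ is one of the criteria used to evaluate a lasso fit. If $p$ of the $k$ candidate independent variables ($k > p$) are selected to participate in the regression, the $C_p$ statistic is defined as

$$C_p = \frac{SSE_p}{S^2} - n + 2p; \qquad SSE_p = \sum_{i=1}^{n} (Y_i - \hat{Y}_{pi})^2 \tag{2}$$

where $\hat{Y}_{pi}$ is the fitted value for observation $i$ from the model with $p$ variables and $S^2$ is an estimate of the error variance, usually taken from the model with all $k$ variables.

Based on this, we choose the model with the smallest $C_p$.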
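The way the absolute-value penalty sets some coefficients exactly to zero can be illustrated with a minimal sketch. It does not reproduce the least angle regression algorithm; it assumes the equivalent penalized form, minimizing $\frac{1}{2n}\sum_i (y_i - \sum_j x_{ij}\beta_j)^2 + \lambda \sum_j |\beta_j|$, and solves it by cyclic coordinate descent. The function names, the $\frac{1}{2n}$ scaling, the value of `lam` and the toy data are all assumptions made for illustration; the columns of `X` and the response `y` are assumed to be centred, so no intercept is fitted.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * RSS + lam * sum(|beta_j|) by cyclic coordinate descent.

    X is a list of rows with centred, non-constant columns; y is centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of predictor j with the partial residual
            z = 0.0    # sum of squares of predictor j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Hypothetical toy data: the first predictor is strongly related to y,
# the second only weakly.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [3.0, 0.2, -3.0, -0.2]

beta = lasso_coordinate_descent(X, y, lam=0.25)
print(beta)  # the weak predictor's coefficient is shrunk exactly to zero
```

With these toy values the first coefficient is shrunk but stays non-zero, while the second falls below the threshold and is dropped, which is the variable-screening behaviour the text describes.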

This paper assumes that the dependent variable takes two values: 1 if the firm is an ST firm and 0 if it is a non-ST firm. A linear model such as $Y_i = \beta_0 + \beta_1 X_{i1}$ does not meet its assumptions in this case. However, $Y_i$ follows a Bernoulli distribution, so its mean has a special meaning in the model:

$$P(Y_i = 1) = \pi_i, \qquad P(Y_i = 0) = 1 - \pi_i \tag{3}$$

From this, the expectation of $Y_i$ can be derived:

$$E(Y_i) = 1 \times \pi_i + 0 \times (1 - \pi_i) = \pi_i \tag{4}$$

Here $\pi_i$ is a probability and must lie between 0 and 1, which a basic linear regression cannot guarantee, so logistic regression is used to fit the model. According to this principle, the following formula is obtained:

$$\pi_i = f(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip}) \tag{5}$$

where $f$ is the logistic function and $p$ is the number of independent variables. The distribution of $Y_i$ can be written compactly as:

$$P(Y_i = y_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} \tag{6}$$

The logarithm of the likelihood function is:

$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln \pi_i + (1 - y_i) \ln (1 - \pi_i) \right] \tag{7}$$

with

$$\pi_i = \frac{\exp(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip})}{1 + \exp(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip})}$$

Substituting this expression for $\pi_i$ into (7) gives:

$$\ln L = \sum_{i=1}^{n} \left\{ y_i (\beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}) - \ln \left[ 1 + \exp(\beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}) \right] \right\} \tag{8}$$
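The substitution that leads from equation (7) to equation (8) can be checked numerically. The sketch below evaluates the log-likelihood in both forms on a small data set and prints the two values, which should agree up to floating-point error. The function names, the toy data and the coefficient values are hypothetical and serve only as an illustration; they are not estimates from the paper's sample.

```python
import math


def linear_predictor(beta, x):
    """beta[0] is the intercept; beta[1:] pair up with the predictors in x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def log_likelihood_eq7(beta, X, y):
    """Log-likelihood written with the probabilities pi_i, as in equation (7)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = linear_predictor(beta, x_i)
        pi_i = math.exp(eta) / (1.0 + math.exp(eta))
        total += y_i * math.log(pi_i) + (1 - y_i) * math.log(1.0 - pi_i)
    return total


def log_likelihood_eq8(beta, X, y):
    """The same quantity after substituting pi_i, as in equation (8)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = linear_predictor(beta, x_i)
        total += y_i * eta - math.log(1.0 + math.exp(eta))
    return total


# Hypothetical toy data: two predictors, four firms (1 = ST, 0 = non-ST).
X = [[0.5, 1.2], [1.5, -0.3], [-0.7, 0.8], [2.0, 0.1]]
y = [0, 1, 0, 1]
beta = [-0.5, 1.0, -0.2]  # illustrative coefficients, not fitted values

ll7 = log_likelihood_eq7(beta, X, y)
ll8 = log_likelihood_eq8(beta, X, y)
print(ll7, ll8)
```

Maximum likelihood estimation then searches for the coefficients that make this value as large as possible; form (8) is the one usually optimized because it avoids computing $\ln \pi_i$ and $\ln(1 - \pi_i)$ separately.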

All the data in this article are from the CSMAR database. CSMAR is a research-oriented database in the field of economics and finance, built on the professional standards of CRSP, COMPUSTAT, TAQ, THOMSON and other authoritative databases, and is the largest financial and economic database in China, with the most accurate and comprehensive information. The initial data comprise the financial data, as of September 30, 2019, of all 194 listed enterprises labeled ST (hereinafter ST enterprises) and 3570 enterprises not labeled ST (hereinafter non-ST enterprises). After the missing and abnormal values were processed, the final sample consisted of 33 ST enterprises and 2786 non-ST enterprises.

Building on previous research, variables are selected from five aspects: solvency, profitability, operating ability, development ability and cash flow:

· Solvency: reflects the liquidity and debt level of the company's funds, which helps in evaluating the company's financial status and financial risks;

· Profitability: making a profit is the main goal of enterprise management and reflects the comprehensive ability of the enterprise; evaluating profitability reveals, to a certain extent, the financial operation of the enterprise;

· Operating ability: reflects how the enterprise uses and manages its assets, and to a certain extent indicates its ability to maintain and increase value;

· Development ability: reflects the outlook for the enterprise's management of funds and is an important index for predicting its development potential;

· Cash flow: dynamically reflects the flow of cash and cash equivalents over a given period.

Based on the above considerations, this paper selects 16 indexes covering solvency, profitability, operating ability, development ability and cash flow, and lists them in the indicator table below.

Most papers on enterprise financial warning models use balanced samples, that is, the experimental group and the control group contain the same number of firms, so most of them measure model quality by the prediction error, that is, the ratio of misclassified samples to the total number of samples. However, when the class sizes differ greatly, this evaluation method is not applicable. Following the relevant literature, this paper introduces three indexes suited to evaluating two-class problems: precision, recall and the F1 value.
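Why the prediction error breaks down on unbalanced data can be seen from the class sizes of the final sample alone. The sketch below considers a trivial rule, assumed here purely for illustration, that labels every firm as non-ST; the variable names are hypothetical.

```python
# Class sizes of the final sample reported in the text.
n_st = 33
n_non_st = 2786

# A trivial rule that labels every firm as non-ST:
# every non-ST firm is classified correctly and every ST firm is missed.
missed_st = n_st

error_rate = missed_st / (n_st + n_non_st)
share_of_st_detected = (n_st - missed_st) / n_st

print(f"error rate: {error_rate:.4f}")
print(f"share of ST firms detected: {share_of_st_detected:.4f}")
```

The rule has an error rate of only about 1.2 percent even though it never identifies a single ST firm, which is why measures focused on the minority class are needed.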

Category | Symbol | Indicator | Definition |
---|---|---|---|
Dependent variable | x_{1} | Whether the firm is an ST enterprise | |
Solvency | x_{2} | Current ratio | Current assets / current liabilities |
Solvency | x_{3} | Quick ratio | (Current assets − inventories) / current liabilities |
Solvency | x_{4} | Property rights ratio | Total liabilities / total owners' equity |
Profitability | x_{5} | Net profit margin on total assets | Net profit / total asset balance |
Profitability | x_{6} | Return on net assets | Net profit / shareholders' equity balance |
Profitability | x_{7} | Net profit to total profit ratio | Net profit / total profit |
Profitability | x_{8} | Sales cost rate | Sales expenses / operating income |
Operating ability | x_{9} | Accounts receivable turnover | Operating income / closing balance of accounts receivable |
Operating ability | x_{10} | Inventory turnover | Operating costs / closing balance of inventory |
Operating ability | x_{11} | Total assets turnover | Operating income / closing balance of total assets |
Development ability | x_{12} | Growth rate of total assets | (Total assets at end of current period − total assets at beginning of current period) / total assets at beginning of current period |
Development ability | x_{13} | Net profit growth rate | (Net profit for current quarter − net profit for last quarter) / net profit for last quarter |
Development ability | x_{14} | Net asset growth per share | (Net assets per share at end of current period − net assets per share at beginning of current period) / net assets per share at beginning of current period |
Cash flow | x_{15} | Cash content of operating income | Cash received from sale of goods and provision of services / operating income |
Cash flow | x_{16} | Net cash content of operating income | Net cash flow from operating activities / gross operating income |
Cash flow | x_{17} | Total cash recovery | Net cash flow from operating activities / closing balance of total assets |

A prediction by the model has four possible outcomes:

TP: an ST enterprise is predicted to be an ST enterprise

FP: a non-ST enterprise is predicted to be an ST enterprise

FN: an ST enterprise is predicted to be a non-ST enterprise

TN: a non-ST enterprise is predicted to be a non-ST enterprise

Accordingly, the precision rate $P$ is defined as:

$$P = \frac{TP}{TP + FP}$$

The recall rate $R$ is defined as:

$$R = \frac{TP}{TP + FN}$$

$F_1$ is the harmonic mean of precision and recall, defined by:

$$\frac{2}{F_1} = \frac{1}{P} + \frac{1}{R}$$

which gives

$$F_1 = \frac{2TP}{2TP + FP + FN}$$
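The three definitions translate directly into a small helper. The sketch below is illustrative only: the function name, the convention of returning 0.0 when a denominator is zero, and the counts in the example are assumptions rather than values taken from the paper. It also checks that the harmonic-mean form and the count form of $F_1$ agree.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.

    A ratio with a zero denominator is reported as 0.0 (a convention assumed here).
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical counts: 3 ST firms caught, none falsely flagged, 30 ST firms missed.
p, r, f1 = precision_recall_f1(tp=3, fp=0, fn=30)

# The harmonic-mean form 2/F1 = 1/P + 1/R gives the same value.
f1_harmonic = 2 / (1 / p + 1 / r)

print(round(p, 3), round(r, 3), round(f1, 3))  # 1.0 0.091 0.167
```

The example shows how a classifier can reach a precision of 100 percent while its recall, and therefore its $F_1$, stays low: it flags very few firms, but every firm it flags is an ST firm.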

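This paper uses the lars package in R to run the lasso regression that screens the indicators of the financial warning model. The $C_p$ value in the table below is smallest (9.56) at step 11. The variables retained after screening are x_{3}, x_{4}, x_{5}, x_{8}, x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, x_{16}.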

Step | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|
RSS | 30.98 | 30.94 | 30.93 | 30.80 | 30.78 | 30.78 | 30.77 | 30.77 |
Cp value | 19.96 | 18.61 | 20.12 | 9.56 | 10.23 | 11.67 | 13.01 | 15.03 |
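The minimum-$C_p$ rule can be applied to the tabulated values in a few lines. The sketch below copies the step numbers and Cp values from the table above; the variable names are hypothetical.

```python
# Values copied from the lasso path table (steps 8 to 15).
steps = [8, 9, 10, 11, 12, 13, 14, 15]
cp_values = [19.96, 18.61, 20.12, 9.56, 10.23, 11.67, 13.01, 15.03]

# Pick the step whose Cp value is smallest.
best_cp, best_step = min(zip(cp_values, steps))

print(best_step, best_cp)  # 11 9.56
```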

Variable | x_{3} | x_{4} | x_{5} | x_{8} | x_{11} | x_{12} | x_{13} | x_{14} | x_{15} | x_{16} |
---|---|---|---|---|---|---|---|---|---|---|
Coefficient | 0.001 | 0.011 | −0.026 | −0.002 | 0.009 | −0.021 | 0.007 | 0.003 | −0.003 | 0.005 |

The condition number of 196.02 before the variables are screened by lasso regression indicates strong multicollinearity between the variables; the condition number k < 10 (3.88) after screening indicates that the degree of multicollinearity between the retained variables is small.

Statistic | Before screening | After screening |
---|---|---|
Condition number | 196.02 | 3.88 |
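The text does not state how the condition number is computed. Under one common definition, assumed here, it is the ratio of the largest to the smallest eigenvalue of the correlation matrix of the predictors. For two standardized predictors with correlation $r$, that matrix is $\begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}$ with eigenvalues $1 + |r|$ and $1 - |r|$, which makes the link between collinearity and a large condition number easy to see. The function name and the values of $r$ below are hypothetical.

```python
def condition_number_two_predictors(r):
    """Condition number (largest / smallest eigenvalue) of [[1, r], [r, 1]]."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    largest = 1.0 + abs(r)
    smallest = 1.0 - abs(r)
    return largest / smallest


# The condition number grows quickly as the two predictors become collinear.
for r in (0.5, 0.9, 0.99):
    print(r, condition_number_two_predictors(r))
```

With more than two predictors the eigenvalues have no such closed form and are normally obtained from a numerical linear algebra routine.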

Category | Method | Precision | Recall rate | F1 value | Average F1 |
---|---|---|---|---|---|
Classical Linear Discriminant | Logistic Regression | 40% | 24.2% | 30.1% | 22.9% |
Classical Linear Discriminant | Linear Discriminant Analysis | 20% | 18.2% | 19% | |
Classical Linear Discriminant | Mixed Linear Discrimination | 22.2% | 24.2% | 23.2% | |
Classical Linear Discriminant | Flexible Linear Discrimination | 20% | 18.2% | 19% | |
Machine Learning | SVM | 100% | 9.1% | 16.7% | 57% |
Machine Learning | Bagging Classification | 100% | 6.1% | 11.4% | |
Machine Learning | Random Forest | 100% | 100% | 100% | |
Machine Learning | Adaboost Classification | 100% | 100% | 100% | |

Judged by F1 value, the best machine learning classifiers are random forest and Adaboost, both at 100%, and the least effective are SVM and bagging, at 0.167 and 0.114. In summary, machine learning methods generally classify better than classical methods, and their precision is generally higher. However, the recall and F1 values of SVM and bagging are lower than those of the classical methods. Among the classical classification methods, logistic regression has the highest F1 value; the F1 values of the four classical methods are lower overall than those of the machine learning methods, and the differences among the four classical methods are small.

Using the financial data of 2819 listed companies as of September 2019, this paper introduces the lasso method to screen the indicators and builds models with various classical classification methods and machine learning methods. The prediction effect of each method is then compared using precision, recall and the F1 value, and the following two conclusions are drawn:

1) The collinearity between variables decreases markedly after screening by the lasso method, which indicates that the lasso method can effectively reduce the multicollinearity between variables.

2) Given that the collected data are unbalanced (non-ST enterprises account for most of the data), machine learning methods generally classify better than classical classification methods. However, the SVM and bagging models do not perform as well as the classical classification methods.

This paper combines the lasso method with a variety of classical classification and machine learning methods to achieve a better prediction effect with a more streamlined model. The approach can be applied to classification problems and extended to regression problems, and it offers readers a reference when choosing a classification method.

This paper is financially supported by the National Natural Science Foundation of China (NSFC) under Grant number 71963008.

The authors declare no conflicts of interest regarding the publication of this paper.

Nie, X., & Deng, G. G. (2020). Enterprise Financial Early Warning Based on Lasso Regression Screening Variables. Journal of Financial Risk Management, 9, 454-461. https://doi.org/10.4236/jfrm.2020.94024