A Statistical Analysis to Predict Financial Distress

The aim of this study is to apply statistical inference to identify whether a firm is likely to become financially distressed in the short term. To do this, we collected data from the firms' financial statements. The analyses were based on a group of 45 financial ratios observed in a sample of 86 firms operating in Argentina. First, we used principal component analysis to turn the information in the 45 original ratios into two new global variables, named ∆Risk and ∆Return. In this way, we can easily represent and compare the firms' variations in risk and return in a single graph. Computing these new variables makes it possible to quickly categorize a firm financially, based on the risk inherent in the nature of its business and on the risk of the debt it has taken on relative to the profits it generated during the last two fiscal years. Second, we performed a logistic regression analysis to estimate the probability that a firm becomes financially distressed in the short term. The model finally selected correctly classified 85% of the companies in the sample and explains 65% of the total sample variability. The model includes the following variables: 1) Current Debt Ratio, 2) Total Cost of Debt, 3) Operating Profit Margin, and 4) ∆ROE. The outcomes of this study are two tools, developed through statistical inference, with which we can quickly assess the financial status of a firm based on the variation of its risk and return, and estimate the probability that it becomes financially distressed in the short term.
These tools can be put into practice in several ways, for example: 1) to monitor and follow up the financial performance of a company, 2) to support the decision to lend money to a company, 3) to support the decision to invest in or merge with a company, 4) to support market analysis from a financial perspective, and 5) to support actions or decisions related to the financial assessment of a company that has declared itself financially distressed.


Introduction
The objective of this study is to identify companies that have financial problems based on the information contained in their financial statements. In this regard, a company is considered to have financial problems when it has a high probability of becoming financially distressed in the short term. To do this, we applied statistical inference to a group of 45 financial ratios observed in a sample of 86 firms operating in Argentina.
Previous similar studies, for example those of Guzmán [1], Heine [2], De la Torre Martínez [3], and Kahl [4], set out either to find the financial ratio that best identifies a company with financial problems or to find the statistical model, based on discriminant analysis, that best predicts whether a company is financially distressed. Although these approaches may be efficient at identifying which aspects of a company we should focus on when assessing its financial situation, their statistical outcomes typically cannot provide a good overview of the firms' overall performance, because they are based on just a few variables. This means that current statistical models can recognize when a company is financially unhealthy, but it is difficult to identify the circumstances under which a firm reached that status, or to compare how critical its financial situation is relative to other business units or companies in the same industry. Moreover, most statistical studies in the current literature do not take into account the variation of the firms' financial ratios over the last fiscal periods. Instead, they provide a financial diagnosis based on the most recent snapshot of a firm's situation, which might lead to wrong decisions.
In an attempt to address the issues discussed above, we combined two statistical analyses with the aim of developing a set of tools that provide a comprehensive and accurate financial diagnosis of a firm, which can support decisions in different business scenarios such as investment analysis, credit granting, and financial management, among others. First, we used principal component analysis to turn all the data initially collected into two new variables. With this analysis we obtain a financial overview of a firm and can represent and compare its financial situation based on the risk inherent in the nature of its business and on the risk of the debt it has taken on relative to the profits it generated during the last two fiscal years. Second, we used logistic regression analysis to determine precisely when a firm has financial problems and to identify the ratios that have the greatest influence on its financial condition.
The rest of this paper is organized as follows. In Section 2, we present the sample design, defining its size and composition as well as the criteria used to collect the data from the firms' financial statements. In Section 3, we define the group of 45 financial ratios computed for each company in the sample. In Section 4, principal component analysis is performed to turn the information contained in the 45 original ratios into 2 new variables named ∆Risk and ∆Return. In Section 5, we develop different logistic regression models to estimate the probability that a firm becomes financially distressed in the short term. In Section 6, the tools developed from the principal component and logistic regression analyses are applied to a new sample, with the objective of evaluating their joint effectiveness in recognizing companies with financial problems. Finally, the conclusions of the present study, together with its possible uses, are described in Section 7.

Sample Design
A very important aspect of this kind of statistical research is the design of the sample from which the statistical models will be developed. For example, if we consider a sample of companies from the construction sector, the resulting statistical model can only be applied to companies in that sector. Also, if the sample is composed of 90% companies that did not have any financial problems and only 10% companies that were financially distressed, the ability of any resulting statistical model to discriminate companies with financial problems will be weak. For these reasons, below we describe all the criteria considered in designing the sample, which determine the scope of the analysis.
The sample is composed of 86 firms operating in Argentina, of which 43 did not have any financial problems (group 1) and the other 43 were financially distressed during the period under analysis (group 2). See Appendix 1 for a complete description of the sample.
All the information considered in the present study was obtained from the financial statements of each company. For the companies without financial problems, the financial statements were obtained from the Bolsa de Comercio de Buenos Aires (BCBA). For the companies with financial problems, the financial statements were obtained from the official reports of the corresponding receivers, published by the Cámara Nacional de Apelaciones en lo Comercial.
Several statistics textbooks consider it sufficient to collect at least 5 observations for each variable included in a statistical model. William Beaver [5] and Edward Altman [6] carried out similar statistical analyses with samples of 120 and 60 companies, respectively. In both cases significant results were obtained, and neither considered models with more than 5 variables. Therefore, based on these results, and given that no model in the present study includes more than 5 variables, a sample of 86 firms is large enough to carry out the statistical analyses presented here.
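The sample-size argument above is simple arithmetic. The sketch below checks the 5-observations-per-variable rule of thumb against the 86 firms of this study; the helper name `minimum_sample_size` is hypothetical and the rule itself is a heuristic, not a statistical guarantee.

```python
def minimum_sample_size(n_variables: int, obs_per_variable: int = 5) -> int:
    """Rule-of-thumb minimum sample size: a fixed number of observations per variable."""
    return n_variables * obs_per_variable


MAX_MODEL_VARIABLES = 5  # no model in this study uses more than 5 variables
SAMPLE_SIZE = 43 + 43    # group 1 (healthy) + group 2 (distressed)

required = minimum_sample_size(MAX_MODEL_VARIABLES)
print(f"required >= {required}, available = {SAMPLE_SIZE}, ok = {SAMPLE_SIZE >= required}")
# prints: required >= 25, available = 86, ok = True
```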
Regarding the proportion of companies with and without financial problems, it is not strictly necessary to have the same number of observations in each group. However, it is recommended in order to obtain a better representation of the mean and the standard deviation of the variables observed in each group. To understand this issue, consider the extreme case of a sample with 1 company that did not have any financial problems and 99 companies that were financially distressed. When such a sample is used to estimate the probability that a firm becomes financially distressed, it is reasonable to expect the resulting model to show a clear tendency to classify any company as likely to have financial problems in the near future. This is because the sample, not being representative of the population, does not "reveal" the different ways in which a company without financial problems can present itself. In other words, the sample contains very little information about the behavior of the variables in companies without financial problems, and therefore it is more difficult for the model to recognize companies from this group.
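One part of this argument can be made concrete. In a logistic model the intercept absorbs the group proportions of the sample: a model that uses no firm-level information at all assigns every firm the sample share of distressed companies. The hypothetical helper below fits such an intercept-only model by Newton-Raphson; it only illustrates this baseline effect and is not a model estimated in this study.

```python
import math


def intercept_only_probability(n_distressed: int, n_healthy: int, iterations: int = 50) -> float:
    """Fit the intercept-only logistic model logit(p) = b0 by Newton-Raphson.

    With no firm-level information, the fitted probability of distress is the
    same for every firm; it converges to the sample share of distressed firms.
    """
    if n_distressed <= 0 or n_healthy <= 0:
        raise ValueError("both groups must contain at least one firm")
    n = n_distressed + n_healthy
    b0 = 0.0
    for _ in range(iterations):
        p = 1.0 / (1.0 + math.exp(-b0))
        gradient = n_distressed - n * p   # first derivative of the log-likelihood
        hessian = -n * p * (1.0 - p)      # second derivative (always negative)
        b0 -= gradient / hessian
    return 1.0 / (1.0 + math.exp(-b0))


print(intercept_only_probability(99, 1))   # extreme sample from the text, close to 0.99
print(intercept_only_probability(43, 43))  # balanced sample used in this study: 0.5
```

In the 99-to-1 sample every firm, healthy or not, starts from a 99% probability of distress before any ratio is considered; the balanced design of this study starts every firm from 50%.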
Another important aspect to consider is the period from which the financial statement information is collected, especially for the companies that had financial problems. The sample considered in the present study includes information from companies that operated during 2003, 2004, and 2005. If this period is too long, for example more than 10 years, we would run the risk of mixing financial information from companies that operated in different macroeconomic contexts. In that case, the financial information would have to be interpreted individually, even for companies in the same sector. In countries with a stable economy, this effect would not introduce a large distortion in the data collected; however, this is not the case in Argentina. In addition, it was decided not to include financial information from companies that had financial problems during 2001 and 2002, because during that period an economic crisis affected the normal operations of companies. In this way, we avoid including in the analysis atypical variations that are not the object of study and could distort the results. Only for a few companies did we consider financial information from 2002, in order to compute the variation of some financial ratios over two consecutive periods. In any case, the effect of including this information is not significant, because in 2002 the number of companies with financial problems was considerably lower than in 2001, when the economic crisis originated (see Figure 1).
For the companies that had financial problems, the information required for the statistical analysis was obtained from the financial statements for the period in which each company was financially distressed and for the previous period. In this way, we can include in the analysis the evolution of some financial ratios from one period to the next. For the companies without financial problems, the information was obtained from the financial statements of two consecutive periods, always within the period under analysis. Similar studies included financial information from up to five periods before the companies became financially distressed. However, these studies analyzed the information from each period separately, instead of including in one sample variables that reflect the evolution of the ratios over two or more periods. Their methodology consisted of using the financial information from earlier periods as separate samples to test the discriminating power of a statistical model developed from the financial ratios of the most recent period in which each company was financially distressed. As expected, the results showed that the further back in time the financial information was from the period in which a company became financially distressed, the lower the ability of the model to distinguish between companies with and without financial problems. Therefore, it can be concluded that it is not useful to include financial information from many periods before the companies become financially distressed. At that point companies might still show good financial performance, and taking this information into account would reduce the ability of the model to distinguish the companies with financial problems. It therefore seems more reasonable to focus on the information from the periods in which the characteristics of the financial problems become evident in a company, i.e. a few years before it becomes financially distressed.
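The single-sample design described above, in which the evolution of each ratio between two consecutive periods becomes a variable of its own, can be sketched as follows. The helper name, the example ratio values, and the use of a simple difference as the definition of "variation" are assumptions for illustration; the study's exact definitions are the formulas in Appendix 2.

```python
from typing import Dict


def ratio_variations(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Build one 'Δ' variable per ratio observed in two consecutive periods.

    Assumption: the variation is the simple difference current - previous.
    """
    return {f"Δ{name}": current[name] - previous[name]
            for name in current if name in previous}


# Hypothetical ratios of one distressed firm: the year before distress and the distress year
year_before = {"ROE": 0.12, "Net Profit Margin": 0.08, "Current Debt Ratio": 0.55}
distress_year = {"ROE": 0.05, "Net Profit Margin": 0.03, "Current Debt Ratio": 0.70}

variations = ratio_variations(year_before, distress_year)
print(variations)
```

Each firm then contributes one row holding both its latest ratios and their variations, rather than one separate sample per period.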
The companies included in the sample belong to different economic sectors, such as industry, commerce, agriculture, and services. The main reason for this choice is to develop a broad statistical model that can be applied to different types of companies.
Financial theory states that it is not advisable to directly compare the financial ratios of two companies from different economic sectors, because the economic dynamics of these sectors might differ substantially. For example, a financially healthy company in one sector can show a liquidity ratio of 2, while another company performing a different type of activity can have the same value of this ratio and be in financial trouble. From this perspective, therefore, it does not seem reasonable to include in the sample companies that perform different economic activities, because the sample could contain misleading information about the characteristics that identify a company with financial problems, i.e. the relation between the financial ratios and financial distress could be distorted. However, we are performing a multivariate analysis, and therefore the characteristics observed in each firm are compared simultaneously and globally. This makes it less likely that the particular behavior of certain ratios in some economic sectors will distort the global profile of a company. Nevertheless, two precautions can be implemented to reduce the effect that characteristics inherent to each economic sector have on the identification of companies with financial problems. The first precaution consists of including companies from the same economic sectors in both groups of the sample. The second consists of having the same number of companies from each economic sector in both groups. Although the second precaution could not be implemented for all economic sectors because of the difficulty of finding available financial data, the sample was designed to keep the best possible balance between the two groups.
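Both precautions can be checked mechanically on a list of sampled firms. The sketch below counts firms per economic sector in each group and reports whether a sector appears in both groups (first precaution) and with equal counts (second precaution). The function name and the example sample are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sector_balance(firms: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Count firms per economic sector in group 1 (healthy) and group 2 (distressed)."""
    group1 = Counter(sector for sector, group in firms if group == 1)
    group2 = Counter(sector for sector, group in firms if group == 2)
    return {s: (group1[s], group2[s]) for s in sorted(set(group1) | set(group2))}


# Hypothetical sample of (economic sector, group) pairs
sample = [("industry", 1), ("industry", 2), ("industry", 2),
          ("commerce", 1), ("commerce", 2),
          ("services", 1)]

for sector, (n1, n2) in sector_balance(sample).items():
    in_both_groups = n1 > 0 and n2 > 0   # first precaution
    same_count = n1 == n2                # second precaution
    print(f"{sector:10s} group1={n1} group2={n2} both={in_both_groups} equal={same_count}")
```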
William Beaver [5] designed a paired sample based on companies that operated in different economic sectors. In that sample, for every company that had financial problems there was a financially healthy company from the same economic sector and, whenever possible, of the same size, where the size of a company was measured by its total assets. Beaver performed a univariate statistical analysis, i.e. the financial ratios were examined one at a time, and companies with financial problems were distinguished through a single ratio with a cut-off value.
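A single-ratio classification with a cut-off value can be sketched as follows: every observed value is tried as a cut-off and the one that misclassifies the fewest firms in the sample is kept. This is a sketch in the spirit of Beaver's univariate approach, not his exact procedure; the direction of the rule (distressed when the ratio is below the cut-off), the helper name, and the data are assumptions.

```python
from typing import List, Tuple


def best_cutoff(values: List[float], distressed: List[int]) -> Tuple[float, int]:
    """Return the cut-off on one ratio that minimises misclassifications, and the error count.

    Assumed rule: a firm is classified as distressed when its ratio is below the
    cut-off (as for a liquidity or debt-coverage ratio).
    """
    # Every observed value is a candidate; one extra value above the maximum
    # corresponds to classifying every firm as distressed.
    candidates = sorted(set(values)) + [max(values) + 1.0]
    best_cut, best_errors = candidates[0], len(values) + 1
    for cut in candidates:
        errors = sum((v < cut) != bool(d) for v, d in zip(values, distressed))
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical debt-coverage ratios (1 = distressed, 0 = healthy)
coverage = [0.5, 0.8, 1.3, 1.1, 1.6, 2.0, 2.4]
status = [1, 1, 0, 1, 1, 0, 0]
print(best_cutoff(coverage, status))  # (1.3, 1)
```

Here no cut-off separates the groups perfectly: with a cut-off of 1.3 the distressed firm with a ratio of 1.6 is misclassified.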
In his research, Beaver proposed a paired analysis with the objective of quantifying the effect that the economic sector and the size of a company have on the identification of financially distressed companies. For each pair of companies from the same economic sector and of similar size, the difference in each financial ratio was computed. These differences were then evaluated to determine whether there was sufficient statistical evidence to identify companies with financial problems. Because each difference was computed for companies from the same economic sector and of similar size, the effects of these factors on the sample were mitigated. It is important to note that these differences were computed only to quantify the impact of economic sector and company size on the identification of companies with financial problems. To classify each firm into one of the two groups, a cut-off value for a single financial ratio was used instead. This cut-off value was computed through a direct comparison of the financial ratios, i.e. without using the differences between them. The reason for this is that no conclusion about a single firm can be drawn from a paired analysis, because it always compares two companies at the same time.
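The paired analysis can be sketched as follows: within each matched pair the healthy firm's ratio is subtracted from the distressed firm's ratio, and the mean difference is tested against zero. The source does not state which test was applied to the differences; the paired t statistic below is one common choice and is an assumption, as are the helper name and the data.

```python
import math
from statistics import mean, stdev
from typing import List


def paired_t_statistic(distressed: List[float], healthy: List[float]) -> float:
    """t statistic of the mean within-pair difference (distressed minus healthy).

    Position i in both lists holds one matched pair: same economic sector and
    similar total assets. Compare the result with a t distribution with n - 1
    degrees of freedom.
    """
    if len(distressed) != len(healthy) or len(distressed) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [d - h for d, h in zip(distressed, healthy)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical liquidity ratios for four matched pairs
distressed_firms = [0.6, 0.9, 1.0, 0.7]
healthy_firms = [1.5, 1.4, 1.9, 1.6]
t = paired_t_statistic(distressed_firms, healthy_firms)
print(round(t, 6))
```

Because sector and size are shared within each pair, they cancel out of each difference, which is why the paired design mitigates their effects.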
Once the paired analysis is performed, the ability of each financial ratio to identify companies with financial problems can be compared with the ability obtained from a statistical analysis based on a global comparison of the companies. One would expect these results to be similar if the effects of economic sector and company size were negligible, and Beaver's findings support this statement. Therefore, everything seems to indicate that a paired sample is the best approach to mitigate the possible effects of economic sector and company size. However, Beaver's research was based on a univariate statistical analysis, in which each financial ratio was examined one at a time. This means that the effects of these factors when multiple financial ratios are compared at the same time were not evaluated. In this sense, we expect that simultaneously comparing multiple financial ratios should also mitigate the effects of economic sector and company size. We therefore conclude that a paired sample is not strictly necessary for our study, although keeping a certain balance in the sample can help to reduce the undesired effects of economic sector and company size.
Another precaution that has been considered in the present study to facilitate the identification of companies with financial problems in different economic sectors is the incorporation of a variable that measures the performance of a given company in comparison to the average performance of the sector. More details about the variables considered can be found in the following section.
Finally, another important aspect of the sample design is the size of the companies, which was already mentioned when referring to Beaver's research. The sample was designed not to include companies with large assets, i.e. all the companies included in the statistical analysis have assets lower than 500 [Million $AR]. The reason for this is that there are just a few cases in which large companies suffered financial problems, and therefore it is reasonable to think that these firms belong to a different statistical population. In this regard, Alexander Sydney [7] suggests that there is theoretical as well as empirical evidence that the return rate of a company becomes more stable as the size of its assets increases. This could imply that a firm with a large asset base has a lower risk of becoming financially distressed than a medium-sized or small company, even when both show the same financial ratio values. As a result, it does not seem advisable to compare the financial ratios of two companies that differ significantly in size. Therefore, considering that a consistent statistical analysis requires all sample observations to come from the same population, we decided to include in both sample groups companies within a similar range of asset values. Nevertheless, perfect homogeneity in firm size is not desirable, because it would decrease the ability of the model to identify companies with financial problems.

Variables Considered
The selection of the variables later used in the statistical analysis is a very important stage of this study, because at this point we must take into account all the aspects of the companies that we think could be related to their becoming financially distressed. In this sense, the selection of the variables, together with the sample design, defines the scope and applicability of this research. The variables were selected according to the following criteria: 1) the popularity of some ratios in the financial literature, and 2) the performance of some financial ratios in similar statistical analyses.
The statistical analyses presented in the following sections consider a total of 45 variables. The value of each ratio was computed for every firm in the sample according to the criteria described in the previous section. Appendix 2 lists the formulas defining each ratio. To give a better overview of the selected ratios, we grouped them into the following categories: 1) Liquidity Ratios, 2) Operating Efficiency Ratios, 3) Business Risk Ratios, 4) Financial Risk Ratios, 5) Return Ratios, and 6) Growth Ratios. It should be noted that we included a new financial ratio named Benchmarked Return, which compares the return of each company against the average return of the sector to which the company belongs. Appendix 3 provides the average return of each sector used to calculate this new ratio.
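The exact Benchmarked Return formula is given in Appendix 2 and is not reproduced here. The sketch below assumes it is the firm's return minus its sector's average return (a quotient of the two would be an equally plausible reading); the sector averages are placeholders, not the values of Appendix 3.

```python
from typing import Dict

# Hypothetical sector averages; the study's values are listed in Appendix 3.
SECTOR_AVERAGE_RETURN: Dict[str, float] = {"industry": 0.10, "commerce": 0.08, "services": 0.12}


def benchmarked_return(firm_return: float, sector: str) -> float:
    """Assumed definition: the firm's return minus the average return of its sector.

    A positive value means the firm outperforms its sector; a negative value
    means it underperforms.
    """
    return firm_return - SECTOR_AVERAGE_RETURN[sector]


print(benchmarked_return(0.04, "industry"))   # underperforms its sector
print(benchmarked_return(0.15, "commerce"))   # outperforms its sector
```

Expressing each return relative to its own sector is what allows firms from different economic sectors to be compared on this variable.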
In this particular study we considered a large number of explanatory variables in order to build a comprehensive database that allows us to develop and compare multiple regression models. Moreover, because we are performing a principal component analysis, there is no need to reduce the number of variables considered in the study, especially if many of them are correlated.

Principal Component Analysis
In this section, we present the results of applying principal component analysis to the data collected in the sample. To compute the principal components, we followed the procedures proposed by Peña [8] and Johnson [9].
After calculating the eigenvalues of the covariance matrix C, we can see that the first two eigenvalues account for 93% of the total variance (see Appendix 4). For this reason, we decided to work with the first two principal components, F1 and F2, to represent the sample data. This result is noteworthy, considering that we managed to reduce the representation of the data set from 45 variables to a two-dimensional space.
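The variance-share computation can be sketched with NumPy: each eigenvalue of the covariance matrix C divided by their sum gives the share of total variance carried by the corresponding component. The matrix below is synthetic (two latent factors plus noise), so it illustrates the procedure only and does not reproduce the 93% reported in Appendix 4; the helper names are hypothetical.

```python
import numpy as np


def explained_variance(X: np.ndarray) -> np.ndarray:
    """Share of total variance carried by each principal component (rows of X = firms).

    Uses the eigenvalues of the sample covariance matrix C, largest first.
    """
    C = np.cov(X, rowvar=False)                 # columns of X are the ratios
    eigenvalues = np.linalg.eigvalsh(C)[::-1]   # eigvalsh returns ascending order
    return eigenvalues / eigenvalues.sum()


def components_needed(shares: np.ndarray, threshold: float = 0.90) -> int:
    """Smallest number of leading components whose cumulative share reaches threshold."""
    return int(np.searchsorted(np.cumsum(shares), threshold) + 1)


# Synthetic stand-in for the 86 x 45 ratio matrix: two latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(86, 2))
loadings = rng.normal(size=(2, 45))
X = latent @ loadings + 0.1 * rng.normal(size=(86, 45))

shares = explained_variance(X)
print(f"first two components: {shares[:2].sum():.1%}")
print("components needed for 90%:", components_needed(shares))
```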
To represent all the companies in the sample in a single graph, we calculated the values of the principal components for each firm (see Appendix 5). To do this, we first determined the eigenvector matrix V. The results are shown in Figure 2, where firms in group 1 (without financial problems) are shown in blue and firms in group 2 (with financial problems) in red. This representation excludes two outliers, i.e. observations with particular characteristics that deviate from the rest of the sample. We decided to exclude these outliers so that the scale of the graph would not prevent the rest of the companies from being distinguished.
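The coordinates plotted in Figure 2 are the projections of each firm's centred ratios onto the leading eigenvectors in V. A minimal NumPy sketch, on synthetic data and with a hypothetical helper name:

```python
import numpy as np


def principal_component_scores(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Project each firm (row of X) onto the first k eigenvectors of the covariance matrix.

    Column 0 of the result plays the role of F1, column 1 of F2. Eigenvector signs
    are arbitrary, so a column may need to be multiplied by -1 to match the
    interpretation chosen in the text.
    """
    C = np.cov(X, rowvar=False)
    eigenvalues, V = np.linalg.eigh(C)          # columns of V are eigenvectors
    order = np.argsort(eigenvalues)[::-1]       # largest eigenvalue first
    V_k = V[:, order[:k]]
    return (X - X.mean(axis=0)) @ V_k           # centre each ratio, then project


# Synthetic stand-in for the 86 x 45 ratio matrix
rng = np.random.default_rng(1)
X = rng.normal(size=(86, 45)) @ rng.normal(size=(45, 45))
scores = principal_component_scores(X)
print(scores.shape)  # (86, 2)
```

The variance of each score column equals the corresponding eigenvalue, which is why the eigenvalue shares describe how much of the data each axis of the graph represents.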
Although there does not seem to be a clear distinction between the two groups, the firms in group 2 tend to have higher values of the principal component F2 than the firms in group 1. In addition, we can observe a large concentration of companies with similar negative values of the component F1, as well as some scattered observations from both groups with higher values of this component.
To continue the principal component analysis, we computed the correlations between the 45 original variables and the selected principal components.
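These correlations can be obtained by appending the component scores as extra columns and reading the off-diagonal block of the resulting correlation matrix. The sketch below uses synthetic data in which one variable dominates the variance, so that variable should be the one most strongly correlated with the first component; the helper names are hypothetical.

```python
import numpy as np


def variable_component_correlations(X: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Pearson correlation of every original variable with every component score.

    Returns an array of shape (n_variables, n_components).
    """
    p = X.shape[1]
    full = np.corrcoef(X, scores, rowvar=False)   # (p + k) x (p + k) matrix
    return full[:p, p:]


def first_two_scores(X: np.ndarray) -> np.ndarray:
    """Scores on the two leading eigenvectors of the covariance matrix."""
    eigenvalues, V = np.linalg.eigh(np.cov(X, rowvar=False))
    V2 = V[:, np.argsort(eigenvalues)[::-1][:2]]
    return (X - X.mean(axis=0)) @ V2


# Synthetic data in which variable 0 dominates the total variance
rng = np.random.default_rng(2)
X = rng.normal(size=(86, 6))
X[:, 0] *= 10.0
R = variable_component_correlations(X, first_two_scores(X))
strongest = np.argsort(-np.abs(R[:, 0]))[:3]   # variables most correlated with F1
print(R.shape, strongest)
```

Ranking the variables by the absolute value of these correlations is how a component can be characterized by a handful of ratios, as done next for F1 and F2.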
The results indicate that the principal component F1 has a high positive correlation with the following variables: X14-Operating Leverage, X41-ΔDebt Coverage, and X42-ΔOperating Profit Margin. This suggests that F1 reflects two types of risk: 1) the risk associated with how much money the company has generated to cover its debt, and 2) the business risk, based on the impact that variations in sales have on the company's profits. Therefore, we decided to name this principal component ΔRisk.
A high value of F1 can be caused by: 1) high operating leverage, 2) an improvement in debt coverage, 3) an improvement in the operating profit margin, or 4) a combination of these. Nevertheless, according to the eigenvector matrix, the variable X14-Operating Leverage has the greatest influence on F1. We can therefore conclude that companies with high values of this principal component will most probably show high operating leverage supported by an improvement in debt coverage and operating profit margin. In this regard, Figure 3 shows that the firms with high values of F1 and a value of F2 close to the sample average have the characteristics just mentioned. We should also consider the firms with a high value of F1 together with a high value of F2. In these cases, we verified that the corresponding companies show a strong decrease in both debt coverage and operating profit margin. Consequently, their high value of F1 is due exclusively to high operating leverage.
To summarize the analysis so far, firms with a high ΔRisk (F1) show an improvement in debt coverage and operating profit margin only when their value of F2 is similar to or lower than the sample average. In addition, companies with high values of both principal components show a high variation in their operations together with a decrease in debt coverage and operating profit margin. We would therefore expect a firm with financial problems to show the latter characteristics, although these are not sufficient conditions to classify a firm as financially distressed. This means that a company with a high value of ΔRisk (F1) does not necessarily have financial problems. In other words, companies with higher risks combined with good profits can be considered financially healthy, while companies with higher risks but poor profits will most probably have financial problems in the short term.
Figure 3 shows how the firms in the sample can be differentiated based on their values of F1. The yellow band includes a large number of companies with a low operating variation, while the green band corresponds to a few companies with a high operating variation. Since firms from both group 1 and group 2 show low as well as high values of F1, it is difficult to distinguish companies with financial problems by looking at this principal component alone. However, if we combine this information with the analysis of F2, it becomes possible to recognize certain characteristics of the companies from the principal components representation.
Turning to the principal component F2, it has a high negative correlation with the following variables: X33-ΔNet Income, X43-ΔNet Profit Margin, and X45-ΔROA (see Appendix 6). We can therefore conclude that this component mainly reflects two aspects: 1) changes in a firm's ability to generate profits, and 2) changes in a firm's efficiency in generating profits. For this reason, we decided to name the component F2 ΔReturn.
A high value of F2 can be caused by: 1) a decrease in net income, 2) a decrease in net profit margin, 3) a decrease in return on assets, or 4) a combination of these. This means that companies with a high value of this component would most probably show a deterioration in their returns. In fact, Figure 2 shows that most of the firms with a high value of F2 belong to group 2, i.e. they have had financial problems. Figure 2 also shows a small number of firms that belong to group 2 but have a low value of F2. In these cases, the improvement in returns suggests that the corresponding companies may be recovering from their financial problems.
Figure 4 shows how the firms in the sample can be differentiated based on their values of F2. The red band includes the companies that have shown a strong deterioration in their returns, while the green band corresponds to the firms that have shown an improvement in their returns. In addition, we defined a yellow band for the companies whose ΔReturn is close to the sample average.
Having analyzed each principal component, we can now combine all the information obtained to define clusters that help us identify the status of a firm with regard to its ΔRisk and ΔReturn. This classification of the sample is shown in Figure 5, together with a description of the type of evolution undergone by a company in each sector of the graph. We would expect firms with a higher propensity to have financial problems in the short term to fall into sectors 1 or 2. Sector 1 corresponds to firms showing a significant deterioration in their returns, while sector 2 represents companies showing higher risks combined with a deterioration in their returns. Similarly, we would expect firms with a low propensity to have financial problems in the short term to fall into sectors 5 or 6. Sector 5 corresponds to companies that show signs of stability, low risk, and return improvement, while sector 6 contains companies that show a significant return increase combined with higher risks. Sectors 3 and 4 cannot be linked to either group, i.e. for companies falling into these sectors we cannot draw any conclusions about their propensity to have financial problems in the near future. We could say that these companies have a financial situation similar to the sample average, keeping in mind that companies in sector 4 have higher risks than firms in sector 3.
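The sector description above can be read as a two-by-three grid: a ΔRisk band (lower or higher risk) crossed with a ΔReturn band (deterioration, close to average, improvement), where a high F2 means falling returns. The sketch below encodes that reading. Both the grid layout and the numeric band limits are assumptions for illustration; the actual limits are those drawn in Figures 3 and 4.

```python
# Hypothetical band limits; the real limits are those drawn in Figures 3 and 4.
F1_HIGH_RISK = 1.0          # ΔRisk above this value: "higher risk" band
F2_DETERIORATION = 1.0      # ΔReturn above this value: returns deteriorating
F2_IMPROVEMENT = -1.0       # ΔReturn below this value: returns improving


def plane_sector(f1: float, f2: float) -> int:
    """Map a firm's (ΔRisk, ΔReturn) coordinates to sectors 1-6 of Figure 5 (assumed layout)."""
    higher_risk = f1 > F1_HIGH_RISK
    if f2 > F2_DETERIORATION:          # high F2 means falling returns
        return 2 if higher_risk else 1
    if f2 < F2_IMPROVEMENT:            # low F2 means improving returns
        return 6 if higher_risk else 5
    return 4 if higher_risk else 3     # returns close to the sample average


def expected_propensity(sector: int) -> str:
    """Qualitative reading of each sector given in the text."""
    if sector in (1, 2):
        return "higher propensity to financial distress"
    if sector in (5, 6):
        return "lower propensity to financial distress"
    return "no conclusion (close to the sample average)"


# Hypothetical firms: (ΔRisk, ΔReturn)
for f1, f2 in [(-0.5, 2.0), (2.5, 1.8), (0.0, 0.0), (3.0, -1.5)]:
    s = plane_sector(f1, f2)
    print(f"F1={f1:+.1f} F2={f2:+.1f} -> sector {s}: {expected_propensity(s)}")
```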
To summarize, the results of the principal component analysis indicate that this technique has been very useful for achieving a better representation of the firms, especially given its power of synthesis: it compiles the information contained in the 45 original variables into only 2 new components. By computing these new variables, it is possible to quickly categorize a given firm financially, based on the risk associated with the nature of its business and the risk involved in the amount of debt it has taken on relative to the profits generated during the last two fiscal years. Thus, depending on the sector to which a company belongs, it is possible, in some cases, to make an inference about the firm's propensity to experience financial problems in the short term. In the next section, we perform a logistic regression analysis to develop a statistical model that estimates the probability that a firm becomes financially distressed in the short term. This gives us a new quantitative measure to help identify the firms with financial problems.

Logistic Regression Analysis
Because the principal components F1 (ΔRisk) and F2 (ΔReturn) have been useful for representing the firms in the sample, and because together they retain 93% of the total variance of the 45 original variables included in the analysis, it would be reasonable to use these components to build a logistic regression model. To do this, we followed the procedures proposed by Hosmer and Lemeshow [10]. Such a model would allow us to estimate the probability that a firm becomes financially distressed in the short term, which could then be used as a quantitative measure to help identify the companies with financial problems. However, the coefficients of determination obtained in the model validation indicate that the model explains only a small percentage (31.87%) of the behavior of the dependent variable we are trying to estimate, Y (Financial Distress), where Y = 1 if the firm IS financially distressed and Y = 0 if it is NOT. Therefore, we decided to investigate whether a regression model that better fits the data could be found.
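For a logistic model, "coefficient of determination" usually means a pseudo-R² built from log-likelihoods. The section does not state which ones were computed, so the sketch below assumes three standard definitions (McFadden, Cox and Snell, and Nagelkerke), given the fitted model's log-likelihood, the intercept-only log-likelihood and the number of firms. The function names and the example inputs are hypothetical.

```python
import math


def null_loglik(n_distressed, n):
    """Log-likelihood of the intercept-only model (constant distress probability)."""
    p = n_distressed / n
    return n_distressed * math.log(p) + (n - n_distressed) * math.log(1 - p)


def pseudo_r2(ll_model, ll_null, n):
    """Three common pseudo-R^2 measures for a logistic model fitted to n firms."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    cox_snell_max = 1.0 - math.exp(2.0 * ll_null / n)   # upper bound of Cox-Snell
    nagelkerke = cox_snell / cox_snell_max               # rescaled so it can reach 1
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}


# Hypothetical inputs: 86 firms, 30 of them distressed, fitted log-likelihood -35.0.
ll0 = null_loglik(30, 86)
print(pseudo_r2(-35.0, ll0, 86))
```

Because the three measures differ in scale, comparing models is only meaningful within the same measure.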
Keeping in mind that the principal components are linear combinations of the 45 ratios considered in this study, we can ask the following question: what would happen if we developed a regression model using only the ratios that are representative of each principal component? The reason for this question is that the variance of each principal component can be distorted by ratios that are not useful for identifying firms with financial problems. This does not mean that the regression model based on the principal components is useless, but it opens the possibility of finding a new model that better explains the behavior of the firms in the sample.
To answer this question, we built a new regression model based only on the ratios that have a medium or high correlation with the principal component F2 (ΔReturn). In this case, the model validation indicates that this group of ratios can explain 35.63% of the variance of the dependent variable Y (Financial Distress). This supports the idea that the new model identifies firms with financial problems more efficiently than the principal components model, because it obtains similar or slightly better results with much less information. Following this reasoning, we can state that although the principal component analysis has been useful for representing companies with different financial profiles, using its results in a regression model is not effective. In fact, we have shown that a model built on a few ratios achieves an identification rate similar to that of the model based on the principal components, which contains data from all 45 ratios.
To summarize, we have shown that in this particular study it is difficult to combine the principal component and logistic regression analyses. This raises a new problem: some ratios may be effective for estimating the probability that a firm becomes financially distressed in the short term while having a low correlation with the principal components. To address this problem, we decided to carry out a global analysis covering all 45 financial ratios included in this study.
If we considered every combination of the 45 ratios that yields a regression model with no more than 5 variables, it would be very hard to evaluate and compare all these alternatives by trial and error. For this reason, we implemented a methodology that reduces the number of models to be compared. This methodology consists of focusing on the first 22 ratios with the highest coefficient of determination in a regression model with a single independent variable. The objective is thus to develop different models using only the variables that are, by themselves, most effective at identifying firms with financial problems. It is important to keep in mind that this methodology does not guarantee an optimal solution: a ratio can show a low R² in a single-variable regression model, yet the information it brings for identifying firms with financial problems can be much greater when it is combined with other ratios. Nevertheless, the methodology is still a valid procedure for finding a near-optimal solution, especially given the large number of ratios included in the analysis and the fact that many of these variables are correlated.
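A quick count makes the scale of the search concrete. The sketch below counts the distinct subsets of 1 to 5 ratios (ignoring the intercept) that could form a model, first over all 45 ratios and then over the 22 screened ones; `n_models` is a name introduced for this illustration.

```python
from math import comb


def n_models(n_ratios, max_vars):
    """Number of distinct models using between 1 and max_vars of n_ratios predictors."""
    return sum(comb(n_ratios, k) for k in range(1, max_vars + 1))


print(n_models(45, 5))  # 1385979
print(n_models(22, 5))  # 35442
```

Even after screening, tens of thousands of subsets remain, which is why the additional correlation grouping and judgment-based selection described next are still needed.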
Table 1 presents the ranking of the coefficients of determination. From these results, we can see that the variables most correlated with the principal components are spread throughout the ranking. However, most of the ratios correlated with component F2 have an R² higher than 0.1. This could be explained by the fact that, in the principal components regression model, the estimated parameter of F2 is larger than that of F1. It is also worth mentioning that most of the ratios that individually best explain the behavior of the firms are related to profitability and returns.
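The screening step can be sketched as fitting one logistic model per ratio, scoring each by a pseudo-R² and keeping the best-ranked ratios. The section does not say which coefficient of determination produced Table 1, so McFadden's measure is an assumption here; the function names are hypothetical and the data are synthetic.

```python
import numpy as np


def fitted_loglik(X, y, iters=30):
    """Maximum-likelihood logistic fit by Newton-Raphson; returns the log-likelihood.

    X must include an intercept column; y holds 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta += np.linalg.solve(info, X.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def screen_ratios(R, y, names, top=22):
    """Rank single-ratio logistic models by McFadden R^2 and keep the `top` best ratios."""
    n = len(y)
    p0 = y.mean()
    ll_null = n * (p0 * np.log(p0) + (1.0 - p0) * np.log(1.0 - p0))
    scores = []
    for j, name in enumerate(names):
        X = np.column_stack([np.ones(n), R[:, j]])
        scores.append((1.0 - fitted_loglik(X, y) / ll_null, name))
    return sorted(scores, reverse=True)[:top]


# Synthetic example: one informative ratio and two noise ratios.
rng = np.random.default_rng(0)
n = 200
R = rng.normal(size=(n, 3))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * R[:, 0]))).astype(float)
print(screen_ratios(R, y, ["informative", "noise_a", "noise_b"], top=2))
```

As the text warns, this ranking ignores interactions between ratios, so a ratio that scores poorly alone may still be valuable in a multivariable model.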
Based on the first 22 ratios shown in Table 1, a total of 57 regression models were tested (see Appendix 7). Note that the outliers identified in the principal component analysis were excluded when developing all of these logistic regression models. We limited each model to at most 5 independent variables. In addition, the ratios were first grouped by their correlations to avoid including in the same model more than one ratio carrying the same type of information. For example, it is not reasonable to build a regression model only from liquidity ratios, since we would miss important financial information about the companies related to operating performance, debt, profit, and growth.
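One way to implement the grouping rule is to cluster ratios whose absolute pairwise correlation exceeds a threshold and then allow at most one ratio per group in any candidate model. The sketch below is a greedy version of that idea; the threshold of 0.7, the function names and the ratio names in the example are hypothetical, and the paper's own grouping may have been done differently.

```python
from itertools import combinations, product

import numpy as np


def group_by_correlation(R, names, threshold=0.7):
    """Greedily group ratios whose absolute correlation with a seed ratio exceeds threshold."""
    corr = np.abs(np.corrcoef(R, rowvar=False))
    unassigned = list(range(len(names)))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [j for j in unassigned if corr[seed, j] > threshold]
        unassigned = [j for j in unassigned if j not in members]
        groups.append([names[j] for j in members])
    return groups


def candidate_models(groups, max_vars=5):
    """Yield ratio tuples that take at most one ratio from each correlation group."""
    for k in range(1, max_vars + 1):
        for chosen_groups in combinations(groups, k):
            yield from product(*chosen_groups)


# Synthetic example: two nearly identical return ratios and one unrelated debt ratio.
rng = np.random.default_rng(1)
a = rng.normal(size=100)
R = np.column_stack([a, a + 0.05 * rng.normal(size=100), rng.normal(size=100)])
groups = group_by_correlation(R, ["roa", "roe", "debt_ratio"])
models = list(candidate_models(groups, max_vars=2))
print(groups)
print(models)
```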
The models tested were compared based on the values of the different coefficients of determination. We noticed that whenever a liquidity ratio was included in a model, the corresponding estimated parameter was usually not coherent with the expected behavior of that variable. In other words, in many of these models a higher liquidity implied a higher probability of the firm becoming financially distressed, which is not coherent with the observed behavior of this variable. This is why some models had to be discarded even when they presented high coefficients of determination.
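The coherence criterion can be expressed as a sign check: each ratio has an expected direction of effect, and a model is kept only if every estimated slope points that way. The sketch below applies the check before choosing the model with the highest R². The ratio names, the dictionary of expected signs and the candidate models are hypothetical; the directions themselves follow the reasoning in the text. It does not capture the author's judgment, which was also part of the selection.

```python
# Expected direction of each ratio's effect on the probability of distress:
# +1 means a higher value should raise it, -1 means it should lower it.
# Ratio names are hypothetical; the directions follow the reasoning in the text.
EXPECTED_SIGN = {
    "current_debt_ratio": +1,
    "total_cost_of_debt": +1,
    "operating_profit_margin": -1,
    "delta_roe": -1,
    "current_liquidity": -1,
}


def is_coherent(coefs, expected=EXPECTED_SIGN):
    """True if every estimated slope has the sign predicted by financial reasoning."""
    return all(expected[name] * beta > 0 for name, beta in coefs.items())


def select_model(models):
    """Highest-R^2 model among those whose coefficient signs are coherent (or None)."""
    coherent = [m for m in models if is_coherent(m["coefs"])]
    return max(coherent, key=lambda m: m["r2"]) if coherent else None


# Hypothetical candidates: the first has the higher R^2 but a positive liquidity slope.
candidates = [
    {"r2": 0.70, "coefs": {"current_liquidity": 0.8, "delta_roe": -1.2}},
    {"r2": 0.62, "coefs": {"current_debt_ratio": 2.1, "delta_roe": -0.9}},
]
print(select_model(candidates)["r2"])
```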
Table 2 presents the ratios that make up the regression model selected as the output of this analysis. This model was selected mainly on the value of the coefficient of determination, but also on the coherence of the estimated parameters with the expected behavior of each variable and on the author's judgment regarding the relevance of the different ratios considered.
To develop this model, we estimated the corresponding parameters through three different methods: 1) least squares, 2) weighted least squares, and 3) maximum likelihood. The results obtained are summarized in Table 3.
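Of the three methods, maximum likelihood is the one used for the final model. The sketch below shows a standard Newton-Raphson maximum-likelihood fit for a binary logistic model, returning the coefficients, their standard errors (from the inverse Fisher information, which Wald tests and confidence intervals rely on) and the maximized log-likelihood. The least-squares variants are not sketched because the text does not say how they were set up. The function name and the synthetic data are assumptions for illustration.

```python
import numpy as np


def logit_mle(X, y, tol=1e-10, max_iter=50):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X: (n, k) design matrix whose first column is all ones (intercept).
    y: (n,) array of 0/1 outcomes (1 = financially distressed).
    Returns coefficients, standard errors and the maximized log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information matrix
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, se, loglik


# Synthetic data with true intercept -0.5 and slope 1.5.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))).astype(float)
beta, se, loglik = logit_mle(X, y)
print(beta, se, loglik)
```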
where Ŷ represents the probability that a firm becomes financially distressed in the short term. From this model, we can see that an increase in the current debt ratio or in the total cost of debt implies a higher probability that a company becomes financially distressed. Conversely, an increase in the ROA, the operating profit margin, or the ΔROE implies a lower probability that a firm becomes financially distressed in the short term. We can thus verify that the signs of the estimated parameters are coherent with the expected financial impact of these ratios on a firm.
As a next step, we performed several tests, as suggested by García [11], to validate the logistic regression model obtained. All of these validation tests use a significance level of 5%.
The first validation test corresponds to the hypothesis $H_0$: the model fits the data. To perform this test, we computed the corresponding test statistics; the results are shown in Table 4. The hypothesis is not rejected, and therefore we do not have enough statistical evidence to conclude that the model does not fit the data.
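The statistics behind Table 4 are not given in this section. Since the procedure follows Hosmer and Lemeshow [10], one standard choice for testing $H_0$: the model fits the data is the Hosmer-Lemeshow statistic, sketched below under the assumption of ten groups of roughly equal size formed by sorting the fitted probabilities; it is compared with a chi-square distribution with (groups - 2) degrees of freedom. The function name and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic and p-value (df = groups - 2)."""
    order = np.argsort(p)
    stat = 0.0
    for idx in np.array_split(order, groups):
        observed = y[idx].sum()           # distressed firms in the group
        expected = p[idx].sum()           # distressed firms predicted by the model
        n_g = len(idx)
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / n_g))
    return stat, chi2.sf(stat, groups - 2)


# Synthetic check: outcomes drawn from the true probabilities, then scored against
# both the true probabilities and a deliberately wrong set (1 - p).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))
y = (rng.random(1000) < p_true).astype(float)
stat, pval = hosmer_lemeshow(y, p_true)
stat_bad, pval_bad = hosmer_lemeshow(y, 1.0 - p_true)
print(stat, pval, stat_bad, pval_bad)
```

At the 5% level used in the text, a p-value above 0.05 means the hypothesis of adequate fit is not rejected.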
The second validation test corresponds to the hypothesis $H_0\colon \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$, i.e. that all the parameters associated with the explanatory variables are null. The results of this test are shown in Table 5. Because the hypothesis is rejected, we have enough statistical evidence to state that at least one of the estimated parameters in the model is not null.
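A common way to test that all slope parameters are null is the likelihood-ratio test: twice the gain in log-likelihood over the intercept-only model is compared with a chi-square distribution whose degrees of freedom equal the number of slopes. Whether this is the exact statistic behind Table 5 is an assumption, and the log-likelihood values in the example are hypothetical.

```python
from scipy.stats import chi2


def overall_lr_test(ll_full, ll_null, n_slopes):
    """Likelihood-ratio test of H0: all slope parameters are zero.

    G = 2 * (LL_full - LL_null) is compared with a chi-square with n_slopes df.
    """
    g = 2.0 * (ll_full - ll_null)
    return g, chi2.sf(g, n_slopes)


# Hypothetical log-likelihoods for a five-ratio model and the intercept-only model.
g, p_value = overall_lr_test(-30.2, -55.4, n_slopes=5)
print(round(g, 2), p_value < 0.05)
```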
To continue with the model validation, we performed significance tests on the individual estimated parameters. The results obtained with the Wald and Wilks methods are shown in Table 6.
These results show that the hypothesis that the parameters of X16 (Current Debt Ratio) and X21 (Total Cost of Debt) are null is rejected, so there is enough statistical evidence that these parameters are not null. For X23 (Operating Profit Margin) and X44 (ΔROE), the Wald test does not reject the hypothesis that the corresponding parameters are null, whereas the Wilks test does reject it. To decide whether these variables should be included in the model, we calculated the maximum probabilities of rejecting the hypotheses $H_0\colon \beta_4 = 0$ and $H_0\colon \beta_5 = 0$ when they are actually true. These probabilities are 0.1448 and 0.0871, respectively. Given that these probabilities are quite low, we concluded that there is not enough statistical evidence to consider the parameters of X23 and X44 null. Finally, we need to consider the estimated parameter associated with X29 (ROA). In this case, the hypothesis $H_0\colon \beta_2 = 0$ is rejected neither by the Wald method nor by the Wilks method. In fact, the maximum probability of rejecting this hypothesis when it is actually true is 0.308 according to the Wald statistic, and the Wilks statistic likewise fails to reach significance. These results suggest that this variable should not be included in the regression model, since it does not help to identify the firms with financial problems. To verify this, we compared the regression models with and without X29 (ROA) based on their coefficients of determination and on their ability to identify firms with financial problems. The results, shown in Tables 7 and 8, indicate that the additional information provided by X29 (ROA) is negligible; therefore, we decided not to include this variable in the regression model.
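The two per-parameter tests can disagree because they measure different things. The Wald statistic uses only the estimate and its standard error, while the Wilks (likelihood-ratio) statistic compares the log-likelihoods of the models with and without the variable; the Wald test is known to be less reliable in small samples or with large coefficients. The sketch below computes both for a single parameter; the function names and all numbers are hypothetical and chosen to reproduce the kind of disagreement described above.

```python
from scipy.stats import chi2


def wald_test(beta, se):
    """Wald statistic (beta / se)^2 and p-value for H0: beta = 0 (1 df)."""
    w = (beta / se) ** 2
    return w, chi2.sf(w, 1)


def wilks_test(ll_with, ll_without):
    """Likelihood-ratio (Wilks) statistic for dropping one variable (1 df)."""
    g = 2.0 * (ll_with - ll_without)
    return g, chi2.sf(g, 1)


# Hypothetical values: the Wald test does not reject H0 at 5%, the Wilks test does.
w, p_wald = wald_test(1.2, 0.8)
g, p_wilks = wilks_test(-40.0, -42.05)
print(round(w, 2), round(p_wald, 4), round(g, 2), round(p_wilks, 4))
```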
To conclude the validation process, we can analyze the results in Tables 7 and 8. The most important point is the improvement that the model based on the original variables shows over the model based on the principal components. In terms of the coefficients of determination, the maximum value obtained for the model based on the original variables is 0.654, compared with 0.3187 for the model based on the principal components. Similarly, the model based on the original variables correctly classified 84.88% of the firms (either as firms with or without financial problems), while the principal components model correctly classified 78.57% of the firms in the sample. Taken together, these validation metrics reflect the robustness of the selected regression model.
Given that the model validation led us to exclude X29 (ROA), the new regression model, Expression (5), contains the remaining four ratios: X16 (Current Debt Ratio), X21 (Total Cost of Debt), X23 (Operating Profit Margin) and X44 (ΔROE). Its parameters were again estimated by the maximum likelihood method. As in the previous model, the signs of the estimated parameters in Expression (5) are coherent with the expected behavior of the variables.
The validation of this new model is straightforward, since only one financial ratio was removed relative to the previous model. As in the previous validation, we first tested the hypothesis $H_0$: the model fits the data, and found no statistical evidence to reject it. Second, we tested the hypothesis that all the parameters associated with the explanatory variables are null, and found enough statistical evidence to state that not all of them are null. We then performed the significance tests of the individual regression coefficients; the results are shown in Table 9. In this case, the Wald and Wilks methods lead to the same conclusion for every parameter, which makes these validation results more robust than those of the previous regression model.
The validation concludes with the coefficients of determination and the ability of the model to correctly classify the firms in the sample, which were already presented in Tables 7 and 8, respectively. We complete the regression analysis by computing 95% confidence intervals for each of the estimated parameters of the selected regression model.
To summarize, we have found a logistic regression model, defined by Expression (5), that is based on a reduced group of financial ratios. The validation results indicate that this model explains more of the total variance of the firms in the sample and identifies firms with financial problems better than the model based on the principal components. This confirms that, in this particular study, a large amount of information is lost if the principal components are used to develop a logistic regression model. Nevertheless, we should keep in mind that the principal component analysis has proved very useful for representing and quickly assessing the financial status of a firm, based on the risk associated with the nature of its business and the risk involved in the amount of debt it has taken on relative to the profits generated during the last two fiscal years. In fact, the principal component and regression analyses have produced two complementary tools that allow us to evaluate and summarize the financial status of a firm based on the data in its balance sheets.
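For maximum-likelihood estimates, a usual 95% interval is the Wald interval, the estimate plus or minus 1.96 standard errors, which can also be exponentiated to express it on the odds-ratio scale. The section does not state how its intervals were built, so the Wald form is an assumption; the function names and the coefficient used in the example are hypothetical, not the paper's estimates.

```python
import math
from statistics import NormalDist


def wald_interval(beta, se, level=0.95):
    """Wald confidence interval beta +/- z * se for one logistic coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for a 95% interval
    return beta - z * se, beta + z * se


def odds_ratio_interval(beta, se, level=0.95):
    """The same interval mapped to the odds-ratio scale exp(beta)."""
    low, high = wald_interval(beta, se, level)
    return math.exp(low), math.exp(high)


# Hypothetical coefficient and standard error.
print(wald_interval(2.0, 0.5))
print(odds_ratio_interval(2.0, 0.5))
```

An interval that excludes zero (or, on the odds-ratio scale, excludes one) is consistent with rejecting $H_0\colon \beta = 0$ at the 5% level.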

Applying the Analyses to a New Sample
The objective of this section is to evaluate how effectively the principal component and regression analyses identify firms with financial problems when they are applied to a new sample.
Because of the difficulties involved in data collection, the new sample is composed of 14 companies, only 3 of which have had financial problems (see Appendix 8 for the sample details). Moreover, the data collected from these firms correspond to periods prior to 2002, which means that the data might show unusual variations due to the financial crisis that occurred in Argentina between 2001 and 2002. Nevertheless, despite these data limitations, the evaluation is still valid, although its results must be interpreted carefully.
To start with, the values of the principal components F1 (ΔRisk) and F2 (ΔReturn) were computed for each firm and are represented in Figure 6. This figure shows that the 3 companies that have had financial problems are located in sector 2, which corresponds to a high risk level together with a deterioration in returns. At the same time, most of the companies that did not have financial problems are also located in this sector, with the exception of 2 firms located in sector 6, which corresponds to a high level of risk together with an improvement in returns. Thus, if we had to classify the firms in the new sample based solely on the principal component analysis, we would say that the firms in sector 2 have a higher probability of becoming financially distressed in the short term, while the opposite holds for the companies in sector 6. The higher probability of financial problems for companies in sector 2 derives mainly from the higher risk associated with the nature of their business (as determined by the operating leverage) and from the higher risk they take on when increasing their debt without generating enough resources to cover it. Nevertheless, to obtain a more precise classification, we should perform the regression analysis, as shown next.
To complete the evaluation of the tools developed, we applied the logistic regression model to the new sample and computed for each firm the probability of becoming financially distressed in the short term, as shown in Table 10. Considering that firms with a probability equal to or higher than 0.5 are classified as having financial problems, we can conclude that all companies were correctly classified into one of the two groups considered. This suggests that the tools developed are useful and effective for identifying firms with financial problems. Of course, some classification error should always be expected, but in this case it does not appear to be significant.
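The classification rule is a cutoff on the estimated probability, and its quality can be summarized with a confusion matrix. The sketch below applies the 0.5 cutoff used in the text and counts correct and incorrect classifications; false negatives (distressed firms labeled healthy) are the error type the conclusions identify as more frequent. The function names and the probabilities in the example are hypothetical, not the values of Table 10.

```python
def classify(probabilities, cutoff=0.5):
    """Label a firm as distressed (1) when its estimated probability is >= cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def confusion(actual, predicted):
    """Return counts (tp, fn, fp, tn), treating 1 as financially distressed."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)   # distressed firm labeled healthy
    fp = sum(a == 0 and p == 1 for a, p in pairs)   # healthy firm labeled distressed
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return tp, fn, fp, tn


def accuracy(actual, predicted):
    """Share of firms classified correctly."""
    tp, fn, fp, tn = confusion(actual, predicted)
    return (tp + tn) / len(actual)


# Hypothetical probabilities and observed outcomes for five firms.
probs = [0.91, 0.62, 0.50, 0.12, 0.49]
actual = [1, 1, 1, 0, 0]
predicted = classify(probs)
print(predicted, confusion(actual, predicted), accuracy(actual, predicted))
```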
It is important to note how the two analyses complement each other. From the principal component analysis, we can quickly identify the companies that are taking on higher risk (because of the nature of their business and their higher debt) and those that have better coverage against that risk. From the regression analysis, we can quantify through a single indicator, the probability of becoming financially distressed in the short term, how large the risk is and how well the company is covered against it. In addition, we can use this probability to identify the firms that already have financial problems.

Conclusions
Through the statistical analyses performed in this study, we verified that the financial ratios behave differently in firms that have had financial problems and in those that have not. Although these ratios do not all have the same individual ability to identify firms with financial problems, it is possible to combine and summarize all that information into 2 principal components, which we have named ΔRisk and ΔReturn. By computing these new variables, it is possible to quickly categorize a given firm financially, based on the risk associated with the nature of its business and the risk involved in the amount of debt it has taken on relative to the profits generated during the last two fiscal years. The conclusive results of the principal component analysis would suggest that there is no apparent reason to exclude any of the financial ratios originally collected when estimating the probability that a firm becomes financially distressed in the short term. However, after developing different regression models, we have seen that better estimates of these probabilities can be obtained by considering just a few financial ratios that, taken together, identify firms with financial problems better than the data from all 45 ratios do (as in the principal components model). In this way, we developed a more efficient model, since it yields better results with less data. This efficiency can be explained by the fact that the principal components are linear combinations of 45 ratios, many of which might not be useful for distinguishing a financially healthy firm from one that is not. This finding shows how important it is to have a complete and broad database before starting any statistical analysis, so that fewer limitations are introduced when searching for a near-optimal solution, i.e. the regression model with the combination of available ratios that best estimates the probability of a firm becoming financially distressed in the short term. Likewise, we should emphasize the benefits of combining more than one statistical analysis to better understand the nature of the process under study and to achieve the proposed objective more effectively, which in our case is to identify firms with financial problems.
We have seen that the ratios with the greatest ability to identify firms with financial problems are all related to the return aspects of the companies. In fact, the principal component that proved more conclusive for identifying financially unhealthy firms was ΔReturn, as opposed to ΔRisk. Nevertheless, the information contained in these ratios can always be complemented with information from other types of ratios to identify firms with financial problems more precisely and effectively. After performing a logistic regression analysis based on the 45 ratios collected in the sample, we selected a small group of them that explains 65% of the firms' behavior. The corresponding model consists of the following ratios: 1) Current Debt Ratio, 2) Total Cost of Debt, 3) Operating Profit Margin, and 4) ΔROE. Interestingly, in most of the logistic regression models tested, the probability of misclassifying a firm with financial problems, i.e. assuming that a company is financially healthy when it is not, was higher than the probability of misclassifying a healthy firm. This could mainly be explained by the fact that the financial ratios collected show higher variability in financially distressed companies than in those without financial problems. Nevertheless, combining the regression and principal component analyses helps to reduce the probability of misclassifying a given firm. In this regard, note that the present study does not include any analysis of the costs involved in the decision-making process of identifying firms with financial problems. Nevertheless, whenever the results do not clearly define the financial status of a company, the most conservative decision is to assume that the firm has financial problems.
The outcomes of this study are two tools, developed through statistical inference, with which we can quickly assess the financial status of a firm based on its risk and return variations and estimate the probability that a firm becomes financially distressed in the short term. These tools can be put into practice in different ways, for example: 1) to monitor and follow up the financial performance of a company, 2) to support the decision of lending money to a company, 3) to support the decision of investing in or merging with a company, 4) to support market analysis from a financial perspective, and 5) to support actions or decisions related to the financial assessment of a company that declares itself to be financially distressed.
This study could be further developed by incorporating new explanatory variables that are not financial ratios but qualitative measurements, which could contribute to a more precise and effective estimation of the probability of a firm becoming financially distressed in the short term. Another alternative would be to incorporate a tool that minimizes the costs of taking the wrong decision, i.e. assuming that a company has no financial problems when it actually has them, or vice versa. Finally, the statistical analyses performed in this study could be replicated with firms that hold a significant amount of assets, with the objective of determining the main characteristics that lead to a solid financial structure. As we can see, there are many ways to continue this study, and statistics offers interesting tools for doing so.

Appendix 3
Table A3 presents the average ROE per industry based on data published in [12][13][14]. These average returns were used to compute the Benchmarked Return ratio for each company in the sample.

Appendix 4
Table A4 presents the eigenvalues of each principal component obtained from the covariance matrix. These results show that the first two components together account for approximately 93% of the total sample variance.
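The quantities in Table A4 can be sketched as an eigen-decomposition of the covariance matrix of the ratios: the eigenvalues sorted in decreasing order, their cumulative share of total variance, and the firm scores on the first components (the F1 and F2 values of Table A5). Two caveats apply to any covariance-based PCA: ratios measured on larger scales dominate the components, and the sign of each eigenvector is arbitrary, so the orientation of ΔRisk and ΔReturn has to be fixed by interpretation. The function name and the synthetic data below are assumptions for illustration.

```python
import numpy as np


def pca_covariance(R, n_components=2):
    """PCA on the covariance matrix of the ratios.

    Returns eigenvalues (descending), cumulative share of variance, and firm scores.
    """
    centered = R - R.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    scores = centered @ eigvecs[:, :n_components]    # one row of (F1, F2) per firm
    return eigvals, cumulative, scores


# Synthetic data: six ratios driven by two latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
R = latent @ loadings + 0.01 * rng.normal(size=(200, 6))
eigvals, cumulative, scores = pca_covariance(R)
print(np.round(cumulative, 4))
```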

Appendix 5
Table A5 presents the values of the two selected principal components for each firm in the sample. Based on these values, it is possible to represent the firms in a single graph, as shown in Section 4.

Figure 1. Yearly number of firms financially distressed in Argentina.

Figure 2. Representation of the firms based on the principal component values without considering the outliers.

Figure 3. Categorization of the firms based on the values of F1 (ΔRisk).

Figure 4. Categorization of the firms based on the values of F2 (ΔReturn).

Figure 5. Categorization of the firms based on the principal components.

Figure 6. Categorization of the firms from the new sample based on the principal components.

Table A3. Average ROE per industry for companies operating in Argentina.
These returns are estimations based on the return of that sector from other years adjusted by the corresponding ∆GDP.