What Causes Non-Performing Loans? The Case of Greece Using Primary Accounting Data

In this paper we study the effect of independent variables in identifying non-performing loans during a crisis period, using binomial logistic regression. We use a unique dataset of 2591 loans granted by one of the four systemic banks of Greece in 2005. Specifically, we study a sample of loans granted to micro and small enterprises to cover working capital needs. Non-performing loans increased dramatically as the recession of the Greek economy deepened. Moreover, we show that, in general, the variables continued to affect the creation of non-performing loans in the same way throughout the studied period. In particular, binomial logistic regression shows a positive correlation between non-performing loans and the factors “Adverse”, “Age” and “LTT”. In contrast, we find a negative correlation between the probability of classifying a loan as non-performing and the independent variables “Collateral”, “Own Facilities”, “Property”, “Residence” and “Years of operation”. Finally, the predictive performance of the binomial logistic regression declined as the crisis deepened.


Introduction
Financial institutions play an important intermediation role in the economy: the development of the financial sector influences the level of the economy's development. The Basel Committee on Banking Supervision (Bank for International Settlements, 2000) defines credit risk as the potential that a bank borrower or counterparty will fail to meet its obligations in accordance with agreed terms. Credit risk induces financial distress on banks, and its assessment requires advanced modeling techniques that link it to the sources of uncertainty that generate it. The control of non-performing loans is therefore a vital priority for the proper operation of financial institutions. The need to reduce non-performing loans became most evident in the recent global financial crisis, which began in the US in 2006 and spread to the rest of the world, especially to the European Union, where it evolved into a sovereign debt crisis. The main predictive credit scoring models are divided into statistical models (linear discriminant analysis, logistic regression analysis, multivariate adaptive regression splines, etc.) and artificial intelligence models (artificial neural networks, decision trees, case-based reasoning, support vector machines, etc.).
The purpose of this paper is to study the behavior of a sample of loans granted by one of the four systemic banks in Greece in a period of growth (01-12/2005), during the recent financial crisis and the recession in the Greek economy (12/2010-12/2011). Until 2008 the Greek economy showed positive growth rates, but it then sank into recession. We therefore use a dataset of loans granted during an expansion period and study their behavior during the recession period.
The sub-prime mortgage lending crisis, which broke out in 2006, shook the financial stability of many developed countries. As expected, the transmitted financial crisis took on large dimensions because large credit institutions were exposed to so-called "junk bonds". The financial crisis reached Greece in the last quarter of 2008 (Aggelopoulos and Georgopoulos, 2017 [1]). In the case of Greece in particular, the crisis turned largely into a debt crisis and took on great proportions because of the extremely high public debt, thereby plunging the economy into a deep recession. Unlike public debt, private debt in Greece is lower than in other European countries, as a result of the generally conservative credit policies of the Greek banks (Table 1).
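The paper exploits manually gathered individual-level loan application and loan performance data from 3294 SME loans that were granted in the expansion period (from January 2005 to December 2005) by one of the four Greek systemic banks (i.e. the National Bank of Greece, Piraeus Bank, Alpha Bank and Eurobank), which as oligopolistic players exhibit similar strategic behavior and offer very comparable products. During the span of this study, the Greek banking sector has undergone several development phases (Aggelopoulos and Georgopoulos, 2017): initially, an expansion period (2002-2007) with very high GDP growth levels; then an unstable period caused by the global financial crisis in September 2008; afterwards the Greek sovereign crisis in April 2010, which led to an accumulation of NPLs in the subsequent years (2010-2012); and finally the deep recession (2013-2016). In this turbulent environment, the analysis concentrates on the period 2010-2012, since NPLs in subsequent years increased to unmanageable levels due to the fragile political scene that prevailed in that period (so the loan delays incurred in that period can be considered mechanical). To sum up, the study explores the repayment behavior of the 3294 SME loans a few years after they were granted (2005), throughout the early recession period from December 2010 to December 2011.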
The main findings of our analysis show that, in general, the variables continue to affect the creation of non-performing loans in the same way. Using binomial logistic regression, we find a positive correlation between non-performing loans and the factors "Adverse", "Age" and "LTT". In contrast, we find a negative correlation between the probability of classifying a loan as non-performing and the independent variables "Collateral", "Own Facilities", "Property", "Residence" and "Years of operation". Finally, the predictive performance of the binomial logistic regression declines as the crisis deepens. The rest of the paper is structured as follows. In Section 2 we provide a literature review of credit scoring models. In Section 3 we present the data and the employed variables. In Section 4 we describe the research methodology, followed by a presentation of the applicable credit scoring model. In Section 5 we provide the empirical results of this research, while Section 6 concludes the paper and discusses future research.

Literature Review
The recent global financial tsunami has drawn unprecedented attention from financial institutions to credit risk. A good credit risk assessment method can help financial institutions grant loans to creditworthy applicants, thus increasing profits; it can also help them deny credit to non-creditworthy applicants, thus decreasing losses. The accuracy of credit scoring is critical to financial institutions' profitability (Wang et al., 2012 [2]). Even a 1% improvement in the accuracy of recognizing applicants with bad credit will spare financial institutions a great loss (Hand & Henley, 1997 [3]). Accordingly, credit risk remains one of the major threats that financial institutions face, and it is essential to model it. The Basel Committee on Banking Supervision proposes a capital adequacy framework that allows banks to measure the capital requirements for their banking books using internal assessments of key risk drivers. Thus, banks need systems to assess credit risk (Lin, 2009 [4]).
Credit scoring has four potential benefits in small business lending, namely: 1) reducing screening and monitoring costs of small business loans, 2) enhancing competition among banks in local markets, 3) adjusting lending interest rates commensurate with borrowers' credit risks, and 4) increasing the availability of credit for risky marginal firms. Small businesses are widely viewed as the fountainhead of job creation and the engine of economic growth (Ono, 2014 [5]). Small and very small enterprises constitute 99.6% of all businesses operating in Greece, while the corresponding rate in the European Union amounts to 98.7% (Table 2).
Lending to small businesses is beneficial to commercial banks because the margins on small business lending are higher than on many other bank products (Petersen & Rajan, 1995 [6]; Tsaih et al., 2004 [7]). However, it is difficult for a bank to obtain detailed information from small firms, since the financial reports of small firms are prepared mainly for tax purposes (Bhattacharya & Thakor, 1993 [8]; Diamond, 1984 [9]). In addition, small firms are more exposed to economic cycles, and they might be seriously affected by an economic downturn. The credit risk of a small business changes as the business environment changes. Hence, the important tasks in small business lending are to reduce information asymmetry and to avoid the credit risk arising from lending to small firms (Tsaih et al., 2004 [7]). [...] [10]). In recent years, credit scoring has become one of the primary ways for financial institutions to assess credit risk, improve cash flow, reduce possible risks and make managerial decisions. The accuracy of credit scoring is critical to financial institutions' profitability. Credit scoring was based on the "Five Cs of Credit", a method used by lenders to determine the creditworthiness of potential borrowers. The system weighs five characteristics of the borrower in an attempt to gauge the chance of default: the character of the consumer, the capital, the collateral, the capacity and the economic conditions.
The goal of credit scoring models is to classify loan clients as either good credit or bad credit (Lee et al., 2002 [11]), thereby predicting the bad payers (Lim & Sohn, 2007 [12]).
Many credit scoring techniques have been used to build credit scorecards [...] Gordy, 2000 [23]), Multivariate Adaptive Regression Splines (Friedman, 1991 [24]). Among them, the logistic regression model is the most commonly used in the banking industry due to its desirable features (e.g. robustness and transparency) [...] [26]). The weaknesses of linear discriminant analysis are its assumption of a linear relationship between variables, a relationship which is usually non-linear, and its sensitivity to deviations from the multivariate normality assumption.
Logistic regression analysis predicts dichotomous outcomes and assumes a linear relationship between the variables in the exponent of the logistic function, but it does not require the multivariate normality assumption (Sustersic et al., 2007 [27]).
Because they assume a linear relationship between variables, both linear discriminant analysis and logistic regression analysis are reported to lack accuracy (Thomas, 2000 [18]; West, 2000 [22]). On the other hand, there are also studies showing that most consumer credit scoring data sets are only weakly non-linear, and that linear discriminant analysis and logistic regression analysis therefore perform well (Baesens et al., 2003 [19]).
In recent years many studies have demonstrated that Artificial Intelligence techniques, such as Artificial Neural Networks (West, 2000 [22]; Dimitras et al., 2009 [30]), Case-Based Reasoning (Shin & Han, 2001 [31]) and Support Vector Machines (Schebesch & Stecking, 2005 [32]), can be used as alternative methods for credit scoring and show superior prediction accuracy. Among these, neural networks are very promising (Goonatilake & Treleaven, 1995 [33]) as an alternative to linear discriminant analysis and logistic regression analysis, because of the possibly complex non-linear relationship between variables. [...] [35]; West, 2000 [22]). Although the use of these techniques has grown in recent years, their results are hard to interpret. Their weaknesses are the long training process and the fact that, once the optimal network architecture has been obtained, the model acts as a "black box" in which it is not easy to identify the relative importance of potential input variables (Sustersic et al., 2007 [27]). The predictive quality of a credit scoring model can be evaluated with measures such as sensitivity, specificity and correlation coefficients, and with information measures such as relative entropy and mutual information (Baldi et al., 2000 [36]). In fact, no single statistical technique can be called the best overall for building credit scoring models; the choice of a particular technique depends on the details of the problem, such as the data structure, the features used, the extent to which the classes can be separated using those features, and the purpose of the classification (Hand & Henley, 1997 [3]).
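The evaluation measures named above can be illustrated with a minimal sketch. It assumes that 1 denotes a bad (non-performing) loan and is treated as the positive class; the function names and the toy labels are hypothetical, and the Matthews coefficient is used here only as one example of a correlation coefficient.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), treating 1 (bad credit) as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:  # actual == 1 and predicted == 0
            fn += 1
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Share of truly bad loans that the model flags as bad."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of truly good loans that the model flags as good."""
    return tn / (tn + fp)


def matthews_corrcoef(tp, tn, fp, fn):
    """Correlation between actual and predicted labels (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels, invented for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(tp, tn, fp, fn, round(sens, 3), round(spec, 3), round(mcc, 3))
```

Reporting sensitivity and specificity separately matters in credit scoring because the two kinds of error (accepting a bad borrower, rejecting a good one) usually carry different costs.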

The Data Set and the Employed Variables
The data set was collected manually from the internal Management Information System (MIS) of the bank under study and covers a very wide loan portfolio consisting of micro businesses and small enterprises as defined by the EU. The data set contains 2591 loan applications of micro and small enterprises spread across Greece, granted in the late expansion period (2005). Specifically, we study loans granted to cover the working capital needs of the enterprises. The present study is based on a joint project between academic researchers with previous professional banking experience and the top-level lending management of the bank under investigation. It was carried out because of the need to identify important drivers of credit risk related to borrowers' characteristics and to re-evaluate the existing internal credit scoring model of the bank under study during the recession. In order to preserve the anonymity of the bank, we cannot give more details about the data sources. In our analysis, we set as the dependent variable the "performance of the loan" during the studied period. To define a loan as non-performing, we use the basic rules of Basel I & II, under which NPLs are those loans that are more than ninety days past due. As a time frame for identifying the behavior of an NPL, empirical studies (Makri et al., 2014 [37]; Louzis et al., 2012 [38]) specify either the performance of loans in a specific month or the performance of loans during a specific period, usually 12 months. In our analysis we check the performance of each loan in December 2010 and December 2011. As independent variables we use quantitative and qualitative loan characteristics derived from the loan application at the time of evaluation. In particular, qualitative information (such as the age of the borrower, the type of the loan, etc.) is significant in explaining a firm's credit risk (Artavanis et al., 2016 [39]; Gupta et al., 2015 [40]), as justified by the "Five Cs of Credit" used by lenders to evaluate the creditworthiness of potential borrowers. In our analysis, we use the nine main characteristics of the credit scoring model used by the bank under study as independent variables (loan characteristics). Table 4 summarizes the definitions of these independent variables.
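The construction of the dependent variable at the two snapshot dates can be sketched as follows. This is only an illustration under stated assumptions: a loan is flagged as non-performing once it is more than ninety days past due, the snapshots are taken on the last day of each December (the paper gives only the month), and the field and function names are hypothetical.

```python
from datetime import date

# Assumption: "non-performing" means more than 90 days past due.
NPL_THRESHOLD_DAYS = 90

# Assumption: the paper names only the months; month-end dates are used here.
SNAPSHOTS = [date(2010, 12, 31), date(2011, 12, 31)]


def days_past_due(oldest_unpaid_due_date, snapshot):
    """Days between the oldest unpaid installment and the snapshot (0 if none)."""
    if oldest_unpaid_due_date is None:
        return 0
    return max(0, (snapshot - oldest_unpaid_due_date).days)


def is_npl(oldest_unpaid_due_date, snapshot, threshold=NPL_THRESHOLD_DAYS):
    """Return 1 for a non-performing loan and 0 for a performing loan."""
    return int(days_past_due(oldest_unpaid_due_date, snapshot) > threshold)


# Hypothetical loan whose oldest unpaid installment fell due on 15 Nov 2010
# and is never repaid afterwards.
oldest_unpaid = date(2010, 11, 15)
flags = [is_npl(oldest_unpaid, snapshot) for snapshot in SNAPSHOTS]
print(flags)
```

The same loan can therefore be performing at the first snapshot and non-performing at the second, which is why the dependent variable is evaluated separately for each date.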

Methodology
Logistic regression analysis was introduced by Joseph Berkson in 1944 [41], who coined the term. The term was borrowed by analogy from the very similar probit model developed by Chester Ittner Bliss in 1934 [42]. G. A. Barnard in [...] the logit of the probability of the event (Cramer, 2003 [44]). Logistic regression analysis is a widely used statistical modeling technique in which the response variable, the outcome (non-performing loans), is binary (0, 1); it can thus be used to describe the relationship between the occurrence of an event of interest and a set of potential predictor variables. In the context of credit scoring, the outcome corresponds to the credit performance of a client during a given period of time, in our case 24 months. A set of individual characteristics, such as age, years of experience and residence status, as well as information about the client's credit behavior, such as the relationship with the bank and the purpose of the loan, is observed at the time the client applies for credit.

Table 4 (fragment). Capacity: 9, Years, the years of vocational experience. Economic conditions.
A logistic regression model with random coefficients is applied, where the coefficients follow a multivariate normal distribution. Here, the event $Y_i = 1$ represents a bad credit, while the complement $Y_i = 0$ represents a good credit. In the model, the probability of an individual being "bad" is expressed as follows (Hosmer & Lemeshow, 1989 [45]):

$$P_i = P(Y_i = 1 \mid x_i) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{k} \beta_k x_{ik}\right)}}$$

where
$P_i$ = the probability of the ith loan being non-performing,
$\beta_0$ = the constant term,
$\beta_k$ = the coefficient of the kth independent variable,
$x_{ik}$ = the kth independent variable of the ith loan.

The objective of the logistic regression model in credit scoring is to determine the conditional probability of a specific client belonging to a class, given the values of the independent variables of that credit applicant (Lee & Chen, 2005 [20]).
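As a worked step, the standard logistic form can be rearranged to show why the sign of a coefficient gives the direction of its effect on the probability of a loan being non-performing. The intercept $\beta_0$ and the shorthand $z_i$ are assumptions introduced here for the derivation; they do not appear in the list of symbols above.

```latex
\begin{align}
P_i &= \frac{1}{1 + e^{-z_i}}, \qquad z_i = \beta_0 + \sum_{k} \beta_k x_{ik} \\
1 - P_i &= \frac{e^{-z_i}}{1 + e^{-z_i}} \\
\frac{P_i}{1 - P_i} &= e^{z_i} \\
\ln\frac{P_i}{1 - P_i} &= \beta_0 + \sum_{k} \beta_k x_{ik}
\end{align}
```

Raising $x_{ik}$ by one unit while holding the other variables fixed multiplies the odds $P_i / (1 - P_i)$ by $e^{\beta_k}$. A positive $\beta_k$ therefore raises the probability of a non-performing loan, and a negative $\beta_k$ lowers it.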
Our binomial logistic regression is as follows:

$$\ln\frac{P_i}{1 - P_i} = \beta_0 + \beta_1 \mathrm{ADVERSE}_i + \beta_2 \mathrm{AGE}_i + \beta_3 \mathrm{BANKREL}_i + \beta_4 \mathrm{COLLAT}_i + \beta_5 \mathrm{LTT}_i + \beta_6 \mathrm{OWFAC}_i + \beta_7 \mathrm{PROPERTY}_i + \beta_8 \mathrm{RESIDENCE}_i + \beta_9 \mathrm{YEARS}_i$$

We use as the dependent variable the performance of loans at two successive points in time: December 2010 and December 2011. Specifically, the dependent variable is a binary variable that takes the value 0 for performing loans and the value 1 for non-performing loans. Regarding the independent variables, "ADVERSE" is a dummy variable taking the value 1 for borrowers who had already experienced adverse elements at the time the application was assessed, and the value 0 otherwise. The variable "AGE" shows the age of the borrower or of the business's owner. The variable "BANKREL" is a categorical variable that shows the relationship between the borrower and the bank. We set four indicators for the type of banking relationship: 1) borrowers who had no collaboration with the bank before the loan was granted (no customer), 2) borrowers who had already been granted a loan by the bank and apply for a new loan, 3) borrowers who have more than €5,000 in deposit accounts, and 4) borrowers who already have both a deposit and a loan collaboration. The "LTT" variable shows the Loan to Turnover ratio, which measures the size of the loan as a percentage of the turnover of the business. The variable "OWFAC" is a dummy variable that takes the value 1 if the borrower operates in privately owned facilities and the value 0 otherwise. The variable "PROPERTY" is a dummy variable that takes the value 1 if the borrower owns property free of liabilities and the value 0 otherwise. The variable "RESIDENCE" is a categorical variable that shows the residence status of the borrower. We set three indicators for the type of residence: 1) borrowers who live in a rented house, 2) borrowers who live with their parents, and 3) borrowers who have a private residence. Finally, the variable "YEARS" shows the years of operation of the business.
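The variable coding described above can be sketched in a few lines. This is a hedged illustration, not the bank's scoring model: the category labels, the choice of the first level as reference category, the intercept and every coefficient value are hypothetical, and the collateral variable is left out because its levels are not defined in this paragraph. Only the signs of the illustrative coefficients follow the direction of the effects reported in the paper.

```python
import math

# Hypothetical labels for the categorical levels; the first is the reference.
BANKREL_LEVELS = ["no_customer", "loan_only", "deposit_only", "deposit_and_loan"]
RESIDENCE_LEVELS = ["rented", "with_parents", "private"]


def dummy_code(value, levels):
    """One 0/1 indicator per non-reference level (levels[0] is the reference)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1.0 if value == level else 0.0 for level in levels[1:]]


def build_features(loan):
    """Turn one loan application (a dict) into a numeric feature vector."""
    x = [float(loan["adverse"]), float(loan["age"])]
    x += dummy_code(loan["bankrel"], BANKREL_LEVELS)
    x += [float(loan["ltt"]), float(loan["owfac"]), float(loan["property"])]
    x += dummy_code(loan["residence"], RESIDENCE_LEVELS)
    x.append(float(loan["years"]))
    return x


def npl_probability(x, intercept, coefs):
    """Logistic probability that the loan is non-performing (Y = 1)."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative coefficients only; they are NOT estimates from the paper.
INTERCEPT = -2.0
COEFS = [0.9, 0.01,          # adverse, age
         0.2, -0.3, -0.4,    # bankrel dummies (vs. no_customer)
         0.5,                # ltt
         -0.4, -0.5,         # owfac, property
         -0.1, -0.3,         # residence dummies (vs. rented)
         -0.03]              # years

loan = {"adverse": 0, "age": 45, "bankrel": "deposit_only", "ltt": 0.3,
        "owfac": 1, "property": 1, "residence": "private", "years": 12}
p_clean = npl_probability(build_features(loan), INTERCEPT, COEFS)
p_adverse = npl_probability(build_features({**loan, "adverse": 1}), INTERCEPT, COEFS)
print(round(p_clean, 4), round(p_adverse, 4))
```

Coding each categorical variable as indicators against a reference level is what allows the results to be read as comparisons, for example private residence versus a rented house.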

Empirical Study
In our paper, using the performance of loans as the dependent variable, as defined above, we try to identify the relative importance of each independent variable, as well as the relative influence of each factor on the creation of non-performing loans during the early recession of the Greek economy (December 2010, December 2011). Below, using binomial logistic regression, we try to identify the effects of the independent variables on the determination of non-performing loans and to track changes in those effects as the recession of the Greek economy deepens.
Table 5 shows the effect of the independent variables in December 2010. We note that the variables associated with the welfare of borrowers (existence of own facilities, existence of sufficient property free of liabilities, existence of home ownership), as well as the years of operation of the borrower's business, have a negative effect on the formation of non-performing loans. More specifically, the more years a company has operated, the lower the likelihood that the customer fails to pay the loan installments properly. The factors positively associated with the financial standing of customers are the existence of sufficient property, the existence of privately owned facilities and the existence of home ownership. The analysis confirmed that customers who have sufficient property free of liabilities are less likely to develop non-performing loans. Additionally, customers who own their facilities are less likely to service their debt irregularly than those who rent facilities. Finally, customers living in a private residence are less likely to fail to service their loans than those living in a rented house.
On the other hand, the borrower's age is positively correlated with the probability of classifying a loan as non-performing: the older the borrower, the higher the probability that the loan is classified as bad. Similarly, borrowers who had experienced adverse elements before the loan was granted are more likely not to meet their loan obligations. The factor "loan amount to company's turnover ratio" is also positively correlated with the probability of the loan being classified as non-performing. Regarding previous collaboration between the client and the bank, we observe that customers with a pre-existing deposit collaboration with the bank behave better than those who had only a loan collaboration or no collaboration with the bank. Regarding the collateral of the loan, we observe that, compared with loans covered only by personal guarantees, loans secured by cash collateral or by a mortgage on property have a negative effect on the formation of non-performing loans. On the other hand, loans secured by pledged customer checks show a positive effect on the determination of non-performing loans.
Table 6 presents the results of the binomial logistic regression in December 2011. In most cases, the effects of the independent variables remain stable. Specifically, the variables associated with the welfare of borrowers (existence of own facilities, existence of sufficient property free of liabilities, existence of home ownership) have a negative effect on the formation of non-performing loans. Moreover, regarding the years of operation of the borrower's business, the more years a company has operated, the lower the probability of default. On the other hand, the borrower's age is positively correlated with the probability of classifying a loan as non-performing. Similarly, businesses that had experienced adverse elements before the loan was granted are more likely to be characterized as bad borrowers. The factor "loan amount to company's turnover ratio" is positively correlated with the probability of the loan being classified as non-performing. Regarding previous collaboration between the client and the bank, we observe that customers with a pre-existing deposit collaboration with the bank behave better than those who have no collaboration with the bank. Customers who had only a loan collaboration have the worst performance. Concerning the collateral of the loan, we find that, with loans without collateral as the base category, all the other categories performed better. Loans covered by cash collateral have the strongest negative effect on the formation of non-performing loans, followed by loans covered by clients' checks.
Table 7 shows the average accuracy and the estimated misclassification cost of the binomial logistic regression in both December 2010 and December 2011.
We find that the predictive performance of the binomial logistic regression declined as the recession deepened. Specifically, the average accuracy fell from 90.8% (December 2010) to 81.5% (December 2011), while the estimated misclassification cost increased from 0.0452 (December 2010) to 0.157 (December 2011).
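The two summary measures can be illustrated with a short sketch. The paper does not state how its misclassification cost is computed, so the cost function below (a weighted count of errors divided by the number of loans, with a missed bad loan weighted five times a rejected good loan) is an assumption, as are the function names and the toy confusion matrix; it does not reproduce the figures in Table 7.

```python
def accuracy(tp, tn, fp, fn):
    """Share of loans classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def misclassification_cost(tp, tn, fp, fn, cost_fn=5.0, cost_fp=1.0):
    """Average cost per loan under ASSUMED unit costs.

    cost_fn: cost of predicting "good" for a loan that turns out bad.
    cost_fp: cost of predicting "bad" for a loan that turns out good.
    """
    n = tp + tn + fp + fn
    return (cost_fn * fn + cost_fp * fp) / n


# Toy confusion matrix, invented for illustration (1 = non-performing).
tp, fn, fp, tn = 30, 20, 10, 940

acc = accuracy(tp, tn, fp, fn)
cost = misclassification_cost(tp, tn, fp, fn)
print(round(acc, 3), round(cost, 3))
```

Because missed bad loans are weighted more heavily than rejected good ones in this sketch, the cost can rise faster than accuracy falls when the share of non-performing loans grows.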

Conclusions
In this paper we studied the effect of the factors that characterize a loan as non-performing, as well as the effectiveness of binomial logistic regression during the early recession period of the Greek economy. The main studies on credit scoring models aim at building the most comprehensive and effective model in terms of the predictive ability to identify non-performing loans. Regarding the factors that affect the classification of a loan as non-performing, the conclusions are broadly as expected. In particular, binomial logistic regression shows a positive correlation between non-performing loans and the factors "Adverse", "Age" and "LTT". Regarding the variable "Adverse", the positive correlation between the probability of a loan not being serviced and the borrower's prior misbehavior is demonstrated. An enterprise that has shown bad trading behavior towards the public sector, other banks, suppliers or customers is much more likely not to service the loan. Regarding the variable "Age", it is demonstrated that the older the borrower, the greater the probability that a loan is designated as non-performing. Finally, the variable "LTT" also shows a positive correlation with non-performing loans, and the correlation strengthens as the crisis deepens. This is to be expected: the higher the borrowing relative to the dynamics of the business, as reflected in the company's turnover, the harder it is for the company to service its obligations.
In contrast, we find a negative correlation between the probability of classifying a loan as non-performing and the independent variables "Collateral", "Own Facilities", "Property", "Residence" and "Years of operation". Specifically, the formation of non-performing loans is negatively affected by the collateral provided, with an intensity that increases with the strength of the collateral (cash collateral, mortgage, client checks), compared with loans secured only by personal guarantees. More specifically, the collateral of the loan is an important element in deciding whether or not to grant a loan. The posted collateral can be seen as the borrower's contribution in exchange for the credit risk assumed by the bank. The stronger the cover, i.e. the stronger the exchange offered by the client, the lower the probability of default of the loan. In addition, the financial soundness of those involved in a loan is a guarantee of its smooth repayment. Moreover, the existence of private facilities and owner-occupied dwellings reduces the probability that a loan is classified as non-performing, since the fixed obligations of such businesses are smaller, giving them the opportunity to be more competitive. The existence of property is also negatively associated with non-performing loans, precisely because of the financial standing of the borrowers. Finally, years of operation are also negatively correlated with the appearance of non-performing loans. The reason is that companies with a long presence in a sector have, on the one hand, secured a certain market share and, on the other hand, built trust with both suppliers and customers, which makes it easier to cope with the liquidity problems caused by the recession.
Regarding the evolution of the effect of the independent variables in identifying additional non-performing loans, we observe that, in general, the intensity of the effect of the independent variables decreases as the recession deepens. This observation indicates that, when the whole economy is in recession, the characteristics of each business continue to influence its trading behavior, but at a decreasing rate.
Additionally, since the percentage of non-performing loans increased dramatically for all categories of borrowers, it turns out that the importance of exogenous factors increases drastically. In other words, in times of recession, the likelihood that a borrower defaults is largely influenced by the unfavorable economic conditions, which reduces the effect of the independent variables.
This study highlighted the evolution of the importance of the factors that determine the likelihood of a loan not being repaid during the current recession of the Greek economy, using data from one of the four systemic banks. It would be of particular interest to study whether there are common characteristics with other countries in southern Europe (Italy, Spain, Portugal, France), so that the ECB can define a representative joint surveillance platform based on the specific and the common characteristics of the member states of the European Union. In addition, it would be interesting to study the behavior of loans granted during the deep recession of the Greek economy (2013-2014), in order to check how the crisis affects the evaluation process of loan applications.
period (from January 2005 to December 2005) by one of the four Greek systemic banks (i.e. the National Bank of Greece, Piraeus Bank, Alpha Bank and Eurobank), which as oligopolistic players exhibit similar strategic behavior and offer very comparable products. During the span of this study, the Greek banking sector has undergone several development phases (Aggelopoulos and Georgopoulos, 2017): initially, an expansion period (2002-2007) with very high GDP growth levels; then an unstable period caused by the global financial crisis in September 2008; afterwards the Greek sovereign crisis in April 2010, which led to an accumulation of NPLs in the subsequent years (2010-2012); and finally the deep recession

Table 1. The public and private debt in Europe in 2009.

Table 3 presents the main descriptive statistics of the dataset. Almost 50% of the loans are unsecured. Moreover, 41.3% of the loans were granted to businesses that had been operating for 5-9 years. Regarding the bank relationship, 703 loans (27.1%) were granted to new customers. Regarding the performance of the loans, in December 2010 90.12% were performing loans and 9.88% were NPLs. The percentage of NPLs rose to 19.45% in December 2011.
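The reported shares can be checked with simple arithmetic against the sample size of 2591 loans stated in the data section. The NPL counts below are inferred by rounding and are an assumption; the paper reports only the percentages.

```python
N_LOANS = 2591          # sample size stated in the data section
NEW_CUSTOMERS = 703     # loans granted to new customers

# Share of loans granted to new customers, in percent.
new_customer_share = 100 * NEW_CUSTOMERS / N_LOANS

# Implied NPL counts (inferred here, not reported in the paper).
npl_2010 = round(0.0988 * N_LOANS)
npl_2011 = round(0.1945 * N_LOANS)

# Convert the inferred counts back to percentages as a consistency check.
npl_share_2010 = 100 * npl_2010 / N_LOANS
npl_share_2011 = 100 * npl_2011 / N_LOANS

print(round(new_customer_share, 1), npl_2010, npl_2011)
print(round(npl_share_2010, 2), round(npl_share_2011, 2))
```

Under this reading, the number of non-performing loans roughly doubles between the two snapshot dates.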

Table 4. The association of micro and small firms' idiosyncratic features with the formation of NPLs.

Table 5. Binomial logistic regression, December 2010.
a. Variable(s) entered on step 1: Years, Owfac, Bankrel, Residence, Age_A, Adverse, Property, LTT, Collat.
b. In order to preserve the anonymity of the bank we could not give more details about the data sources.

a. Variable(s) entered on step 1: Years, Owfac, Bankrel, Residence, Age_A, Adverse, Collat, Property, LTT.
b. In order to preserve the anonymity of the bank we could not give more details about the data sources.