Does Psychometric Testing in Microfinance Actually Work?—The Case of Sogesol

Psychometric testing is claimed to be a powerful innovation in credit scoring. Pioneered by the Entrepreneurial Financial Lab (EFL), this technique would enhance credit decisions by screening out high-risk applicants. This paper aims to evaluate the predictive power of the EFL’s psychometric credit scoring model in microfinance through evidence from Sogesol, a Haitian microfinance institution. This evaluation has been conducted at two different levels: 1) A sample of clients has been selected from Sogesol’s database to carry out a back test of the EFL tool, using performance metrics such as the Kolmogorov-Smirnov (K-S) statistic and the area under the ROC curve (AUC), in comparison with the existing socio-demographic model in use at Sogesol; 2) We conduct an analysis of causality between the quality of the portfolio and the credit decisions made based on the EFL tool and/or the traditional credit scoring model through the estimation of a linear regression model. The results show that the psychometric credit scoring model would present low predictive power in terms of K-S and AUC. However, the EFL tool would outperform the socio-demographic credit scoring model in use at Sogesol. The study further indicates that there would not be any statistically significant relationship between the risk level and the decision of granting a loan or not. The paper concludes that psychometric testing in its original format would not be efficient in the context of Sogesol’s microcredit operations. Thus, the paper develops a new credit scoring model along traditional socio-economic and behavioral lines, using logistic regression. This new model presents a better discriminatory power than the EFL tool, regarding K-S and AUC. In addition, it is well-calibrated, considering the results of the Hosmer-Lemeshow (HL) test and the Brier score. If properly maintained and integrated into the client selection process, this new model could significantly improve credit risk management practices at Sogesol.


Introduction
Risk management constitutes one of the core functions of banks and other types of financial institutions, because risk is inherent in all of their activities. Among the different types of risk they face, credit risk may be considered the most important source and the biggest exposure. That is why credit risk management plays an increasingly critical role, in that financial institutions have to constantly calibrate the tradeoff between risk and return. After the global financial crisis of 2008-2009, which started in the United States with subprime housing loans, a particular focus has been put on credit risk management. It is commonly accepted that one of the causes of the financial crisis was a lack of rigorous credit risk assessment. To address this issue, the Basel Committee and local regulatory authorities made it mandatory for banks and other financial institutions to be equipped with tools that provide better visibility of credit risk. Credit scoring is considered to be one of those tools. Statistical credit scoring models based on socio-demographic variables are developed to estimate the probability of default of borrowers. This traditional scoring model is widely used by microfinance institutions, since their clients are considered very vulnerable given their lack of collateral, which prevents them from accessing conventional bank credit.
The big challenge for those lenders is to find a tool that can really help assess the risk while increasing financial inclusion, which is one of the Sustainable Development Goals (United Nations, 2015). Hence the importance of integrating alternative data into credit risk assessment. This can be understood as the motivation for developing models that incorporate psychometric variables. Psychometric testing seeks ways to assess a borrower's willingness to repay when she/he has no credit history with the lender and no credit history in a bureau, which is common in microfinance.
Psychometric scoring has been implemented by financial institutions in Africa, Asia, Latin America and the Caribbean. In psychometric scoring, applicants for loans have to answer a list of questions measuring their intelligence, business skills, personality, ethics, and character as a way to evaluate their willingness to repay their loans. The Inter-American Development Bank (IDB) argued in one of its articles that the implementation of psychometric testing by banks and other financial institutions can reduce defaults by 20 to 45 percent and increase profits by 15 to 30 percent, with operational costs of the lending process at less than 40 percent of the cost of traditional evaluation and due diligence (Inter-American Development Bank, 2013).
This argument of the IDB constitutes one of the main motivations behind this paper, in that we are interested in knowing whether that assumption is confirmed in practice. Hence the title: "Does psychometric testing in microfinance actually work?-The case of Sogesol". The paper evaluates the psychometric scoring model implemented in 2012 at Sogesol (Société Générale de Solidarité), the largest microfinance institution in Haiti in terms of outstanding portfolio (more than US$ 40 million for almost 35,000 loans, as of September 2018). As of April 2016, Sogesol had tested 5517 applicants. The psychometric tool was developed and implemented in a microfinance institution where a socio-demographic scoring model had been in place since 2006. That is why the intention of Sogesol was to develop a hybrid model combining the psychometric factors with the socio-demographic variables in order to enhance credit decisions. The paper determines the effectiveness of the psychometric tool by analyzing statistical metrics as well as loan repayment performance.
In order to complete the validation process, the research also reviews the calibration of the psychometric model. A model may be good in terms of discriminatory power, but not sufficiently calibrated. Assessing the calibration requires grouping clients by class of scores. The rule is that a correctly calibrated model would indicate similar default rates for clients belonging to the same class of scores.
Finally, based on the results of the psychometric model review, our purpose is also to look for opportunities to re-estimate/recalibrate the socio-demographic model, so as to enhance financial inclusion at Sogesol with improved visibility of credit risk in the micro-entrepreneurial market.

Problem Statement
In developing countries where the socio-economic conditions are difficult, many low-income individuals seek to create a livelihood by running their own entrepreneurial activity. These activities are of different sizes: micro, small and medium. Hence the concept of Micro, Small and Medium Enterprise (MSME) emerged. They constitute real sources of revenue for their owners. In general, micro-enterprises are not formally structured. They do not have financial statements or accounting systems that provide information on their financial performance. The lack of documented financial performance creates a problem of information asymmetry for lenders. Besides, these enterprises are very vulnerable to social or economic shocks. The owners of micro-activities typically also lack collateral to offer if they want to obtain loans for their business. Consequently, traditional financial institutions are not interested in financing these businesses, which they consider too risky. To meet their funding requirements, entrepreneurs are obliged to borrow from informal moneylenders at exorbitant interest rates that may deplete their working capital and drive them deeper into debt.
This gap in the credit markets has created an opportunity for microfinance as an alternative financing technology. Microfinance is specifically designed to meet the needs of MSMEs by providing tailored working capital funding solutions. One should point out that microfinance is also considered an important economic development tool, created to help eradicate poverty worldwide. Specialized microfinance organizations then emerged that focused on serving un-

In order to verify these hypotheses, the paper uses the data of Sogesol on the psychometric scoring and the borrowers' repayment performance. A sample of clients is selected, using different definitions of good and bad clients, including the one used in the development of the customized psychometric model. The evaluation of the psychometric tool is carried out using statistical methods. Furthermore, since the psychometric tool was deployed in a socio-demographic scoring environment, a comparative analysis of the two scoring models is carried out in order to obtain relevant insights. In the end, the paper proposes the re-estimation and the recalibration of the socio-demographic model, using a logistic regression model. R software and Excel are used to conduct the analysis.
More details are provided in the data and research methodology section.

Definition of Credit Risk
When offering credit, there is a common element to take into consideration: the need to study the creditworthiness of the borrower or counterparty, that is, the need for credit risk analysis. It means checking whether the prospective borrower is worthy to receive credit (Joseph, 2013). A borrower who is not creditworthy has a high propensity to default on credit.
Credit risk is defined as the potential that a contractual party will fail to meet its obligations in accordance with the agreed terms. Credit risk is also called default risk, performance risk or counterparty risk (Brown and Moles, 2014).

Definition of Credit Scoring
Credit scoring refers to the use of statistical models to determine the likelihood that a prospective borrower will default on a loan. Credit scoring models are widely used to assess business loans, consumer loans, and so on (Abdou and Pointon, 2011). In addition, credit scoring is defined as the set of decision models and their underlying techniques that help lenders grant loans. These techniques decide who will get a loan, how much prospective borrowers should obtain, and what operational strategies will enhance the profitability of the borrowers to lenders (Abdou and Pointon, 2011).
To develop a credit scoring model, many variables are used. These variables are socio-demographic and financial. Credit bureau data are added to support the decision to extend credit, mainly for applicants who fall in the grey area of the internal score formula, taking into account the cut-off score defined. These data are called traditional data. With the evolution of technologies and statistics, other types of data are used to predict borrowers' repayment behavior. These types of data are known as alternative data. Psychometric attributes are one type of alternative data.

Psychometric Testing in Credit Risk Assessment
Psychometric assessment is widely used to measure personality traits, knowledge, skills and attitudes. The possibility of screening many people at low cost is seen as one of its benefits. That is why employers use it to select the most talented employees for their business. It has also been shown that personality dimensions are correlated with entrepreneurship status.
The success of psychometric assessment in predicting job performance has encouraged the transfer of that method to other areas, such as small and medium enterprise credit and microfinance, where screening applicants is very costly and time consuming. Not only can the psychometric tool help reduce costs, it also offers a solution to the information asymmetry encountered when assessing the credit risk of applicants. The psychometrician assumes that there is a personality trait or a set of traits characterizing low-risk versus high-risk loan applicants. The purpose is then to identify those traits and build a measure that has suitable psychometric properties and predictive value. The questions identified by the psychometrician must systematically be tested on real-world loan applicants, and their predictive validity confirmed by best prac- (Arráiz et al., 2015). (Arráiz et al., 2018). EFL originally worked with a personality assessment based on the five personality dimensions, also known as the "Big Five" model (Costa Jr. and McCrae, 1992), an intelligence assessment based on digit span recall tests (a component of the Wechsler Adult Intelligence Scale) and the Raven's Progressive Matrices tests (Spearman, 1946), and an integrity assessment adapted from Bernardin and Cooke (1993).

EFL and Psychometric Testing
The assumption formulated by the EFL researchers was that these assessments would enable them to identify the two core determinants of an entrepreneur's intrinsic risk: the ability to repay a loan and the willingness to do so. Entrepreneurial traits, measured through personality and intelligence tests, define the entrepreneur's ability to generate future cash flows that can, in turn, serve to repay any loan contracted. Honesty and integrity traits, measured through the integrity test, explain the entrepreneur's willingness to pay, independently of the ability to do so. EFL identified questions that could potentially predict credit risk and tested a first prototype of their psychometric tool. Afterward, EFL developed a commercial application based on the responses to their tool and subsequent default behavior. The commercial application is based on the same quantitative methods applied to generate traditional credit scores. It is comprised of questions developed internally and licensed by third parties relating to individual attitudes, beliefs, integrity, and performance, in addition to traditional questions and the collection of metadata (indicating the interaction of the applicant with the tool).

R. Sifrain Journal of Financial Risk Management
The EFL application produces a 3-digit score that classifies the relative credit risk of the person who took the test. Financial institutions can apply this score in different ways: for approvals, or for modifying the price, size or other margins of the loan.

[...] allowing them to create and manage their microenterprises. Its goal is to expand financial access to low-income people or to those previously excluded from the formal financial system. It allows them to permanently access quality and affordable financial services in order to finance income-generating activities, to save, to accumulate assets, to stabilize their consumer spending and to protect themselves against risks (BRH, 2018).

Microfinance, Sogesol and Psychometric
According to the Central Bank, there are three groups of microfinance institutions (MFI) in Haiti:
• Mutualist microfinance institutions or cooperatives: They constitute a group of people who are part of a non-profit organization founded upon the principles of cooperation, solidarity and mutual support, primarily with the objective of collecting the savings of its members and/or granting credit. They extend loans to members and to individuals as well. They are regulated by the 2002 law on savings and credit cooperatives.
• Solidarity credit unions: A solidarity credit union is defined as a group of people with strong ties to each other (socio-professional origin, place of residence, facility, friendship, etc.) who decide to create a fund fed by their contributions, in order to reach a clearly defined goal: the granting of credit to the members of the group on a rotating basis. Unlike the community banks, the solidarity credit unions are independent from the start: operating rules are established by the group itself without the interference of any MFI, even though an MFI may be an alternative source of funds to supplement inadequate internal resources and may provide technical assistance as well.

Sogesol Overview
Sogesol (Société Générale de Solidarité) was created as a service company for Sogebank, one of Haiti's largest commercial banks. Its mission is to promote Haitian entrepreneurship by adapting traditional banking services to the needs of micro and small businesses. Sogesol has undergone some changes in its shareholding.
Since its foundation, Sogesol has grown significantly in spite of the tough economic, social and political context of Haiti. Nowadays, Sogesol offers a full range of credit products: working capital finance, agricultural loans, consumer credit and housing microfinance. The majority of its clients are micro and small business owners and agricultural producers. Sogesol has 17 branches, of which seven are in metropolitan zones and 10 in rural areas. In addition, Sogesol has 5 other points of service to better serve its customers. Sogesol is thus a national network, whose headquarters are located in Port-au-Prince, the Haitian capital. This performance may be explained by Sogesol's strategy of increasing its outstanding portfolio by raising the average amount disbursed. Moreover, it is important to analyze Sogesol's performance with regard to the competition. To do so, we consider the 3 biggest non-cooperative MFIs of the Haitian microfinance sector. The two following graphs present respectively the evolution of the outstanding portfolio and of the number of borrowers of the competition from 2012 to 2018.

Sogesol's Outreach Performance
As observed in Figure [...] of borrowers might be explained by a shift in its commercial strategy, putting more emphasis on the volume of the portfolio than on the number of customers. It is important to mention that MCC (2025) is specialized in Small and Medium Enterprise (SME) loans, which explains its low number of borrowers, since it is easier to find a microenterprise than an SME. It is also worth noting that, except for ACME, which is an association, all of the 3 MFIs are bank affiliates dedicated to providing microfinance services.

Sogesol's Portfolio Quality
At the end of the fiscal year 2018, Sogesol's portfolio quality had deteriorated compared to 2017. The PAR 30 rose from 6.6% to 7.98% in one year. The results of the competition are no different from those of Sogesol. The graph below displays the evolution of the portfolio at risk over 30 days from 2012 to 2018. As shown in Figure 2, all of the 4 MFIs experienced a deterioration in 2018. ACME presented the worst performance (8.84% in 2018 against 6.5% in 2017), followed by Sogesol (7.98% in 2018 against 6.6% in 2017). MCC displayed the

Sogesol and Credit Scoring
The use of credit scoring began at Sogesol in 2006, with the technical support of Accion International. The goal of Sogesol was to implement its first credit scoring model while launching a new type of working capital product dedicated to the most vulnerable borrowers. This product is known as a nano loan, meaning a loan for an even smaller amount than a microloan (≤ US$ 500). Sogesol thus became the first Haitian financial institution to incorporate scoring in its loan process.
With the introduction of the sociodemographic scoring model in its operations, Sogesol mainly aimed to improve customer service by accelerating the loan approval process, maximize efficiency of collection activities, and improve portfolio quality.
Based on the performance of its first sociodemographic credit scoring model, Sogesol decided to extend the use of credit scoring to microenterprise working capital loans. Sogesol has since developed four credit scoring models: two models to assess new borrowers (1 for nano loans and 1 for microenterprise working capital loans) and two models for repeat borrowers (1 for nano loans and 1 for microenterprise working capital loans). In addition, Sogesol has also introduced psychometric testing into its credit process.

Sogesol and Psychometric Testing
In 2012, Sogesol and EFL teamed up to incorporate psychometric credit scoring into the credit process. Sogesol signed an agreement with EFL for the development of the psychometric credit scoring model. Sogesol was interested in testing whether the EFL tool could help it enhance its credit approval process. Since Sogesol had several years of experience in implementing traditional credit scoring models, the objective was to develop a hybrid credit scoring model (socio-demographic/psychometric), in order to increase the predictive power of the existing credit scoring model. MSME owners who applied for a working capital loan of US$ 1000 or more were screened with the EFL tool as part of the application process. On average, the EFL application took 61 minutes to complete.

Implementation Phases
The implementation of the psychometric credit scoring model at Sogesol consists of the following phases:
• Adaptation of the model to the local context;
• Development of a semi-calibrated and a calibrated model;
• Testing the psychometric model alone.
This first level of analysis is followed by the back testing of the EFL calibrated model. This part contains two components: 1) the assessment of the discriminatory power of the EFL model and 2) the assessment of the calibration of the EFL model. Since the psychometric model was deployed in a socio-demographic scoring environment, we produced a comparative analysis of the two scoring models for each definition of good and bad clients adopted by the paper.
2) Assessment of the pilot of the EFL model
In order to test the ability of the psychometric scoring model to mitigate risk, Sogesol conducted a pilot from May 2015 to June 2016, in which the priority in granting new loans was given to the psychometric score. For this purpose, a threshold score was defined. Based on that, commercial strategies were defined.
Since the EFL score had not yet been applied on its own to show its predictive capacity, a loan was rejected if and only if it received the rating F in both the traditional model and the psychometric model. Sogesol wanted to make sure not to lose potential clients by basing its decision-making exclusively on the EFL score. The methodology applied to carry out the analysis is explained in section 4.4.2.
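The rejection rule used during the pilot can be written as a one-line predicate. The sketch below is only an illustration: the function name and any rating letters other than F are hypothetical, since the source states only that a loan was rejected if and only if both models rated it F.

```python
def is_rejected(traditional_rating: str, psychometric_rating: str) -> bool:
    """Pilot rule as described in the text: reject if and only if BOTH
    models assign the rating F. Other rating letters are hypothetical."""
    return traditional_rating == "F" and psychometric_rating == "F"


# A loan rated F by only one of the two models is not rejected by this rule.
examples = [("F", "F"), ("F", "B"), ("A", "F")]
decisions = [is_rejected(t, p) for t, p in examples]
```

Under this rule the psychometric score alone can never cause a rejection, which is why the later comparison focuses on clients accepted by the traditional model but rated adversely by the EFL tool.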
Excel and R Language are the two tools used to conduct the evaluation.

Definition of the Psychometric Model Development
A client is defined as bad if his/her Days Past Due (DPD) exceeded 30 after 6 Months on Book (Bad30MoB6). Otherwise, the client is classified as good.

Alternative Definitions of Good and Bad Clients
In order to reinforce our analysis, besides the main definition indicated above, we used 5 other definitions. They are as follows:
1) A bad client is one who was more than 60 days past due after 6 months on book (DPD60MoB6). Otherwise, s/he is defined as good.
2) A bad client is one who was more than 90 days past due after 6 months on book (DPD90MoB6). Otherwise, s/he is defined as good.
3) A bad client is one who was more than 30 days past due after 9 months on book (DPD30MoB9). Otherwise, s/he is defined as good.
4) A bad client is one who was more than 60 days past due after 9 months on book (DPD60MoB9). Otherwise, s/he is defined as good.
5) A bad client is one who was more than 90 days past due after 9 months on book (DPD90MoB9). Otherwise, s/he is defined as good.
All of these definitions are used to evaluate both the psychometric model and the traditional model.
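The six good/bad definitions differ only in the DPD threshold and the observation point, so they can be expressed as a small lookup table. The following is a minimal sketch; the `LoanSnapshot` record and its field names are hypothetical and do not reflect Sogesol's actual data layout.

```python
from dataclasses import dataclass


@dataclass
class LoanSnapshot:
    """Hypothetical record: days past due observed at two points in time."""
    dpd_mob6: int  # days past due after 6 months on book
    dpd_mob9: int  # days past due after 9 months on book


# Definition label -> (observation field, DPD threshold), as listed in the text.
DEFINITIONS = {
    "Bad30MoB6": ("dpd_mob6", 30),
    "DPD60MoB6": ("dpd_mob6", 60),
    "DPD90MoB6": ("dpd_mob6", 90),
    "DPD30MoB9": ("dpd_mob9", 30),
    "DPD60MoB9": ("dpd_mob9", 60),
    "DPD90MoB9": ("dpd_mob9", 90),
}


def label_bad(loan: LoanSnapshot, definition: str) -> int:
    """Return 1 (bad) if DPD strictly exceeds the threshold, else 0 (good)."""
    field, threshold = DEFINITIONS[definition]
    return 1 if getattr(loan, field) > threshold else 0


loan = LoanSnapshot(dpd_mob6=45, dpd_mob9=95)
labels = {name: label_bad(loan, name) for name in DEFINITIONS}
```

The same client can be bad under one definition and good under another, which is why every metric in the paper is reported per definition.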

Back Testing of the Psychometric Model
Assessing the validity of a predictive model is a very critical task. There are two methods that are generally used to measure the performance of a predictive model: discrimination tests and calibration methods. This section aims to evaluate the psychometric model, using those two methods. Prior to that, the score distribution is analyzed.

Discriminatory Power of the Psychometric Model
Discrimination assesses a model's ability to correctly classify clients. In other words, it measures the capability of the model to separate good clients from bad ones. Several tests are available for this assessment; the paper focuses on the following ones:
1) Kolmogorov-Smirnov (K-S) test
The K-S test is a non-parametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K-S test), or to compare two samples (two-sample K-S test) (https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test). The K-S statistic measures a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distribution of the K-S statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution in the one-sample case, or that the samples are drawn from the same distribution in the two-sample case.
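The empirical distribution function $F_n$ for $n$ i.i.d. (independent and identically distributed) ordered observations $X_i$ is given as follows:

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{[-\infty, x]}(X_i)$$

where $I_{[-\infty, x]}(X_i)$ is the indicator function, equal to 1 if $X_i \le x$ and 0 otherwise. The K-S statistic for a given cumulative distribution function $F(x)$ is:

$$D_n = \sup_x \left| F_n(x) - F(x) \right|$$

where $\sup_x$ is the supremum of the set of distances.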
In the field of credit risk management, the K-S test quantifies the maximum vertical separation (deviation) between two cumulative distributions (those of good and bad clients) in credit scoring modelling. In other words, this statistic measures the degree of discrimination between good and bad clients. Expressed as a percentage, the result of the test lies between 0 and 100, where the higher the K-S, the better the discrimination.
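To make the two-sample use of the statistic concrete, the sketch below computes the maximum gap between the empirical score distributions of good and bad clients. It is an illustration on made-up scores, not the paper's computation (which was carried out in R and Excel); the function name and the toy data are assumptions.

```python
def ks_statistic(scores_good, scores_bad):
    """Two-sample K-S statistic: the largest absolute difference between
    the empirical CDFs of the good and bad score samples, in [0, 1]."""
    n_good, n_bad = len(scores_good), len(scores_bad)
    max_gap = 0.0
    # The empirical CDFs only change at observed scores, so it is enough
    # to evaluate the gap at every distinct observed value.
    for t in sorted(set(scores_good) | set(scores_bad)):
        cdf_good = sum(s <= t for s in scores_good) / n_good
        cdf_bad = sum(s <= t for s in scores_bad) / n_bad
        max_gap = max(max_gap, abs(cdf_bad - cdf_good))
    return max_gap


# Made-up scores for illustration only.
good = [620, 650, 700, 720, 750]
bad = [580, 600, 640, 660]
ks_percent = 100 * ks_statistic(good, bad)
```

Multiplying by 100 gives the percentage form in which the K-S values are reported. The quadratic loop is kept for readability; a sorted merge would be preferable on large samples.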
Using the Bad30MoB6 as definition, the K-S of the psychometric model is 19.67%.
To confirm the performance of both the psychometric and traditional models, we used several alternative definitions to calculate the K-S. The results obtained for each definition are presented in the table below.
As illustrated in Table 1, the psychometric model would present a better predictive capacity than the traditional model. The maximum K-S of 40% is reached by the psychometric model with the Bad90MoB6 definition, while the traditional model displays a K-S of 22%. The lowest K-S is observed for both the psychometric and traditional models with the Bad30MoB9 definition, at 16% and 11% respectively. Whatever the DPD threshold used (30, 60 or 90), the two models perform better at MoB6 than at MoB9.

2) Receiver Operating Characteristic (ROC)
A Receiver Operating Characteristic curve (ROC curve) is a graphical plot that shows the diagnostic ability of a binary classifier system as its discrimination threshold is varied (wikipedia.org). The ROC curve is built by plotting the true positive rate (TPR) or sensitivity against the false positive rate (FPR) or (1-specificity) at various threshold settings.
The ROC is one of the methods used in credit risk management to determine the discriminatory ability of a credit scoring model. Let us consider C as a cut-off providing a simple decision rule to divide clients into potential good and bad. If a bad client has a score above C, or a good client has a score below C, the prediction of the model is correct. Otherwise, the credit scoring model makes a wrong prediction. The proportion of correctly predicted bad clients is named Sensitivity and the proportion of correctly predicted good clients is named Specificity.
It is very important to remember that the false positive prediction (1-Specificity) is known as type I error, defined as the error of rejecting a null hypothesis that should have been accepted. The false negative (1-Sensitivity) is known as type II error, i.e. the error of accepting a null hypothesis that should have been rejected.
From a risk perspective, Sensitivity denotes those cases which are both actually bad clients and predicted to be bad clients as a proportion of total bad cases.
Specificity indicates cases which are both actually good clients and predicted to be good clients as a proportion of total good cases (Abdou et al., 2016).
In the case of Sogesol, with a Bad30MoB6 definition, the ROC curves for the psychometric model and the traditional model respectively are as follows.
As observed in Figure

On the other hand, even though the difference is not significant when comparing the sensitivity and the specificity of the two models, we can observe that the psychometric model is better, with a sensitivity (0.42) and a specificity (0.67) larger than the sensitivity (0.41) and the specificity (0.65) of the traditional model. In fact, for a given level of specificity, the model with the higher sensitivity is preferred. Likewise, for a given level of sensitivity, the model with the higher level of specificity is preferred (Abdou et al., 2016).
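The quantities discussed in this section can be computed directly from scores and observed outcomes. The sketch below is a minimal illustration on made-up data; it assumes, as in the cut-off rule stated earlier, that a score above the cut-off C flags a client as bad. The function names and the data are hypothetical, and the AUC is obtained through its rank interpretation rather than by integrating the plotted curve.

```python
def sensitivity_specificity(scores, is_bad, cutoff):
    """Sensitivity = share of bad clients with score > cutoff.
    Specificity = share of good clients with score <= cutoff."""
    bad = [s for s, b in zip(scores, is_bad) if b == 1]
    good = [s for s, b in zip(scores, is_bad) if b == 0]
    sensitivity = sum(s > cutoff for s in bad) / len(bad)
    specificity = sum(s <= cutoff for s in good) / len(good)
    return sensitivity, specificity


def auc(scores, is_bad):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen bad client scores above a randomly chosen good
    client (ties count one half)."""
    bad = [s for s, b in zip(scores, is_bad) if b == 1]
    good = [s for s, b in zip(scores, is_bad) if b == 0]
    wins = 0.0
    for sb in bad:
        for sg in good:
            if sb > sg:
                wins += 1.0
            elif sb == sg:
                wins += 0.5
    return wins / (len(bad) * len(good))


# Made-up risk scores (higher = riskier) and outcomes (1 = bad client).
scores = [90, 80, 60, 40, 70, 50, 30, 20, 10]
is_bad = [1, 1, 1, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, is_bad, cutoff=50)
area = auc(scores, is_bad)
```

An AUC of 0.5 corresponds to a model that ranks clients no better than chance, and 1.0 to perfect separation; sweeping the cut-off over all observed scores and plotting sensitivity against 1 - specificity traces the ROC curve itself.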

Calibration of the Psychometric Model
As illustrated previously, the discrimination tests inform about the ability of the

Notes: For reasons of confidentiality, the ranges of the EFL scores are not displayed; they are replaced by numbers from 10 to 1, where 10 represents the class with the highest scores and 1 the one with the lowest scores.

2) Brier test
The Brier score is an overall goodness-of-fit check for a model predicting binary or categorical response values (Brier, 1950):

$$BS = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{r} \left( p_{ij} - Y_{ij} \right)^2$$

where $N$ is the number of forecasts (here, clients), $p_{ij}$ denotes the forecast probabilities, $Y_{ij}$ takes the value 0 or 1 according to whether the event occurred in class $j$ or not, and $r$ defines the possible classes ($r = 2$ for default and non-default). So, the Brier score is defined as the mean squared difference between the predicted probabilities $p_{ij}$ and the observed outcomes within each category. Summed over both classes, the score lies between zero and two; the binary form commonly used for default prediction, computed on the default class only, takes on a value between zero and one. The lower the Brier score of a model, the better its predictive performance. The Brier score incorporates elements of both discrimination and calibration, since it compares numerical outcomes (in the case of a binary result, 0 and 1) to predicted probabilities, without the grouping used by the other calibration techniques (Fenlon et al., 2018).
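As an illustration of the calculation, the sketch below computes the Brier score in its single-class binary form, that is, the mean squared difference between the predicted default probability and the 0/1 outcome, which ranges from zero to one (with r = 2 it equals half of the sum over both classes). The function name and the numbers are made up for the example.

```python
def brier_score(predicted_default_prob, defaulted):
    """Binary Brier score: mean of (p_i - y_i)^2, where p_i is the
    predicted probability of default and y_i is 1 for default, else 0."""
    n = len(defaulted)
    return sum((p - y) ** 2 for p, y in zip(predicted_default_prob, defaulted)) / n


# Made-up predictions for three clients; only the second one defaulted.
probs = [0.1, 0.8, 0.3]
outcomes = [0, 1, 0]
score = brier_score(probs, outcomes)
```

A model that always predicts 0.5 obtains 0.25 under this form, which gives a rough reference point when reading reported scores.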
The Brier score was calculated for the sample under study with the different definitions of good and bad clients, in order to measure the accuracy of the psychometric model implemented at Sogesol. The table below presents the results of the test. Surprisingly, as shown in Table 6, the best Brier scores are obtained by the traditional model for every definition of good and bad clients. The maximum Brier score attained is 0.36 (psychometric model) and 0.12 (traditional model), while the minimum reached is 0.33 (psychometric model) and 0.06 (traditional model). As observed, the difference between the Brier scores of the two models is relatively large, and the Brier scores of the traditional model are closer to 0. We should point out that the best Brier scores of the traditional model (0.06 and 0.07) are obtained with the definitions (Bad60MoB6 and Bad90MoB6) for which we also obtained the highest AUC values (0.6 and 0.6).

Psychometric Model and Risk Reduction
The following table displays the DPD rate for clients accepted under both models and for clients accepted by the traditional model and rejected by the EFL model, according to different definitions of DPD after months on book. This allows us to measure the contribution of the psychometric model in helping the traditional model screen out high-risk potential borrowers. Therefore, we expected to obtain a better DPD rate for clients accepted under both models.

The relationship is estimated through a linear regression of the form:

$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad i \in S$$

where $y_i$ is the number of days past due after 6 months on book for each client $i$ (continuous variable) or a variable equal to 1 if client $i$ was more than 90 days past due after 9 months on book, and 0 if client $i$ was less than or equal to 90 days past due after 9 months on book (binary variable); $x_i$ is a variable equal to 1 if client $i$ was rejected by the EFL score and accepted by Sogesol's traditional score, and 0 if client $i$ was accepted by both the psychometric model and the socio-demographic model; $\alpha$ and $\beta$ are the coefficients to be estimated; $\varepsilon_i$ is the regression error term; and $S$ is defined as the sample of clients under study.
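A regression of a DPD outcome on a single binary indicator can be estimated in closed form. The sketch below is a minimal ordinary least squares illustration on made-up data; the function name, the symbols for intercept and slope, and the numbers are assumptions and do not reproduce the paper's R output. With a 0/1 regressor, the slope equals the difference in mean outcome between the two groups of clients.

```python
from math import sqrt


def ols_single_regressor(y, x):
    """OLS fit of y = intercept + slope * x + error, returning the
    coefficients, the slope's standard error and its t statistic."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(sse / (n - 2) / sxx)
    return intercept, slope, se_slope, slope / se_slope


# Made-up days past due; x = 1 marks clients rejected by the psychometric
# score but accepted by the traditional score.
dpd = [0, 10, 20, 5, 15, 40]
rejected_by_efl = [0, 0, 0, 1, 1, 1]
intercept, slope, se_slope, t_stat = ols_single_regressor(dpd, rejected_by_efl)
```

A slope whose t statistic is small in absolute value would indicate no statistically detectable difference in risk between the two groups. When the outcome is the binary DPD indicator, the same fit is a linear probability model.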

The outputs of the linear regression of each DPD definition on the binary variable for the adverse psychometric test are presented in the table below. As shown in Table 9, there would not be any statistically significant relationship between the risk level and the decision of granting a loan or not.
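The regression described above, with a DPD outcome regressed on a single binary indicator, can be sketched with ordinary least squares in a few lines. The names `simple_ols`, `intercept` and `slope` are illustrative (the coefficient symbols of the original equation are not given in the text), and the data are invented for the example. With a binary regressor, the slope equals the difference in mean DPD between the two groups.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return both and the slope's t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_variance = sum(r ** 2 for r in residuals) / (n - 2)
    slope_std_error = math.sqrt(residual_variance / sxx)
    return intercept, slope, slope / slope_std_error


# Hypothetical data: x = 1 if rejected by the psychometric score but accepted by
# the traditional score, 0 if accepted by both; y = days past due.
rejected_by_efl = [0, 0, 0, 0, 1, 1, 1, 1]
days_past_due = [0, 10, 5, 25, 0, 30, 15, 15]

intercept, slope, t_stat = simple_ols(rejected_by_efl, days_past_due)
print(intercept, slope, round(t_stat, 3))
```

In this invented sample the t-statistic is small in absolute value, which is the pattern one would expect when the indicator carries no statistically significant information about DPD.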

Discussion of Results
Based on the different definitions of good and bad clients and the scores, the [...] Nonetheless, the model can give an indication of the future potential value of using psychometrics in microfinance. Also, the production of loans in the category of Sogesol's clients studied here is rather small, with fewer than 40 new loans disbursed monthly on average. We should also note that the pilot was not con-

The Data
The data used to develop the model for new working capital clients were collected from Sogesol's database. The sample contains 5,776 loans disbursed from October 2012 to December 2017, that is, more than five years of historical data on the repayment behavior of new clients, which is sufficient to develop a credit scoring model. The dataset comprises 23 variables (numerical or categorical), described in the following table.
The variables described in Table 10 are part of Sogesol's loan application. However, some of them have not yet been used in the existing credit scoring models of the institution under study. Those variables are: Phone number, Sogesol Awareness, Credit experience, Bank account, Sogebank savings account and Client reference. They are included in the sample because they could be predictive of borrowers' repayment behavior. The target is a binary indicator variable equal to 1 when the client has more than 90 days past due after 9 months on books, and 0 otherwise.
Lastly, we should mention that numerical variables such as age, enterprise since, monthly sales, etc. were discretized into bins, using percentiles or quartiles to obtain the interval cut-offs.
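The discretization step mentioned above can be illustrated with a short sketch that derives quartile cut-offs and assigns each value to a bin. The helper names and the ages are hypothetical, and the choice of the standard library's default quantile method is an assumption, since the paper does not specify how the cut-offs were computed.

```python
import statistics
from bisect import bisect_left


def quartile_cutoffs(values):
    """Return the three quartile cut points of the values."""
    return statistics.quantiles(values, n=4)


def assign_bin(value, cutoffs):
    """Bin 0 holds values <= cutoffs[0], bin 1 values in (cutoffs[0], cutoffs[1]], and so on."""
    return bisect_left(cutoffs, value)


# Hypothetical applicant ages.
ages = [22, 25, 31, 34, 38, 41, 45, 52, 57, 63, 29, 48]

cutoffs = quartile_cutoffs(ages)
bins = [assign_bin(age, cutoffs) for age in ages]
counts = [bins.count(b) for b in range(4)]
print(cutoffs, counts)
```

Binning on quartiles gives groups of roughly equal size, which keeps the number of observations per category comparable when the bins later enter the regression as categorical inputs.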

Definition of Good/Bad Client
With regard to the development of the new model for Sogesol, this research adopts the following definition of good/bad client:
• A bad client is one that has more than 90 days past due after 9 months on books (DPDMOB9 > 90).
• A good client is one whose days past due are less than or equal to 90 after 9 months on books (DPDMOB9 ≤ 90).
DPD90MOB9 is considered the optimal definition since, among the definitions tested, it provides the best-performing model. In addition, the choice of DPD90 can be explained by the fact that Sogesol considers loans as nonproductive (in default) after 90 days past due.

Scorecard Development Methodology
Several mathematical techniques are available for developing a scorecard model to predict the repayment behavior of loan borrowers, among them logistic regression, decision trees, random forests, and neural networks (Siddiqi, 2006). This research adopts one of the most common, successful and transparent techniques, i.e. logistic regression.

Logistic Regression
Like most other predictive modeling techniques, logistic regression uses a set of predictor variables to estimate the probability of a particular outcome (Siddiqi, 2006). The equation for the logit transformation of the probability of an event is as follows:
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$
where: $p$ = posterior probability of "event", given inputs; $x_1, \dots, x_k$ = input variables; $\beta_0$ = intercept of the regression line; $\beta_1, \dots, \beta_k$ = regression coefficients. In the case of a scorecard, the event is set to "bad" and the non-event to "good".
The logit transformation is defined as the natural logarithm of the odds, i.e. ln(p(bad)/p(good)). It serves to linearize the posterior probability and limits the estimated probabilities in the model to between 0 and 1. Maximum likelihood is used to estimate the regression coefficients $\beta_1$ to $\beta_k$. These parameters determine the rate of change of the logit for a one-unit change in the input variable (adjusted for other inputs). In other words, they represent the slopes of the regression line between the target variable (good/bad) and their respective input variables $x_1$ to $x_k$. It is important to underline that the parameters depend on the unit of the input. In order to facilitate the analysis, they have to be standardized.
Regression requires a target or response variable and a set of independent inputs. Different forms of inputs exist. However, the method commonly used is to consider the raw input data for numeric variables and create binary variables for categorical data. To counterbalance the effects of input variable units, standardized estimates are used in the analysis.
Regression can be run to obtain the best possible model using all available options; this is usually referred to as "all possible" regression. The three following techniques are generally used in logistic regression (Siddiqi, 2006): forward selection, backward elimination, and stepwise selection. In these techniques, the minimum significance level required for a variable to be added to the model, or to be kept in the model, can be set.
In this research, the stepwise logistic regression method is used to build the model for assessing new applicants for a working capital loan at Sogesol. The R language is used as the statistical tool to run the regression.
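To make the maximum likelihood estimation described in this section concrete, the sketch below fits a logistic regression by plain gradient ascent on the log-likelihood, with "bad" as the event. It is only an illustration under stated assumptions: the study itself used stepwise selection in R, which is not reproduced here, and the function names, learning rate, iteration count and data (a single standardized input) are all hypothetical.

```python
import math


def sigmoid(z):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, x):
    """Estimated probability of the event ("bad") for one row of inputs."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def log_likelihood(beta, rows, targets):
    return sum(
        y * math.log(predict(beta, x)) + (1 - y) * math.log(1 - predict(beta, x))
        for x, y in zip(rows, targets)
    )


def fit_logistic(rows, targets, learning_rate=0.1, iterations=5000):
    """Maximize the log-likelihood by gradient ascent; beta[0] is the intercept."""
    beta = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(iterations):
        gradient = [0.0] * len(beta)
        for x, y in zip(rows, targets):
            error = y - predict(beta, x)
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        beta = [b + learning_rate * g / n for b, g in zip(beta, gradient)]
    return beta


# Hypothetical standardized input (e.g. monthly sales) and target (1 = bad).
rows = [[-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [-1.2], [0.3], [1.2]]
targets = [1, 1, 0, 1, 0, 0, 0, 1, 0, 1]

beta = fit_logistic(rows, targets)
print([round(b, 3) for b in beta])
```

In this invented sample the fitted slope is negative, meaning that a one-unit increase in the standardized input lowers the log-odds of being "bad"; because the input is standardized, the coefficient does not depend on the original unit of measurement.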

Training/Validation Samples
In the scorecard development process, the total sample should be divided into a training sample and a validation sample. The training sample serves to develop the scorecard, and the validation sample is held aside to test the scorecard obtained.
Based on the definition of good/bad client above, the whole sample of Sogesol is divided as indicated in the table below.
As shown in Table 11, 75% of the whole sample is used to develop the scorecard and 25% for validation. The distribution of bads is almost the same for the different samples (training = 11.1%; validation = 10.5% and total = 11.0%).
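A 75%/25% split of this kind can be sketched as follows. The sketch stratifies on the good/bad flag so that both samples keep a similar share of bads; this is an assumption made for illustration, since the paper does not state whether the split was stratified or purely random. The function name, the seed and the 11% bad rate in the toy data are likewise hypothetical.

```python
import random


def stratified_split(targets, train_share=0.75, seed=0):
    """Return (training indices, validation indices), split separately within goods and bads."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for label in (0, 1):
        idx = [i for i, y in enumerate(targets) if y == label]
        rng.shuffle(idx)
        cut = round(len(idx) * train_share)
        train_idx.extend(idx[:cut])
        valid_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical sample of 1,000 loans with 11% bads (1 = bad).
flags = [1] * 110 + [0] * 890

train, valid = stratified_split(flags)
train_bad_rate = sum(flags[i] for i in train) / len(train)
valid_bad_rate = sum(flags[i] for i in valid) / len(valid)
print(len(train), len(valid), round(train_bad_rate, 3), round(valid_bad_rate, 3))
```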

Scorecard Model Results
The credit scoring model developed to screen new applicants is based on a binary logistic regression, where the target variable is the good/bad repayment classification defined previously. The following table displays the results of the model. As observed in Table 12 [...]

Score Calculation
Once we have the estimates of each variable, we can calculate the scores. Using $p_{good}$, the score is calculated as follows:
$$\text{Score} = p_{good} \times 1000$$
So, the higher the score, the less risky the new applicant.
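The score calculation can be sketched as below. The sketch assumes, consistently with the logistic regression section, that the fitted logit refers to the "bad" event, so that the probability of being good is one minus the inverse logit; the function name and the example logit values are hypothetical.

```python
import math


def score_from_logit(logit_bad):
    """Convert the fitted logit of the "bad" event into a 0-1000 score."""
    p_bad = 1.0 / (1.0 + math.exp(-logit_bad))
    p_good = 1.0 - p_bad
    return 1000.0 * p_good


# Hypothetical logits for three applicants, from low risk to high risk.
for logit_bad in (-3.0, 0.0, 1.5):
    print(logit_bad, round(score_from_logit(logit_bad), 1))
```

An applicant with even odds of being bad (a logit of 0) receives a score of 500, and scores move towards 1000 as the estimated risk falls.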

Model Performance Assessment

Discriminatory Power of the Model
We assess the discriminatory power of the above model for new working capital clients at Sogesol with the same metrics used previously to evaluate the psychometric model developed by EFL for Sogesol.

1) Model performance in the training sample
The KS and AUC values indicate that the model is robust. The outcomes are shown in the table below. As shown in Table 13, a KS of 32.59% indicates that the model is able to discriminate between good and bad clients. In addition, with regard to the respective benchmarks established for AUC, the model developed performs well. Finally, the model outperforms the psychometric model since, overall, the metrics of this traditional scorecard, built with some new variables included, are greater than those obtained in the psychometric assessment. The ROC curve below supports the robustness of the model. As observed in Figure 4, the ROC curve displays enough lift above the 45° line of a useless model, indicating the predictive power of the model.
2) Model performance in the validation sample
No matter how good the model may appear within the training sample, if it does not perform as well in the test sample, it will not be considered reliable in predicting new applicants' repayment behavior. As shown in Table 14, the results obtained for the validation sample are almost the same as those observed for the training sample. These indicators are also within the good range of their respective benchmarks.
Compared to the results obtained for both the psychometric model and the traditional model in the previous back testing, the new model performs better.
The predictive power of the model is illustrated in the ROC curve below. As shown in Figure 5, as in the graph for the training sample, the ROC curve of the validation sample is far enough from the line of a useless model to confirm the robustness of the new model.
It is important to highlight that the KS and AUC values for the validation sample were not obtained by running another forward stepwise logistic regression on the validation sample, but by applying the previously estimated model to the reserved validation sample and then stepping through various cut-points to create the ROC curve.
That observation is reinforced by the outcome of the Brier test. In fact, a Brier score of 0.09 is found, suggesting that the model is well-calibrated. The Brier score takes values between 0 and 1; the lower the score, the better the model.
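The cut-point procedure used for KS and AUC can be sketched as follows: scores from an already fitted model are applied to held-out clients, each distinct score is used in turn as a cut-off, and the cumulative shares of bads and goods at or below the cut-off trace the ROC curve. The function names and the eight scored clients are hypothetical, and the trapezoidal rule for the area is a common convention assumed here rather than a detail taken from the paper.

```python
def roc_points(scores, is_bad):
    """ROC points (share of goods, share of bads) with score <= each cut-point."""
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores)):
        bad_below = sum(1 for s, b in zip(scores, is_bad) if b and s <= cut)
        good_below = sum(1 for s, b in zip(scores, is_bad) if not b and s <= cut)
        points.append((good_below / n_good, bad_below / n_bad))
    return points


def ks_statistic(points):
    """Largest gap between the cumulative distributions of bads and goods."""
    return max(abs(bad_share - good_share) for good_share, bad_share in points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical validation clients: higher score = less risky, 1 = bad.
scores = [300, 450, 500, 620, 700, 780, 850, 900]
is_bad = [1, 1, 0, 1, 0, 0, 0, 0]

points = roc_points(scores, is_bad)
print(round(ks_statistic(points), 4), round(auc(points), 4))
```

Because the model is applied as is and only the cut-off varies, no parameter is re-estimated on the validation sample, which is what makes these figures an out-of-sample check.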

Models Comparison
It is useful to compare the predictive power of the new traditional credit scoring model proposed by this paper with that of the EFL tool evaluated in the previous chapter. The following table illustrates the comparison. As indicated in [...]

Discussion of Results
The outcomes of the back testing of the psychometric model and the existing traditional scorecard used at Sogesol for new borrowers have suggested the construction of a new scorecard as an alternative. To do so, we have selected a sample of historical data from Sogesol's information system. The dataset has been extended to more than five years, containing enough data to build the predictive model.
Several alternative definitions of good/bad clients have been tested. The retained one (Bad90MOB9) has been found to be optimal, since it displays the best statistical performance both in terms of discriminatory power and calibration.

Conclusion
The paper has evaluated the predictive power of psychometric testing in microfinance, based on evidence from Sogesol. The evaluation contains two parts. The first consists of a back test, using statistical tools (ROC curve, AUC, K-S) with different definitions of good and bad clients, to measure the predictive power of the psychometric model used by Sogesol to assess the creditworthiness of MSME owners applying for a working capital loan of US$ 1000 or more. For this purpose, a sample of 3671 clients has been selected from Sogesol's database.
Using R software and Excel, we analyzed the performance of psychometric testing in comparison with the existing traditional working capital scoring. In the second part, the paper analyzed a pilot conducted with a sample of 250 borrowers, where Sogesol used psychometric testing alone to grant loans. This empirical analysis helped to measure the contribution of the psychometric credit scoring model to risk reduction at Sogesol. The analysis of the pilot results indicates that the EFL tool would not contribute to mitigating risk in Sogesol's portfolio, confirming the hypothesis of the paper that the psychometric model has no significant impact on Sogesol's portfolio quality. This second hypothesis has been tested within a linear regression model, where the dependent variable is defined as Days Past Due and the independent variable is a binary variable set equal to 1 if the client was rejected by the EFL score and accepted by the traditional score, and set to 0 if the client was accepted by both models. Overall, the binary variable is not statistically significant in any of the linear regressions with the different DPD variables.
Therefore, the DPD would not be affected by the credit decisions, whereas the objective of Sogesol was to strengthen the predictive power of the socio-demographic model by experimenting with the psychometric tool and developing a hybrid model.
In this case, we found that the psychometric testing as applied by Sogesol at the time presents low discriminatory power. Based on this outcome, the paper proposed to re-estimate the existing socio-demographic credit scoring model,