<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JDAIP</journal-id><journal-title-group><journal-title>Journal of Data Analysis and Information Processing</journal-title></journal-title-group><issn pub-type="epub">2327-7211</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jdaip.2021.93010</article-id><article-id pub-id-type="publisher-id">JDAIP-111009</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject><subject> Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Research on Personal Credit Evaluation Based on Mobile Telecommunications Data
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Shaoyong</surname><given-names>Hong</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yan</surname><given-names>Zhang</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Chun</surname><given-names>Yang</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Guangzhou Huashang College, Guangzhou, China</addr-line></aff><aff id="aff2"><addr-line>Guangdong Teachers College of Foreign Language and Arts, Guangzhou, China</addr-line></aff><pub-date pub-type="epub"><day>08</day><month>07</month><year>2021</year></pub-date><volume>09</volume><issue>03</issue><fpage>151</fpage><lpage>161</lpage><history><date date-type="received"><day>1,</day>	<month>July</month>	<year>2021</year></date><date date-type="rev-recd"><day>30,</day>	<month>July</month>	<year>2021</year>	</date><date date-type="accepted"><day>2,</day>	<month>August</month>	<year>2021</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  With the rapid development of big data technology, the personal credit evaluation industry has entered a new stage, and personal credit evaluation based on mobile telecommunications data has become a research hotspot. Because personal credit evaluation variables are complex and diverse, the dimension of the input variables must be reduced in order to lower model complexity and improve prediction accuracy. Using data provided by a mobile telecommunications operator, this paper divides the data into a training set and a verification set. We perform correlation analysis on each indicator in the training set, calculate the IV value of each selected indicator from its WOE values, and then bin the data with SPSS Modeler. The selected variables are modeled with a logistic regression algorithm. To make the regression results more practical, we extract scoring rules from the logistic regression results, convert them into a score card, and finally verify the validity of the model.
 
</p></abstract><kwd-group><kwd>Credit System</kwd><kwd> Weight of Evidence</kwd><kwd> Information Value</kwd><kwd> K-S Test</kwd><kwd> Logistic Regression</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Credit investigation refers to the collection, sorting, preservation, and processing of credit information of natural persons, legal persons and other organizations in accordance with the law, and the provision of services such as credit reports, credit evaluations, and credit information consultations [<xref ref-type="bibr" rid="scirp.111009-ref1">1</xref>]. Credit investigation can be divided into personal credit investigation and enterprise credit investigation. Based on residents’ family income and assets, previous loans and repayments, credit overdrafts, penalties and litigation in the event of bad credit, personal credit investigation is to evaluate, record and archive personal credit ratings at any time, so as to facilitate the supplier of personal credit deciding whether to provide credit or how much to provide [<xref ref-type="bibr" rid="scirp.111009-ref2">2</xref>]. Personal credit evaluation is to identify the behavior of individual customers, screen out the evaluation variables that have a strong relationship with the behavior of individual customers, and use the few selected variables to establish the necessary credit evaluation model to make a prejudgment of individual credibility [<xref ref-type="bibr" rid="scirp.111009-ref3">3</xref>], rank customers and then distinguish between “good” and “bad” customers, which aims to offer a scientific and reasonable technical reference and decision-making basis for enterprises [<xref ref-type="bibr" rid="scirp.111009-ref4">4</xref>]. 
Credit risk assessment has become an urgent problem to be solved [<xref ref-type="bibr" rid="scirp.111009-ref5">5</xref>].</p><p>With the rapid development of information technology, the capacity for statistical analysis and summarization of data keeps growing, and credit score card models that apply statistical methods to historical data to assess customer risk have begun to emerge [<xref ref-type="bibr" rid="scirp.111009-ref6">6</xref>]. At present, the credit evaluation models in foreign markets mainly include the FICO credit score [<xref ref-type="bibr" rid="scirp.111009-ref7">7</xref>], the Zest Finance credit score [<xref ref-type="bibr" rid="scirp.111009-ref8">8</xref>], and the NCTUE credit evaluation. Mature personal credit rating products in China include Sesame Credit [<xref ref-type="bibr" rid="scirp.111009-ref9">9</xref>], Jingdong Baitiao [<xref ref-type="bibr" rid="scirp.111009-ref10">10</xref>], and Credit Score (China Mobile) [<xref ref-type="bibr" rid="scirp.111009-ref11">11</xref>]. Personal credit scoring is usually regarded as a classification problem in pattern recognition, and the general approach is to divide customers into good customers and bad customers [<xref ref-type="bibr" rid="scirp.111009-ref12">12</xref>]. Good customers include those who have had no service shutdown and those who pay after a shutdown. Their characteristics are a relatively long network access time, high user value in the contact circle, more active days, more traffic usage, and fewer overdue fees. Bad customers are those who do not pay in time after an overdue shutdown. Their characteristics are a short network access time, few active days and little traffic usage, low user value in the contact circle, and more overdue fees. In recent years, the support vector machine (SVM) has developed rapidly and has been widely used in the field of personal credit evaluation. 
Tony and Harris empirically analyzed the loan information of customers of a financial institution using the SVM method and found that the SVM model is well suited to small-sample data [<xref ref-type="bibr" rid="scirp.111009-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.111009-ref14">14</xref>]. In China, there is also much research on personal credit evaluation. Using principal component analysis, Li Meng constructed a logistic model for commercial bank credit risk assessment and showed that the logistic model has high recognition and predictive capability and functions effectively in commercial bank credit risk assessment [<xref ref-type="bibr" rid="scirp.111009-ref15">15</xref>].</p><p>The Internet of Everything has accelerated the growth of the mobile telecom industry and has brought unprecedented transformation challenges to traditional telecom operators [<xref ref-type="bibr" rid="scirp.111009-ref16">16</xref>]. Telecom operators need to change their traditional operating methods to provide customers with faster and more personalized services. Given the complexity and diversity of credit evaluation variables and the accuracy of the logistic regression model, we first employ the weight of evidence-information value (WOE-IV) method to select the variables [<xref ref-type="bibr" rid="scirp.111009-ref17">17</xref>]. Second, we apply logistic regression to the reduced set of variables, using customer behavior data recorded by a communication operator in China, to establish a statistical analysis model for personal credit evaluation that differentiates between “good” customers and “bad” customers. Based on the classification results, it then becomes easier to provide customers with personalized marketing plans. 
For customers with poor credit records, tighter control can effectively reduce the risk of arrears and bad debts; for valued customers with good credit, preferential packages and other services should be launched to encourage them to return, thereby enhancing the competitiveness of the enterprise [<xref ref-type="bibr" rid="scirp.111009-ref18">18</xref>].</p></sec><sec id="s2"><title>2. Data and Processing Methods</title><p>In simple terms, mobile telecommunication data is the data generated by the mobile phone users of the operators. According to the data source, it can be roughly divided into identity data, terminal data, location data, billing data, call list data, communication data and Internet data. Mobile telecom data is characterized by wide coverage, high authenticity, a large amount of data, strong timeliness, and multiple data dimensions.</p><p>A total of 10,185 samples were obtained from the business data records of a Chinese communications operator for the six months from January to June. Based on the basic customer information provided by the mobile telecommunications operator and with reference to existing scoring models at home and abroad, we roughly divide the data into six dimensions: identity characteristics, behavior preference, performance ability, credit history, relationship, and external data. These cover customer age, basic information on overdue payment, communication behavior, online behavior, circle of friends and other data. These data are mainly structured data. Because the recording range and measurement scale differ across numerical variables, the numerical variables need to be normalized. 
The transformation formula is as follows:</p><p><disp-formula><label>(1)</label><tex-math>x'_{ij} = \frac{x_{ij} - \min x_{ij}}{\max x_{ij} - \min x_{ij}}</tex-math></disp-formula></p><p>where <italic>x</italic><sub>ij</sub> represents the original variable value, <italic>x</italic>′<sub>ij</sub> represents the value obtained after normalization, min <italic>x</italic><sub>ij</sub> represents the minimum value of all sample data in the i-th variable, and max <italic>x</italic><sub>ij</sub> represents the maximum value of all sample data in the i-th variable.</p><p>The data period is divided into an observation period and a performance period. The evaluation indexes constructed from the basic situation and behavior characteristics of customers during the observation period are the independent variables, and whether a customer owes fees in the performance period is the dependent variable. In this paper, we select the basic customer information data provided by the operator from January to September and take June as the observation point. The sample observation period is the six months from January to June, and the sample performance period is July to September, so the window length of the performance period is three months.</p><p>The 10,185 samples contained 5810 “good” customers (marked as 0) and 4375 “bad” customers (marked as 1). For credit evaluation modeling, the 10,185 samples are randomly divided at a ratio of about 4:1 into 8148 training set samples and 2037 test set samples. The training set (80%) is used to build the model, and the test set (20%) serves as the verification set for validating it. For the training set, the last column of indicators (whether the customer is in arrears) is the dependent variable y, and the other indicators are independent variables. Correlation analysis of these indicators shows that where the correlation between two indicators is relatively high, only one of them needs to be selected. 
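</p><p>As a minimal sketch of the two preprocessing steps described above, the following Python code applies the min-max normalization of Equation (1) to each variable and then keeps only one indicator from each highly correlated pair. The function names, the toy values and the 0.8 correlation threshold are illustrative assumptions; the threshold actually used in this study is not stated.</p><preformat>
```python
import math


def min_max_normalize(values):
    """Equation (1): rescale the sample values of one variable to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant variable: return zeros to avoid dividing by zero (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def drop_correlated(columns, threshold=0.8):
    """Keep a variable only if its |r| with every variable kept so far stays at or below threshold."""
    kept = []
    for name, values in columns.items():
        if all(threshold >= abs(pearson(values, columns[k])) for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical): the second column is an exact multiple of the first.
columns = {
    "active_days": [2, 10, 20, 25, 29],
    "active_days_x2": [4, 20, 40, 50, 58],
    "age": [18, 50, 25, 40, 30],
}
normalized = {name: min_max_normalize(vals) for name, vals in columns.items()}
selected = drop_correlated(normalized)  # the duplicated indicator is dropped
```
</preformat><p>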
The selected indicators are divided into bins, the WOE value of each bin is calculated, and the corresponding IV value is then calculated from the WOE values.</p><p>The WOE value is the weight of evidence. A higher WOE value means a higher probability of arrears. For category i of a variable, the WOE value is calculated as follows:</p><p><disp-formula><label>(2)</label><tex-math>\mathrm{WOE}_i = \ln\left(\frac{G_i/G}{B_i/B}\right)</tex-math></disp-formula></p><p>where G and B are the total numbers of good and bad customers, G<sub>i</sub> and B<sub>i</sub> are the numbers of good and bad customers in category i of the variable, G<sub>i</sub>/G is the proportion of good customers that fall in category i, and B<sub>i</sub>/B is the proportion of bad customers that fall in category i. Using formula (2), we redefine the WOE expression of the variable X as:</p><p><disp-formula><label>(3)</label><tex-math>\mathrm{WOE}(X) = \beta_1 \mathrm{WOE}_1 + \beta_2 \mathrm{WOE}_2 + \cdots + \beta_r \mathrm{WOE}_r</tex-math></disp-formula></p><p>where β<sub>1</sub>, β<sub>2</sub>, ⋯, β<sub>r</sub> are binary dummy variables. That is, for the categories i = 1, 2, ⋯, r of the variable, β<sub>i</sub> = 1 if the value of X belongs to the i-th class, and β<sub>i</sub> = 0 if it does not.</p><p>When we calculate the WOE value of a variable, we need to grade the indicators according to the following points:</p><p>First, the number of groups should be moderate, neither too many nor too few.</p><p>Second, to ensure that each group contains enough good and bad customer samples, the number of records in each group should be reasonable, neither too many nor too few.</p><p>Third, taken together with the dependent variable, the segmentation should show an obvious trend.</p><p>Fourth, the difference in the distribution of the dependent variable between adjacent grades should be as large as possible.</p><p>IV is the information value. According to the credit evaluation system model, it is generally assumed that when IV &lt; 0.1, the indicator has no effect; when 0.1 &lt; IV &lt; 0.3, the indicator has a certain effect; and when IV &gt; 0.3, the indicator has a significant effect. 
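</p><p>As a minimal sketch, the WOE of Equation (2) and the information value built from it (each bin's WOE weighted by the difference between its good and bad shares, then summed over the bins) can be computed from per-bin counts as follows. The function names and the counts are hypothetical and are not taken from the operator's data. Every bin is assumed to contain both good and bad customers, in line with the second grading point above, and the handling of IV values exactly equal to 0.1 or 0.3 is an assumption because the thresholds are stated as strict inequalities.</p><preformat>
```python
import math


def woe_per_bin(good_counts, bad_counts):
    """Equation (2): WOE_i = ln((G_i / G) / (B_i / B)) for each bin i."""
    total_good, total_bad = sum(good_counts), sum(bad_counts)
    return [
        math.log((g / total_good) / (b / total_bad))
        for g, b in zip(good_counts, bad_counts)
    ]


def information_value(good_counts, bad_counts):
    """IV: sum over bins of WOE_i * (G_i / G - B_i / B)."""
    total_good, total_bad = sum(good_counts), sum(bad_counts)
    woe = woe_per_bin(good_counts, bad_counts)
    return sum(
        w * (g / total_good - b / total_bad)
        for w, g, b in zip(woe, good_counts, bad_counts)
    )


def iv_effect(iv):
    """Thresholds quoted in the text; boundary values fall into the lower class (assumption)."""
    if iv > 0.3:
        return "significant effect"
    if iv > 0.1:
        return "certain effect"
    return "no effect"


# Hypothetical counts for a variable with three bins.
good = [100, 300, 600]
bad = [400, 300, 300]
woe = woe_per_bin(good, bad)
iv = information_value(good, bad)
effect = iv_effect(iv)
```
</preformat><p>Swapping the roles of good and bad customers in Equation (2) reverses the sign of every WOE value and leaves the IV unchanged, because the difference of the two shares changes sign as well.</p><p>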
The IV value of the variable is calculated by:</p><p><disp-formula><label>(4)</label><tex-math>\mathrm{IV} = \sum_{i=1}^{r} \mathrm{WOE}_i \left(\frac{G_i}{G} - \frac{B_i}{B}\right)</tex-math></disp-formula></p><p>According to the magnitude of the IV value, the variables that have no effect are deleted and the variables that have a certain effect are retained, which completes the variable screening. Groups with too few sample points or with unreasonable jumps can be merged with neighboring groups. Finally, SPSS Modeler was used to complete the binning. The final results are shown in <xref ref-type="table" rid="table1">Table 1</xref>.</p><p>After data preprocessing, six independent variables were selected: network access time, active days, number of overdue fees, contact circles, number of traffic used, and age.</p></sec><sec id="s3"><title>3. Analysis and Discussion</title><p>The flowchart of the construction of the credit score card model is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p><p>1) Logistic regression</p><p>The event of arrears is represented by the variable y: y = 1 denotes a bad customer and y = 0 denotes a good customer. Our purpose is to use the existing sample data to build a model that predicts the probability p of arrears. Whether a new customer is predicted to be good or bad, logistic regression analysis does not simply answer yes or no but gives the probability of the event.</p><p>2) Conversion of scorecard</p><p>In order to make the results of logistic regression more practical, we need to convert the results into the form of scores. 
So we use SPSS Modeler to transform the result of logistic regression into the form of a score card, as shown in <xref ref-type="table" rid="table2">Table 2</xref>.</p><p>The score should meet the following requirements:</p><p>First, the score should stay within a fixed range chosen according to business needs, such as 0 to 1000 points.</p><p>Second, at a given score, good customers and bad customers stand in a fixed ratio. Statistics has a dedicated quantity, the odds, to represent this ratio. For example, we may require that at a score of 500 the ratio of good to bad customers is 50:1.</p><p>Third, an increase in the score should reflect the change in the ratio between good and bad customers. For example, we may require that the odds double for every 50-point increase in the score.</p><p>The value relationship of the credit score is:</p><p><disp-formula><label>(5)</label><tex-math>\mathrm{score} = \ln(\mathrm{odds}) \cdot \mathrm{factor} + \mathrm{offset}</tex-math></disp-formula></p><p>Based on the company’s own business, we independently set the value of the ratio of good to bad customers, that is, the odds, and the increase of the</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> WOE value and IV value of the selected variables</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Index name</th><th align="center" valign="middle" >Index grading</th><th align="center" valign="middle" >WOE</th><th align="center" valign="middle" >IV</th></tr></thead><tr><td align="center" valign="middle"  rowspan="8"  >network access time</td><td align="center" valign="middle" >[6, 7]</td><td align="center" valign="middle" >1.06</td><td align="center" valign="middle"  rowspan="8"  >0.66</td></tr><tr><td align="center" valign="middle" >(7, 11]</td><td align="center" valign="middle" >0.44</td></tr><tr><td align="center" valign="middle" >(11, 24]</td><td align="center" valign="middle" >−0.24</td></tr><tr><td align="center" 
valign="middle" >(24, 36]</td><td align="center" valign="middle" >−0.66</td></tr><tr><td align="center" valign="middle" >(36, 48]</td><td align="center" valign="middle" >−0.81</td></tr><tr><td align="center" valign="middle" >(48, 72]</td><td align="center" valign="middle" >−1.03</td></tr><tr><td align="center" valign="middle" >(72, 150]</td><td align="center" valign="middle" >−1.32</td></tr><tr><td align="center" valign="middle" >&gt;150</td><td align="center" valign="middle" >−1.81</td></tr><tr><td align="center" valign="middle"  rowspan="6"  >active days</td><td align="center" valign="middle" >(0, 2]</td><td align="center" valign="middle" >1.44</td><td align="center" valign="middle"  rowspan="6"  >0.56</td></tr><tr><td align="center" valign="middle" >(2, 10]</td><td align="center" valign="middle" >0.55</td></tr><tr><td align="center" valign="middle" >(10, 20]</td><td align="center" valign="middle" >−0.02</td></tr><tr><td align="center" valign="middle" >(20, 25]</td><td align="center" valign="middle" >−0.39</td></tr><tr><td align="center" valign="middle" >(25, 29]</td><td align="center" valign="middle" >−0.73</td></tr><tr><td align="center" valign="middle" >&gt;29</td><td align="center" valign="middle" >−0.83</td></tr><tr><td align="center" valign="middle"  rowspan="7"  >number of overdue fees</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >−0.56</td><td align="center" valign="middle"  rowspan="7"  >0.76</td></tr><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.93</td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" >1.58</td></tr><tr><td align="center" valign="middle" >3</td><td align="center" valign="middle" >1.69</td></tr><tr><td align="center" valign="middle" >4</td><td align="center" valign="middle" >2.02</td></tr><tr><td align="center" valign="middle" >5</td><td align="center" valign="middle" >1.95</td></tr><tr><td align="center" valign="middle" 
>6</td><td align="center" valign="middle" >1.90</td></tr><tr><td align="center" valign="middle"  rowspan="6"  >contact circle</td><td align="center" valign="middle" >(0, 1]</td><td align="center" valign="middle" >0.86</td><td align="center" valign="middle"  rowspan="6"  >0.34</td></tr><tr><td align="center" valign="middle" >(1, 4]</td><td align="center" valign="middle" >−0.12</td></tr><tr><td align="center" valign="middle" >(4, 8]</td><td align="center" valign="middle" >−0.38</td></tr><tr><td align="center" valign="middle" >(8, 15]</td><td align="center" valign="middle" >−0.53</td></tr><tr><td align="center" valign="middle" >(15, 30]</td><td align="center" valign="middle" >−0.69</td></tr><tr><td align="center" valign="middle" >≥31</td><td align="center" valign="middle" >−0.73</td></tr><tr><td align="center" valign="middle"  rowspan="6"  >number of traffic used</td><td align="center" valign="middle" >(0, 30]</td><td align="center" valign="middle" >0.79</td><td align="center" valign="middle"  rowspan="6"  >0.26</td></tr><tr><td align="center" valign="middle" >(30, 200]</td><td align="center" valign="middle" >0.03</td></tr><tr><td align="center" valign="middle" >(200, 500]</td><td align="center" valign="middle" >−0.29</td></tr><tr><td align="center" valign="middle" >(500, 1000]</td><td align="center" valign="middle" >−0.45</td></tr><tr><td align="center" valign="middle" >(1000, 2000]</td><td align="center" valign="middle" >−0.56</td></tr><tr><td align="center" valign="middle" >&gt;2000</td><td align="center" valign="middle" >−0.27</td></tr><tr><td align="center" valign="middle"  rowspan="4"  >age</td><td align="center" valign="middle" >≤18</td><td align="center" valign="middle" >0.80</td><td align="center" valign="middle"  rowspan="4"  >0.12</td></tr><tr><td align="center" valign="middle" >(18, 25]</td><td align="center" valign="middle" >0.14</td></tr><tr><td align="center" valign="middle" >(25, 50]</td><td align="center" valign="middle" >−0.13</td></tr><tr><td 
align="center" valign="middle" >&gt;50</td><td align="center" valign="middle" >0.29</td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> score card</title></caption><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="3"  >Identity characteristics</th><th align="center" valign="middle"  colspan="17"  >Age</th></tr></thead><tr><td align="center" valign="middle"  colspan="2"  >(0, 18]</td><td align="center" valign="middle"  colspan="5"  >(18, 25]</td><td align="center" valign="middle"  colspan="6"  >(25, 50]</td><td align="center" valign="middle"  colspan="4"  >&gt;50</td></tr><tr><td align="center" valign="middle"  colspan="2"  >0</td><td align="center" valign="middle"  colspan="5"  >20</td><td align="center" valign="middle"  colspan="6"  >50</td><td align="center" valign="middle"  colspan="4"  >30</td></tr><tr><td align="center" valign="middle"  rowspan="6"  >Behavioral preferences</td><td align="center" valign="middle"  colspan="17"  >Active days (days)</td></tr><tr><td align="center" valign="middle" >≤2</td><td align="center" valign="middle"  colspan="2"  >(2, 10]</td><td align="center" valign="middle"  colspan="4"  >(10, 20]</td><td align="center" valign="middle"  colspan="2"  >(20, 25]</td><td align="center" valign="middle"  colspan="5"  >(25, 29]</td><td align="center" valign="middle"  colspan="3"  >&gt;29</td></tr><tr><td align="center" valign="middle" >0</td><td align="center" valign="middle"  colspan="2"  >23</td><td align="center" valign="middle"  colspan="4"  >46</td><td align="center" valign="middle"  colspan="2"  >66</td><td align="center" valign="middle"  colspan="5"  >100</td><td align="center" valign="middle"  colspan="3"  >155</td></tr><tr><td align="center" valign="middle"  colspan="17"  >Number of traffic used (M)</td></tr><tr><td align="center" valign="middle" >[0, 30]</td><td align="center" valign="middle"  colspan="2"  >[30,200]</td><td 
align="center" valign="middle"  colspan="4"  >[200,500]</td><td align="center" valign="middle"  colspan="3"  >(500, 1000]</td><td align="center" valign="middle"  colspan="4"  >(1000, 2000]</td><td align="center" valign="middle"  colspan="3"  >&gt;2000</td></tr><tr><td align="center" valign="middle" >0</td><td align="center" valign="middle"  colspan="2"  >20</td><td align="center" valign="middle"  colspan="4"  >40</td><td align="center" valign="middle"  colspan="3"  >80</td><td align="center" valign="middle"  colspan="4"  >100</td><td align="center" valign="middle"  colspan="3"  >60</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Performance capability</td><td align="center" valign="middle"  colspan="17"  >Number of overdue fees in recent 6 months (times)</td></tr><tr><td align="center" valign="middle" >0</td><td align="center" valign="middle"  colspan="4"  >1</td><td align="center" valign="middle"  colspan="7"  >[2, 3]</td><td align="center" valign="middle"  colspan="5"  >[4, 6]</td></tr><tr><td align="center" valign="middle" >320</td><td align="center" valign="middle"  colspan="4"  >100</td><td align="center" valign="middle"  colspan="7"  >50</td><td align="center" valign="middle"  colspan="5"  >0</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Credit history</td><td align="center" valign="middle"  colspan="17"  >Network access time (month)</td></tr><tr><td align="center" valign="middle" >[6, 7]</td><td align="center" valign="middle"  colspan="2"  >(7, 11]</td><td align="center" valign="middle"  colspan="3"  >(11, 24]</td><td align="center" valign="middle"  colspan="2"  >(24, 48]</td><td align="center" valign="middle"  colspan="5"  >[48,100]</td><td align="center" valign="middle"  colspan="3"  >[100,180]</td><td align="center" valign="middle" >&gt;180</td></tr><tr><td align="center" valign="middle" >0</td><td align="center" valign="middle"  colspan="2"  >20</td><td align="center" 
valign="middle"  colspan="3"  >50</td><td align="center" valign="middle"  colspan="2"  >80</td><td align="center" valign="middle"  colspan="5"  >120</td><td align="center" valign="middle"  colspan="3"  >150</td><td align="center" valign="middle" >245</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Connections</td><td align="center" valign="middle"  colspan="17"  >Contact circles (ones)</td></tr><tr><td align="center" valign="middle" >≤1</td><td align="center" valign="middle"  colspan="3"  >(1, 4]</td><td align="center" valign="middle"  colspan="3"  >(4, 8]</td><td align="center" valign="middle"  colspan="4"  >(8, 15]</td><td align="center" valign="middle"  colspan="4"  >(15, 30]</td><td align="center" valign="middle"  colspan="2"  >≥31</td></tr><tr><td align="center" valign="middle" >0</td><td align="center" valign="middle"  colspan="3"  >30</td><td align="center" valign="middle"  colspan="3"  >40</td><td align="center" valign="middle"  colspan="4"  >50</td><td align="center" valign="middle"  colspan="4"  >80</td><td align="center" valign="middle"  colspan="2"  >128</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr></tbody></table></table-wrap><p>score value when the odds doubles. 
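</p><p>As a minimal sketch of how the two calibration requirements fix the constants in Equation (5): doubling the odds adds ln 2 · factor to the score, so factor equals the number of points needed to double the odds divided by ln 2, and offset follows from the reference score. The function and parameter names are assumptions, and the numbers are the illustrative ones from the requirements above (odds of 50:1 at 500 points, 50 points to double the odds).</p><preformat>
```python
import math


def scorecard_scaling(base_score, base_odds, points_to_double):
    """Solve score = ln(odds) * factor + offset from the two calibration conditions."""
    factor = points_to_double / math.log(2)
    offset = base_score - math.log(base_odds) * factor
    return factor, offset


def score_from_odds(odds, factor, offset):
    """Equation (5): map odds (good to bad) to a score."""
    return math.log(odds) * factor + offset


# Illustrative calibration: 500 points at odds 50:1, plus 50 points each time the odds double.
factor, offset = scorecard_scaling(base_score=500, base_odds=50, points_to_double=50)
score_at_base = score_from_odds(50, factor, offset)
score_at_double = score_from_odds(100, factor, offset)
```
</preformat><p>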
In this paper, the proposed values were tuned several times in combination with the operator’s own business. Finally, it was determined that a ratio of good to bad customers of 30:1 corresponds to a score of 500 points, and that the odds double when the score increases by 50 points. Therefore, according to the scoring formula, we can get:</p><p><disp-formula><label>(6)</label><tex-math>500 = \ln(30) \cdot \mathrm{factor} + \mathrm{offset}</tex-math></disp-formula></p><p><disp-formula><label>(7)</label><tex-math>550 = \ln(60) \cdot \mathrm{factor} + \mathrm{offset}</tex-math></disp-formula></p><p>Subtracting Equation (6) from Equation (7) gives factor = 50/ln 2 ≈ 72.13, and substituting back gives offset = 500 − ln(30) · factor ≈ 254.66. The formula for calculating the score value of each grade is:</p><p><disp-formula><label>(8)</label><tex-math>\mathrm{score} = \left(\mathrm{WOE} \cdot \beta + \frac{\alpha}{n}\right) \cdot \mathrm{factor} + \frac{\mathrm{offset}}{n}</tex-math></disp-formula></p><p>where α and β respectively represent the intercept value and coefficient value of the logistic regression results, and n is the number of input variables. The values of WOE, α and β depend on the variable and grade being calculated.</p><p>3) Model verification</p><p>The K-S (Kolmogorov-Smirnov) test index is a common test index for scoring models in the industry. It measures the ability of the model to distinguish good customers from bad customers by calculating the maximum difference between the cumulative percentages of the two types of customers. The detailed calculation process is shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p><p>The value of KS lies in the interval [0,1], and a higher KS value represents a more effective model. In practical application, a KS value reaching 0.2 is acceptable, a value reaching 0.4 indicates that the model has good distinguishing ability, and a value above 0.5 indicates that the model has strong distinguishing ability. The K-S value of this model is shown in <xref ref-type="table" rid="table3">Table 3</xref>, and the corresponding diagram is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p><p>The most obvious difference between good and bad customers is in the [300 - 400] range. 
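</p><p>As a minimal sketch of this calculation, the following Python code recomputes the KS column of Table 3 from its two cumulative columns. The variable and function names are assumptions. Because the inputs are the rounded percentages printed in the table, the recomputed maximum gap is 60.26, which differs from the reported value in the last digit.</p><preformat>
```python
# Cumulative percentages of good and bad customers per score band, copied from Table 3.
bands = ["0 - 100", "100 - 200", "200 - 300", "300 - 400", "400 - 500",
         "500 - 600", "600 - 700", "700 - 800", "800 - 900", "900 - 1000"]
cum_good = [0.29, 2.21, 5.31, 13.78, 34.71, 64.10, 86.74, 97.73, 99.93, 100.00]
cum_bad = [16.52, 45.86, 56.97, 74.04, 86.11, 93.19, 97.38, 99.50, 99.99, 100.00]


def ks_statistic(cum_good, cum_bad):
    """Return the largest gap between the two cumulative curves and its band index."""
    gaps = [abs(b - g) for g, b in zip(cum_good, cum_bad)]
    ks = max(gaps)
    return ks, gaps.index(ks)


ks, idx = ks_statistic(cum_good, cum_bad)
ks_band = bands[idx]  # score band where good and bad customers are best separated
```
</preformat><p>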
The KS value of the model is 60.27%, which shows that the model works well.</p><p>According to the probability value (P value) predicted by the model, the “good” customer and the “bad” customer are estimated. When P &gt; 0.5, they are classified as “bad” customers (Y = 1). When P ≤ 0.5, they are classified as “good” customers (Y = 0). The confusion matrix between the actual value of the original sample data and the predicted value of the model is shown in <xref ref-type="table" rid="table4">Table 4</xref>.</p><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> the results of K-S value</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Score grading</th><th align="center" valign="middle" >Ks-good</th><th align="center" valign="middle" >Ks-bad</th><th align="center" valign="middle" >Ks value</th></tr></thead><tr><td align="center" valign="middle" >0 - 100</td><td align="center" valign="middle" >0.29%</td><td align="center" valign="middle" >16.52%</td><td align="center" valign="middle" >16.23%</td></tr><tr><td align="center" valign="middle" >100 - 200</td><td align="center" valign="middle" >2.21%</td><td align="center" valign="middle" >45.86%</td><td align="center" valign="middle" >43.65%</td></tr><tr><td align="center" valign="middle" >200 - 300</td><td align="center" valign="middle" >5.31%</td><td align="center" valign="middle" >56.97%</td><td align="center" valign="middle" >51.66%</td></tr><tr><td align="center" valign="middle" >300 - 400</td><td align="center" valign="middle" >13.78%</td><td align="center" valign="middle" >74.04%</td><td align="center" valign="middle" >60.27%</td></tr><tr><td align="center" valign="middle" >400 - 500</td><td align="center" valign="middle" >34.71%</td><td align="center" valign="middle" >86.11%</td><td align="center" valign="middle" >51.41%</td></tr><tr><td align="center" valign="middle" >500 - 600</td><td align="center" valign="middle" >64.10%</td><td 
align="center" valign="middle" >93.19%</td><td align="center" valign="middle" >29.08%</td></tr><tr><td align="center" valign="middle" >600 - 700</td><td align="center" valign="middle" >86.74%</td><td align="center" valign="middle" >97.38%</td><td align="center" valign="middle" >10.64%</td></tr><tr><td align="center" valign="middle" >700 - 800</td><td align="center" valign="middle" >97.73%</td><td align="center" valign="middle" >99.50%</td><td align="center" valign="middle" >1.77%</td></tr><tr><td align="center" valign="middle" >800 - 900</td><td align="center" valign="middle" >99.93%</td><td align="center" valign="middle" >99.99%</td><td align="center" valign="middle" >0.05%</td></tr><tr><td align="center" valign="middle" >900 - 1000</td><td align="center" valign="middle" >100.00%</td><td align="center" valign="middle" >100.00%</td><td align="center" valign="middle" >0.00%</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title>Classification results of the logistic regression model with characteristic variables selected by WOE-IV</title></caption><table><tbody><thead><tr><th align="center" valign="middle"  colspan="2"   rowspan="2"  >Training set</th><th align="center" valign="middle"  colspan="2"  >Prediction</th><th align="center" valign="middle"  rowspan="2"  >Classification accuracy (%)</th><th align="center" valign="middle"  colspan="2"   rowspan="2"  >Test set</th><th align="center" valign="middle"  colspan="2"  >Prediction</th><th align="center" valign="middle"  rowspan="2"  >Classification accuracy (%)</th></tr></thead><tr><td align="center" valign="middle" >0</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >1</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >Actual value</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >4290</td><td align="center" 
valign="middle" >311</td><td align="center" valign="middle" >93.24</td><td align="center" valign="middle"  rowspan="2"  >Actual value</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >1031</td><td align="center" valign="middle" >161</td><td align="center" valign="middle" >86.49</td></tr><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" >794</td><td align="center" valign="middle" >2753</td><td align="center" valign="middle" >77.61</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >146</td><td align="center" valign="middle" >699</td><td align="center" valign="middle" >82.72</td></tr><tr><td align="center" valign="middle"  colspan="4"  >Total classification accuracy (%)</td><td align="center" valign="middle" >86.44</td><td align="center" valign="middle"  colspan="4"  >Total classification accuracy (%)</td><td align="center" valign="middle" >84.93</td></tr></tbody></table></table-wrap></sec><sec id="s4"><title>4. Conclusions</title><p>We performed correlation analysis on each indicator in the training set, calculated the IV value of each selected indicator from its WOE values, and binned the data with SPSS Modeler. The selected variables were then modeled with the logistic regression algorithm. The model analysis shows that logistic regression models have the following advantages: 1) Better stability and stronger robustness. 2) The model is intuitive, and the meaning of each coefficient is easy to explain and understand. 3) When the performance of the fitted model declines, the logistic model makes the cause of the decline easier to diagnose.</p><p>Through personal credit evaluation, users can be segmented by credit level, and a corresponding marketing plan can be adopted for each segment to achieve precise marketing. 
By identifying customers with poor credit and strengthening control over them, the risk of arrears and bad debts can be effectively reduced. High-quality customers with good credit can be offered preferential packages and other services to improve their stickiness. There are many methods for building a credit evaluation model, each with its own advantages and disadvantages. In this paper, a linear method is used to build the evaluation model. It has good robustness and interpretability, but it cannot capture nonlinear relationships in the data, which is a disadvantage when processing large-scale sample data. How to combine machine learning methods with the traditional logistic regression method will be the focus of our future research.</p></sec><sec id="s5"><title>Acknowledgements</title><p>This work was supported by the Guangdong University Youth Innovation Talent Project under Grant No. 2019KQNCX213 and by the Scientific Project of Guangzhou Huashang College under Grant No. 2019HSDS25.</p></sec><sec id="s6"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>Hong, S.Y., Zhang, Y. and Yang, C. (2021) Research on Personal Credit Evaluation Based on Mobile Telecommunications Data. Journal of Data Analysis and Information Processing, 9, 151-161. https://doi.org/10.4236/jdaip.2021.93010</p></sec></body><back><ref-list><title>References</title><ref id="scirp.111009-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Liu, X.G. and Wang, T.Y. (2016) International Development of Credit Information and Its Application in China’s Insurance Industry. Financial Computerization, No. 10, 48-50.</mixed-citation></ref><ref id="scirp.111009-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Wu, K. 
(2010) Construction of China’s Personal Credit System. Southwestern University of Finance and Economics, Chengdu.</mixed-citation></ref><ref id="scirp.111009-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, C. and Wan, X. (2019) Construction of Personal Credit Evaluation System and Evaluation Model under the Background of Big Data. Credit Information, No. 10, 66-71.</mixed-citation></ref><ref id="scirp.111009-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Galindo, J. and Tamayo, P. (2000) Credit Risk Assessment Using Statistical and Machine Learning: Basic Methodology and Risk Modeling Applications. Computational Economics, 15, 107-143. https://doi.org/10.1023/A:1008699112516</mixed-citation></ref><ref id="scirp.111009-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Zhou, Y.S., Cui, J.L., Zhou, L.Y., et al. (2020) Research on Personal Credit Risk Assessment Based on Improved Stochastic Forest Model. Credit Reference, No. 1, 28-32.</mixed-citation></ref><ref id="scirp.111009-ref6"><label>6</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname>, <given-names>Y.H.</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>The Establishment of Credit Score Card Model</article-title>. <source>Science and Technology Information</source>, <volume>37</volume>, <fpage>48</fpage>-<lpage>49</lpage>.</mixed-citation></ref><ref id="scirp.111009-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Berger, A.N., Frame, W.S. and Miller, N.H. (2005) Credit Scoring and Availability, Price, and Risk of Small Business Credit. Journal of Money Credit &amp; Banking, 37, 191-222. 
https://doi.org/10.1353/mcb.2005.0019</mixed-citation></ref><ref id="scirp.111009-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Liu, X.H. and Ding, W. (2015) Big Data Credit Investigation Practice of American Zest Finance Company. Credit Investigation, No. 8, 27-32.</mixed-citation></ref><ref id="scirp.111009-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Sun, J.Y., Zhang, M.M. and Wan, S.Y. (2017) SWOT Analysis on the Development of China’s Credit Economy—Taking the Alipay Platform Ant Flower Bai as an Example. China Business Review, No. 8, 148-149.</mixed-citation></ref><ref id="scirp.111009-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Kit (2015) Big Data Credit Reference Is the Cornerstone of JD White Slip. Science and Technology Daily, 2015-07-08, 011.</mixed-citation></ref><ref id="scirp.111009-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Wang, B., Chen, B., Wei, Y.H., et al. (2016) Research on Construction Method and Business Model of Credit Evaluation System Based on Telecom Big Data. Mobile Communications, 40, 75-79.</mixed-citation></ref><ref id="scirp.111009-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Chen, J. and Yang, T.N. (2005) UML Modeling Method for Personal Credit Evaluation System for College Students. Journal of Chongqing University, No. 11, 62-64.</mixed-citation></ref><ref id="scirp.111009-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Tony, D. and Gestel, V. (2003) A Support Vector Machine Approach to Credit Scoring. Bank Financiewezen, 12, 73-82.</mixed-citation></ref><ref id="scirp.111009-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Harris, T. (2015) Credit Scoring Using the Clustered Support Vector Machine. Expert Systems with Applications, 42, 741-750. 
https://doi.org/10.1016/j.eswa.2014.08.029</mixed-citation></ref><ref id="scirp.111009-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Li, M. (2005) Application of Logit Model in Credit Risk Assessment of Commercial Banks. Management Science, No. 2, 33-38.</mixed-citation></ref><ref id="scirp.111009-ref16"><label>16</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Wu</surname>, <given-names>H.Q.</given-names></name>, <etal>et al</etal>. (<year>2020</year>) <article-title>New Features of Network Society in 5G Era and Challenges Facing Industry</article-title>. <source>Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)</source>, <volume>32</volume>, <fpage>171</fpage>-<lpage>176</lpage>.</mixed-citation></ref><ref id="scirp.111009-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Chen, Z.Y. (2020) Perfect Combination: Research on the Credit Scoring Card Model of Online Lending Based on Machine Learning. Wuhan Finance, No. 3, 42-50.</mixed-citation></ref><ref id="scirp.111009-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Chen, Q.H., Yang, H.R. and Cui, H.J. (2020) Personal Credit Scoring Model and Statistical Learning after Variable Screening. Mathematical Statistics and Management, 39, 368-380.</mixed-citation></ref></ref-list></back></article>