Detection of Fraud Patterns in Accounting Accounts Using Data Mining Techniques
1. Introduction
Accounting data analysis has become a necessary process for fraud detection. Fraud is a current problem that has increased research interest in accounting (Seo, Choi, Choi, Lee, & Lee, 2009). Fraud detection and control have gained relevance in the audit process (Westhausen, 2017), where software is increasingly present as a tool capable of detecting deception in accounting data (Gottschalk & Solli-Saether, 2010).
Data mining is one of the most important current approaches among modern intelligent business analysis and decision support tools. This relevance is recognized by the leading professional accounting organizations. The American Institute of Certified Public Accountants (AICPA) has identified data mining as one of the top ten technologies going forward, and the Institute of Internal Auditors (IIA) has listed data mining as one of the top four research priorities (Koh & Low, 2004). Additionally, Chartered Global Management Accountants (CGMA) has reported that more than 50% of corporate leaders rank big data and data mining among the top ten corporate priorities that are central to the data-driven business era (CGMA, 2013). Data mining has been defined as the process of identifying valid, potentially novel, and ultimately understandable patterns in the data (Pujari, 2001). It is also known as the process of extracting knowledge from massive amounts of data (Han et al., 2006) to increase the efficiency of decisions in a particular discipline. The key approach to data mining is thus to leverage an organization's data assets for financial or non-financial benefits. Consequently, data mining has been applied to almost all commercial and non-commercial disciplines, including accounting.
Data mining makes it possible to analyze large volumes of data in which some elements show atypical values or behavior, and it serves as a support for traditional decision making (Mraović, 2008). It has become a strong tool for analyzing accounting data (Debreceny & Gray, 2010; Joudaki et al., 2015; Qin, 2014; Wu, Ou, Lin, Chang, & Yen, 2012). Data mining analysis combines business knowledge, data interpretation, and the logical reasoning needed to choose an analytic method and prepare the data (Jackson, 2002). Many algorithms are available for different kinds of analysis (Wu et al., 2008). Amani & Fadlalla (2017) summarized the broad scope of applications in accounting as follows:
1) Classification: mapping data to predefined qualitative attributes (classes).
2) Clustering: grouping data into clusters with a specific meaning.
3) Prediction: estimating future numerical values or classifying non-numerical values.
4) Outlier detection: identifying values that fall outside the expected range.
5) Optimization: choosing the best solution for a given set of means.
6) Visualization: presenting the data graphically for better comprehension.
7) Regression: estimating a dependent variable from a set of independent variables.
Outlier detection was performed with two different algorithms:
1) Distance-based outlier detection locates the most atypical values in an unlabeled data set according to their distance from the other points and returns them as a subset, called the outlier solution set (Angiulli, Basta, & Pizzuti, 2006).
2) Density-based outlier detection uses the Local Outlier Factor (LOF). The LOF is based on local density, where the locality of an object is given by its k nearest neighbors, whose distances are used to estimate the density. Comparing the local density of an object with the local densities of its neighbors makes it possible to identify regions of similar density and points whose density is lower than that of their neighbors. These points are the outliers (Breunig, Kriegel, Ng, & Sander, 2000).
An initial classification of the data used a common clustering approach suggested by Abonyi & Feil (2007), with the k-means algorithm, which is frequently used for this purpose according to Likas, Vlassis, & Verbeek (2003). The objective of this work was to detect fraud patterns in accounting data with outlier-detection data mining algorithms.
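The LOF idea described above can be illustrated with a minimal pure-Python sketch. It follows the definitions of Breunig et al. (2000) in simplified form: it uses exactly k neighbors per point (ties at the k-distance are ignored) and it does not handle data with more than k identical points. The function names and the sample amounts are hypothetical and are not taken from the study.

```python
from math import dist

def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    others = [(dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    return sorted(others)[:k]

def local_outlier_factor(points, k=3):
    """Simplified LOF: values near 1 are inliers, values far above 1 are outliers."""
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_distance = [nb[-1][0] for nb in neighbours]

    def local_reachability_density(i):
        # reachability distance of i with respect to o = max(k_distance(o), d(i, o))
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        return len(reach) / sum(reach)  # assumes sum(reach) > 0 (see lead-in)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF = mean ratio between the neighbours' density and the point's own density
    return [sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical one-dimensional example: six ordinary amounts and one isolated amount
amounts = [(100.0,), (102.0,), (98.0,), (101.0,), (99.0,), (103.0,), (500.0,)]
scores = local_outlier_factor(amounts, k=3)
```

In this toy example the isolated amount receives by far the largest score, while the ordinary amounts stay close to 1.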
2. Methods
The database was built from one month of real accounting entries from an enterprise's bank account. Transactions following three fraud patterns were inserted into it:
1) Small withdrawals (credits) of 0.05 cents, repeated many times.
2) Withdrawals (credits) of a constant amount at regular time intervals.
3) A large withdrawal (credit), well above the average account entry.
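To make the three patterns concrete, the following sketch injects them into a synthetic set of entries. It is only an illustration: the study used real entries, and every number here (200 base entries, 40 micro-withdrawals, the 750.00 constant amount, the five-day interval, the factor of 10) is a hypothetical assumption, as are the function names.

```python
import random
from statistics import mean

def make_base_entries(n=200, days=30, seed=0):
    """Synthetic stand-in for one month of ordinary entries: (day, amount, label)."""
    rng = random.Random(seed)
    return [(rng.randint(1, days), round(rng.uniform(50.0, 5000.0), 2), "normal")
            for _ in range(n)]

def inject_patterns(entries, days=30, seed=1):
    """Return a copy of entries with the three fraud patterns added."""
    rng = random.Random(seed)
    out = list(entries)
    # Pattern 1: many very small withdrawals of 0.05
    out += [(rng.randint(1, days), 0.05, "pattern1") for _ in range(40)]
    # Pattern 2: a constant amount withdrawn at a regular interval (assumed: every 5 days)
    out += [(day, 750.00, "pattern2") for day in range(3, days + 1, 5)]
    # Pattern 3: one withdrawal far above the average entry (assumed: 10 times the mean)
    average = mean(amount for _, amount, _ in entries)
    out.append((rng.randint(1, days), round(average * 10, 2), "pattern3"))
    return out

base = make_base_entries()
data = inject_patterns(base)
```

Keeping a label on each synthetic entry makes it possible to check afterwards which injected patterns a detection method actually recovers.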
The RapidMiner program was used for the data analysis, following Amer & Goldstein (2012), Hofmann & Klinkenberg (2013), and Jungermann (2009). As a first step, RapidMiner reads the database from Excel and filters the records to keep real values, so that the attributes analyzed by the algorithms contain no null values. The outlier detection algorithms, based on Euclidean distance and on the local outlier factor, are then applied as shown in Figure 1 and Figure 2.
The outliers were determined with the operator that implements the LOF algorithm, which is based on local density (Breunig et al., 2000).
Figure 1. Clustering by distance function: Euclidean distance.
Figure 2. Local outlier factors (LOF) detection.
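The process described above (filter out null values, then score the entries by distance) can be sketched outside RapidMiner as follows. This is not the RapidMiner operator itself: the score used here, the sum of Euclidean distances to the k nearest neighbors, is one common distance-based definition, and the function names, the value of k, and the sample rows are assumptions made for illustration.

```python
from math import dist

def drop_missing(rows):
    """Keep only rows in which every attribute is a real number (no None, no NaN)."""
    return [r for r in rows
            if all(isinstance(v, (int, float)) and v == v for v in r)]

def knn_distance_scores(points, k=2):
    """Score each point by the sum of Euclidean distances to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(distances[:k]))
    return scores

def top_outliers(points, k=2, n=1):
    """Return the n points with the highest distance-based score."""
    scores = knn_distance_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return [points[i] for i in ranked[:n]]

# Hypothetical rows of (day, amount); two of them carry missing amounts
rows = [(1.0, 100.0), (2.0, None), (3.0, 105.0),
        (4.0, float("nan")), (5.0, 98.0), (6.0, 9000.0)]
clean = drop_missing(rows)
suspects = top_outliers(clean, k=2, n=1)
```

With these sample rows, the two incomplete records are discarded and the entry with the amount of 9000.0 is returned as the single top outlier.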
3. Results and Discussion
3.1. Bibliometrics Analysis of Detection of Fraud Pattern in Accounting Accounts Using Data Mining Techniques
The search criterion “detection of fraud and data mining” was applied to the title, abstract, and keywords of the scientific contributions indexed in the Scopus database. It returned 921 results among conference papers, articles, and reviews. Figure 3 shows the quantitative evolution of this line of research in recent years. The curve shows an increasing trend, which suggests that the international scientific community regards fraud detection through data mining as an effective and practical method. The figure was produced with the bibliometric analysis tools provided by Scopus.
The bibliographic information of each of the 921 detected documents was exported from the Scopus platform in .ris format, so that it could be processed in the VOSviewer bibliometric analysis software. This tool builds and visualizes bibliometric networks of scientific activity for journals, researchers, or publications, and generates maps of scientific activity based on citations, bibliographic coupling, co-citations, or authorship relationships, among others. The term density map shows the concentration of the most frequent terms in a given research area. Figure 4 is a term density map that shows the most frequent terms and their relationships in this area of knowledge. This type of figure gives a clear idea of the current research trends in the subject.
3.2. Detection of Fraud Pattern in Accounting Accounts Using Data Mining Techniques
· Clustering
Figure 3. Number of scientific contributions per year in the subject area of fraud detection and data mining.
Figure 4. Density map of terms, built from the keywords of the detected contributions.
The cluster groups can be seen in Figure 5. They correspond to the maximum values. An outlier value, which indicates a fraud pattern, appears in red. Three outliers, shown in dark blue, group the account entries with a value of 0.05. The dark color around this point reflects a high number of operations, which is evidence of a fraud pattern. The calculated centroids for this clustering distribution are presented in Table 1.
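As an illustration of how such centroids can be obtained, the sketch below runs a plain k-means (Lloyd's algorithm) on withdrawal amounts with k = 4, the value reported in the conclusion. It assumes clustering on a single attribute (the amount), which the text does not state, and it uses a deterministic farthest-point initialization chosen here for reproducibility. The sample amounts are hypothetical and are not the values of Table 1.

```python
def farthest_point_init(values, k):
    """Deterministic start: smallest value first, then repeatedly the farthest value."""
    centroids = [min(values)]
    while len(centroids) < k:
        centroids.append(max(values, key=lambda v: min(abs(v - c) for c in centroids)))
    return centroids

def kmeans_1d(values, k=4, iterations=100):
    """Plain k-means on one attribute; returns (centroids, clusters)."""
    centroids = farthest_point_init(values, k)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: each centroid becomes the mean of its cluster
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

# Hypothetical withdrawals: micro amounts, ordinary amounts, and one very large amount
withdrawals = [0.05] * 10 + [100.0, 110.0, 120.0, 1000.0, 1100.0, 50000.0]
centroids, clusters = kmeans_1d(withdrawals, k=4)
```

In this toy data the micro-withdrawals and the single large withdrawal each end up in their own cluster, which mirrors how pattern 1 and pattern 3 separate in the clustering view. Note that k-means results depend on the initialization, so other starting points may group the values differently.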
· Detection by amount
Figure 6 was built with the same data used for Figure 5. It shows a large number of transactions (high frequency) between 0 and 5000.00. Unlike Figure 5, the values around 0.05 are not separated into their own group, so fraud pattern 1 is not detected. The same happens with fraud pattern 3, whose outlier values are not isolated because they fall within a range of high-value transactions. The withdrawal frequency distribution therefore does not reveal the fraud patterns; this is only possible with data mining techniques.
The true cluster separates a group of high withdrawals, including a value from pattern 3. These values are considered suspicious because they are out of range. The false cluster, in dark blue, groups many small transactions around zero, which points to fraud pattern 1 (Figure 7).
The dark blue color represents a cluster that groups small values around zero, an indication of fraud pattern 1. The values whose color tends toward dark red are relevant because they are high withdrawals. They should be included in the representative sample of the account audit universe. They also include a value representative of fraud pattern 3 (Figure 8).
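The limitation of a plain frequency distribution can be reproduced with a few lines of code. The sketch assumes a bin width of 5000.00, suggested by the range mentioned for Figure 6, and uses hypothetical amounts; the function name is also an assumption.

```python
from collections import Counter

def frequency_distribution(amounts, bin_width=5000.0):
    """Count amounts per bin; keys are (lower bound, upper bound) of each bin."""
    counts = Counter(int(a // bin_width) for a in amounts)
    return {(b * bin_width, (b + 1) * bin_width): n
            for b, n in sorted(counts.items())}

# Hypothetical data: five micro-withdrawals mixed with ordinary entries
amounts = [0.05] * 5 + [120.0, 980.0, 4300.0, 7200.0]
table = frequency_distribution(amounts)
```

The five withdrawals of 0.05 fall into the same bin as the ordinary entries below 5000.00, so the histogram alone gives no hint that they form a separate group.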
These figures provide the most precise information. In dark blue are many small account entries per day around zero, evidence of fraud pattern 1. In the red part of the spectrum appear the daily transactions with high values, suspected of being irregular entries posted to the accounts. The per-day information also gives a temporal sequence that reveals how the fraud was structured and committed. Figure 9 shows clusters of transactions in dark blue concentrated on specific days. This indicates a suspiciously regular behavior over time in the accounting operations. This is how the second pattern was detected, as a paradox: an irregular procedure carried out as a suspiciously regular accounting record correlated with a time period.
Figure 6. Withdrawal frequency distribution.
Figure 7. Detection using the LOF algorithm.
Figure 8. Outlier value distribution using LOF.
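The paradox behind the second pattern, a regularity that is itself suspicious, can also be checked directly. The sketch below is a simple heuristic and not the LOF-based daily analysis used in the study: it flags any amount that recurs at a constant interval of days. The function name, the minimum of three occurrences, and the sample entries are assumptions.

```python
from collections import defaultdict

def regular_withdrawals(entries, min_occurrences=3):
    """Map each amount that recurs at a constant day interval to that interval."""
    days_by_amount = defaultdict(set)
    for day, amount in entries:
        days_by_amount[amount].add(day)
    flagged = {}
    for amount, days in days_by_amount.items():
        ordered = sorted(days)
        if len(ordered) < min_occurrences:
            continue
        # Gaps between consecutive days on which the same amount was withdrawn
        gaps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
        if len(gaps) == 1:
            flagged[amount] = gaps.pop()
    return flagged

# Hypothetical (day, amount) entries: 750.0 recurs every 5 days
entries = [(1, 120.0), (2, 0.05), (3, 750.0), (4, 0.05), (8, 750.0),
           (9, 300.0), (13, 750.0), (15, 0.05), (18, 750.0)]
flagged = regular_withdrawals(entries)
```

Here only the amount of 750.0 is flagged, with an interval of five days; the withdrawals of 0.05 recur too, but at irregular intervals.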
Research limitations:
· Only bibliographic references from the Scopus database were considered.
· Only manuscripts published in the English language were considered.
· The database was built only from one month of real accounting entries from an enterprise's bank account.
4. Conclusion
The application of data mining through outlier detection (both the distance algorithm and the Local Outlier Factor) made it possible to detect the fraudulent patterns injected into an accounting database that simulated three possible types of fraud:
1) Small withdrawals (credits) of 0.05 cents, repeated many times.
2) Withdrawals (credits) of a constant amount at regular time intervals.
3) A large withdrawal (credit), well above the average account entry.
Table 1. Centroids calculation for withdrawing.
Outlier detection offered the following possibilities:
The cluster grouping showed auditors, in a simple way, the possible distribution of the movements. The k-means algorithm (k = 4) gave an adequate visualization of the distribution of values. It detected the third pattern in cluster one and the first pattern in cluster three (Figure 5, clustering of withdrawals). The outlier detection algorithm (LOF) clearly detected the first and third patterns (Figure 7 and Figure 8), as a consequence of the isolated outliers (true and false) or the transaction density (high and low). The daily outlier distribution (Figure 9) detected all the patterns. It is the only one that revealed the second pattern, as the expression of a paradoxical result: a regular transaction correlated with the time variable is an irregular procedure in accounting data. This is the core idea used to detect the second pattern.