Going Concern Prediction of Iranian Companies by Using Fuzzy C-Means

Decision-making problems in the area of financial status evaluation are considered very important. Incorrect decisions in firms are very likely to cause financial crises and distress. Predicting the going concern status of factories and manufacturing companies is of interest to managers, investors, auditors, financial analysts, governmental officials and employees. This research introduces a new approach for modeling company behavior based on Fuzzy C-Means (FCM) clustering. Fuzzy clustering is a well-known unsupervised clustering technique that allows one piece of data to belong to two or more clusters. The data used in this research were obtained from the Iran Stock Market and Accounting Research Database. From the data between 2000 and 2009, 70 pairs of companies listed on the Tehran Stock Exchange were selected as the initial data set. Our experimental results show that the FCM approach obtains good prediction accuracy in developing a financial distress prediction model. In addition, the test for determining effective features shows that features based on cash flows play a more important role in clustering the two classes.


Introduction
The empirical literature on going concern prediction has recently gained further momentum and attention from financial institutions. Academicians and practitioners have realized that the problem of asymmetric information between banks and firms lies at the heart of important market failures such as credit rationing, and that improvement in monitoring techniques represents a valuable alternative to any incomplete contractual arrangement aimed at reducing the borrowers' moral hazard [1]. Traditionally, there are two major research trends in financial distress prediction. One investigates the situation of failure to find the symptoms [2][3][4]. The other compares the prediction accuracy of diverse classification methods [5][6]. This study belongs to the second type of research. Among financial distress forecasting methods, discriminant analysis was the dominant method for predicting corporate failure from 1966 until the early 1980s [7][8][9]. It gained wide popularity due to its ease of use and interpretation. However, both linear and quadratic discriminant analyses are sensitive to deviation from multivariate normality [10]. During the 1980s, the method was replaced by the probit [11] and logit (logistic regression) [12] methods, especially the logit model. These two models give a crisp relationship between the explanatory and response variables of the given data from a statistical viewpoint and do not assume multivariate normality; however, the probit model assumes that the cumulative probability distribution is the standard normal distribution, while the logit model assumes that it is the logistic distribution. Since the 1990s, neural networks have been the most widely used techniques in developing quantitative bankruptcy prediction models [13], in particular because of the approximation and classification powers of the multilayer perceptron (MLP) trained by the backpropagation algorithm [14]. Many studies compared the neural network backpropagation algorithm with statistical methods and found that it outperforms statistical methods such as Multivariate Discriminant Analysis (MDA) [15]. Neural networks have recently been employed to extract rules for solving fuzzy classification problems [16]. In particular, the Radial Basis Function Network (RBFN) has been widely used in a large number of fields, such as classification problems [17], function approximation [18] and management sciences.
The approximation and classification powers of the MLP trained by the backpropagation algorithm and of the RBFN are determined by the number of hidden nodes. The performance of the backpropagation MLP is further influenced by the number of hidden layers. Additionally, an RBFN is functionally equivalent to a zero-order Sugeno fuzzy inference system under some conditions, and it has been proven that the zero-order Sugeno fuzzy inference system can approximate any nonlinear function on a compact set to an arbitrary degree of accuracy under certain conditions. However, if a phenomenon under consideration does not have stochastic variability but is nevertheless uncertain in some sense, it is more natural to seek a fuzzy functional relationship for the given data, which may be either fuzzy or crisp. A quadratic interval logit model, combining the logit and quadratic interval regression models, has been proposed, and the results demonstrate that it is superior to the logit model. The logit model, the quadratic interval logit model, the backpropagation MLP and the RBFN model all have their own advantages and limitations [19].

Literature Review
Sun and Li [20] used a weighted majority voting combination of multiple classifiers for financial distress prediction (FDP), and introduced an integration strategy with subject weight based on neural networks for bankruptcy prediction. They all generated diverse classifiers by applying different learning algorithms (with heterogeneous model representations) to a single data set, and concluded that to some degree FDP based on a combination of multiple classifiers was superior to single classifiers in terms of accuracy rate or stability. The most used machine learning technique is the neural network model trained by the back-propagation learning algorithm, whose prediction accuracy outperforms statistical models including Logistic Regression (LR), Linear Discriminant Analysis (LDA) and Multiple Discriminant Analysis (MDA), as well as other machine learning models such as k-Nearest Neighbor (k-NN) and decision trees. In addition, the Back-Propagation Neural Network (BPN) model can be used as the benchmark for financial decision support models. Chen and Du [21] found that the prediction performance of the clustering approach is more aggressively influenced than that of the BPN model, and that the BPN approach obtains better prediction accuracy than the Data Mining (DM) clustering approach in developing a financial distress prediction model. Tsai and Wu [22] built ensembles of multiple classifiers, diversified by training neural networks on different data sets, for bankruptcy prediction. Their experimental results showed that multiple neural network classifiers did not outperform a single best neural network classifier, on which basis they considered that the proposed multiple classifier system may not be suitable for a binary classification problem such as bankruptcy prediction.
The purpose of this paper is to apply Fuzzy C-Means clustering in a going concern prediction model. Fuzzy C-Means (FCM) clustering is a well-known unsupervised clustering technique that allows one piece of data to belong to two or more clusters.
The paper is organized as follows. In the next section we review Fuzzy C-Means (FCM). The proposed method is explained in Section 3 with some experiments. In Section 4 we present our findings. The final section contains the conclusion.

Technical Background Fuzzy C-Means
FCM is among the most fully developed of the fuzzy clustering analysis methods that are effective for pattern recognition; details can be seen in the reference. Consider a sample set in R^s that is required to be divided into C categories; the aim of FCM is to obtain each category's clustering centre v_c by minimizing the weighted sum of squared within-cluster errors.


Therefore, its objective function is as follows:

$$J_m(U, V) = \sum_{c=1}^{C} \sum_{n=1}^{N} \mu_{cn}^{m} \, d_{cn}^{2}$$

with constraints

$$\sum_{c=1}^{C} \mu_{cn} = 1, \qquad 0 \le \mu_{cn} \le 1, \qquad 0 < \sum_{n=1}^{N} \mu_{cn} < N,$$

where $N$ is the number of samples and $m$ is the smoothing parameter, which generalizes hard c-means to FCM. This parameter controls the degree of sharing among the fuzzy categories: a larger $m$ results in a fuzzier division, while a smaller $m$ results in a more definitive division. Its experimental range is 1.1 to 5. $\mu_{cn}$ is the membership degree of $x_n$ in the $c$th category, and $d_{cn}$ represents the distance between $x_n$ and $v_c$, which is often measured in Euclidean space. The objective function $J_m(U, V)$ can be optimized over $U$ and $V$ by performing a number of iterative computations using Equations (4) to (6), whose convergence has been proved.
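The iterative computation can be illustrated with a minimal sketch of the textbook FCM updates, assuming Euclidean distance: the centre update v_c = Σ_n μ_cn^m x_n / Σ_n μ_cn^m and the membership update μ_cn = 1 / Σ_k (d_cn / d_kn)^(2/(m-1)). The function name `fuzzy_c_means`, its parameters and the stopping rule are illustrative assumptions, not the exact procedure or equations used in this study.

```python
import math
import random


def fuzzy_c_means(data, n_clusters=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook fuzzy c-means with Euclidean distance (illustrative sketch).

    data: list of equal-length lists of floats.
    Returns (centers, u) where u[c][n] is the membership of point n in cluster c.
    """
    rng = random.Random(seed)
    n_points, dim = len(data), len(data[0])

    # Random initial memberships, normalised so each point's memberships sum to 1.
    u = [[rng.random() for _ in range(n_points)] for _ in range(n_clusters)]
    for n in range(n_points):
        total = sum(u[c][n] for c in range(n_clusters))
        for c in range(n_clusters):
            u[c][n] /= total

    centers = []
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: membership-weighted mean of the samples.
        centers = []
        for c in range(n_clusters):
            weights = [u[c][n] ** m for n in range(n_points)]
            denom = sum(weights)
            centers.append([
                sum(w * x[j] for w, x in zip(weights, data)) / denom
                for j in range(dim)
            ])

        # Membership update from the distances to the new centres.
        max_change = 0.0
        for n, x in enumerate(data):
            dists = [math.dist(x, v) for v in centers]
            if min(dists) == 0.0:
                # The point coincides with a centre: give it full membership there.
                zeros = [c for c, d in enumerate(dists) if d == 0.0]
                new = [1.0 / len(zeros) if c in zeros else 0.0
                       for c in range(n_clusters)]
            else:
                new = [
                    1.0 / sum((dists[c] / dists[k]) ** power
                              for k in range(n_clusters))
                    for c in range(n_clusters)
                ]
            for c in range(n_clusters):
                max_change = max(max_change, abs(new[c] - u[c][n]))
                u[c][n] = new[c]

        # Stop when memberships no longer change appreciably.
        if max_change < tol:
            break
    return centers, u
```

With two clusters, `u[0][n]` and `u[1][n]` can be read as the degrees to which firm n belongs to each of the two groups. Which cluster corresponds to which financial state has to be decided afterwards, since the clustering itself is unsupervised.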

Research Method
In this section we explain the process of data collection and feature selection, and then review the fuzzy clustering algorithm.

Data Collection and Preprocessing
The database used in this study was obtained from the Iranian Stock Exchange. Based on the background of Iranian listed companies, the criterion of whether a listed company is Specially Treated (ST) by the Iranian Stock Exchange is used to categorize financial state into two classes, i.e. normal and distressed. The most common reason that Iranian listed companies are specially treated by the Iranian Stock Exchange is that their accumulated losses exceed half of stockholders' equity (Article 141 of the Iran Business Law). ST companies are considered to be in financial distress, and those never specially treated are regarded as healthy. This experiment uses financial data from two years before the company is specially treated, which is often denoted as year (t-2) in the literature.
The data used in this research were obtained from the Iran Stock Market and Accounting Research Database. From the data between 2000 and 2009, 70 pairs of companies listed on the Tehran Stock Exchange were selected as the initial data set. A preprocessing operation was carried out to eliminate missing and outlier data: 1) Sample companies missing at least one financial ratio were eliminated. 2) Sample companies with financial ratios deviating from the mean value by three standard deviations or more were excluded. After eliminating companies with missing and outlier data, the final number of sample companies is 120.
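The two preprocessing rules can be sketched as follows. This is only an illustration under stated assumptions: the function name `clean_sample`, the dictionary layout (firm name mapped to a list of ratios, with `None` marking a missing value), the use of the population standard deviation, and the treatment of constant ratios are all hypothetical choices, since the text does not specify them.

```python
import statistics


def clean_sample(firms):
    """Apply the two elimination rules to a {firm_name: [ratio, ...]} mapping.

    A ratio of None marks a missing value (an assumption of this sketch).
    """
    # Rule 1: drop firms with at least one missing financial ratio.
    complete = {name: ratios for name, ratios in firms.items()
                if all(r is not None for r in ratios)}
    if not complete:
        return {}

    # Per-ratio mean and (population) standard deviation over the remaining firms.
    n_ratios = len(next(iter(complete.values())))
    columns = [[ratios[j] for ratios in complete.values()] for j in range(n_ratios)]
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]

    def within_limits(ratios):
        # Rule 2: every ratio must lie less than three standard deviations
        # from its mean; constant ratios (std == 0) are never flagged.
        return all(std == 0 or abs(r - mu) < 3 * std
                   for r, mu, std in zip(ratios, means, stds))

    return {name: ratios for name, ratios in complete.items()
            if within_limits(ratios)}
```

Because the sample is built from matched pairs, a real implementation would also have to decide whether removing one firm removes its partner; the sketch leaves that question open.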

Feature Selection
The current study employs 24 variables. The ratios initially selected allow for a very comprehensive financial analysis of the firms, including financial strength, liquidity, solvability, productivity of labour and capital, various kinds of margins, and profitability and returns. Although some of these variables have small discriminatory capabilities for default prediction in the context of linear models, the non-linear approaches used here can extract relevant information contained in these ratios to improve the classification accuracy without compromising generalization. Feature selection is an important issue in bankruptcy prediction, as in other problems where a large set of attributes is available, since eliminating useless features may enhance the accuracy of detection while reducing the time needed to process the data. Due to the lack of an analytical model, the relative importance of the input variables can only be estimated through empirical methods. A complete analysis would require examination of all possibilities, for example, taking two variables at a time to analyze their dependence or correlation, then taking three at a time, and so on. This, however, is both infeasible and not error free, since the available data may be of poor quality in sampling the full input space. The 24 financial ratios, covering profitability, activity ability, debt ability and growth ability, are selected as initial features (see Table 1).
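To give a sense of why examining all possibilities is infeasible, the following short sketch counts the feature subsets involved for 24 ratios. The function name `subset_counts` is illustrative only.

```python
import math


def subset_counts(n_features):
    """Return the number of pairs, triples, and all non-empty feature subsets."""
    pairs = math.comb(n_features, 2)
    triples = math.comb(n_features, 3)
    all_subsets = 2 ** n_features - 1  # every non-empty combination of features
    return pairs, triples, all_subsets


print(subset_counts(24))  # (276, 2024, 16777215)
```

Each of these subsets would require at least one clustering run, which is why an empirical search over a limited number of candidate subsets is used instead.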

Designing Fuzzy Clustering Algorithm
Fuzzy clustering is another data mining technique. In fuzzy clustering a fuzzy partition is performed, that is, each data point belongs to every cluster with some degree of membership. In real circumstances fuzzy clustering is much more natural than hard clustering, because data lying on the border between clusters are not forced to belong fully to one of the clusters; instead they are assigned a membership degree ranging from 0 to 1 that indicates their partial membership. Fuzzy set theory in clustering analysis is focused on fuzzy clustering based on fuzzy relations and objective functions.
With regard to the explanations provided, the fuzzy clustering algorithm is stated as follows (Table 2).

Research Findings
The fuzzy clustering algorithm has been designed so that in the first stage the data are divided into two distinct clusters. For this purpose, the technique determines the effective features that lead to the best clustering. Effective features are determined using a random selection method, which tests different features 1000 times to achieve the best clustering. The algorithm starts by determining one effective feature, namely the feature that results in the best clustering, and this trend continues until all of the features have been selected for clustering. A summary of the results from testing the fuzzy clustering algorithm using data from the year in which financial distress occurred (year t) is provided in Table 3. A summary of the research results by selected features is provided in Table 4.
where:
α1: Number of accurately categorized going concern data / total number of going concern data.
α2: Number of accurately categorized financially insolvent data / total number of financially insolvent data.
The more distinct the two clusters are, the better the clustering, with maximum non-conformity between the two clusters. As can be seen, identical percentages were obtained when selecting between 3 and 12 features. That is, in this algorithm selecting two or twelve features for clustering gives similar results, and there is no difference in the degree of non-conformity between the two clusters. As observed, feature 7 (earnings before interest and taxes to sales) played an important role in categorizing the data and results in better clustering. With feature 7, two fuzzy clusters are generated: 93.33 percent (α1) conform with the going concern group, 100 percent (α2) conform with the financially insolvent group, and all errors (β1) are related to the going concern cluster. Next, another test is performed to determine the degree of conformity of each data point (firm) with Article 141 of the Iran Business Law. In this stage, the percentage of conformity between the two clusters generated by fuzzy clustering and the two groups of going concern and insolvent firms categorized according to Article 141 is tested. The membership percentage of each firm in each group can thereby be determined.
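The conformity measures can be computed as in the following sketch. It assumes that each cluster has already been mapped to a class label, and it takes β1 and β2 to be the shares of all misclassified firms that come from the going concern and insolvent groups respectively. The function name `conformity_rates`, the label strings and the toy data are illustrative assumptions, not the study's data.

```python
def conformity_rates(actual, predicted, first="going", second="insolvent"):
    """Return (alpha1, alpha2, beta1, beta2) for two-class cluster assignments.

    actual, predicted: equal-length sequences of class labels.
    beta1 and beta2 are None when there are no misclassified firms.
    """
    n_first = sum(1 for a in actual if a == first)
    n_second = sum(1 for a in actual if a == second)
    hit_first = sum(1 for a, p in zip(actual, predicted) if a == first and p == first)
    hit_second = sum(1 for a, p in zip(actual, predicted) if a == second and p == second)

    # Errors are counted by the group the firm actually belongs to.
    err_first = n_first - hit_first
    err_second = n_second - hit_second
    total_err = err_first + err_second

    alpha1 = hit_first / n_first
    alpha2 = hit_second / n_second
    beta1 = err_first / total_err if total_err else None
    beta2 = err_second / total_err if total_err else None
    return alpha1, alpha2, beta1, beta2


# Toy example (made-up counts): 30 firms per class, 2 going concern firms misclassified.
actual = ["going"] * 30 + ["insolvent"] * 30
predicted = ["going"] * 28 + ["insolvent"] * 2 + ["insolvent"] * 30
alpha1, alpha2, beta1, beta2 = conformity_rates(actual, predicted)
```

In this toy case all errors fall in the going concern group, so β1 is 1 and β2 is 0, while α2 is 1.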
As observed, the percentage of going concern classification did not improve as the number of features increased. Results of the algorithm test based on fuzzy clustering using data from the year before financial distress (year t-1) are provided in Table 5.
Another test was performed to determine the conformity of each data point (firm) with Article 141 of the Business Law. In this test the firms are classified into going concern and insolvent groups, and their membership percentage in each group can be determined. A summary of the research results by selected features is provided in Table 6.
As observed, feature 9 (operating cash to working capital) played a more important role in data classification and resulted in better clustering. The classification percentage did not improve at first as features were added, but the clustering gradually improved with more features, until clustering with 15 features classified going concern data with 98.8% accuracy.
Results of the algorithm test based on fuzzy clustering using data from two years before financial distress (year t-2) are provided in Table 7.
We now perform another test to determine the degree of conformity of the data (firms) with Article 141 of the Business Law. In this test, firms are classified into going concern and insolvent groups, and their membership percentage in each group can be determined. A summary of the research results by selected features is provided in Table 8.
As observed, feature 2 (operating cash to total liabilities) is the most important feature for classifying the data and results in better clustering. The results indicate that the classification percentage improves as features are added, until clustering with 15 features gives the best classification of going concern data at 96.67%. Beyond that point, however, adding further features, including inefficient ones, reduces the quality of the clustering. Overall, the results of the algorithm test based on fuzzy clustering indicate that, using data from the year of financial distress, one year before and two years before financial distress, the model correctly classifies 96.67%, 83.44% and 77.34% of going concern firms respectively, and 100%, 100% and 98.32% of financially insolvent firms respectively. In addition, the test for determining effective features shows that in the year of financial distress the features based on leverage ratios (earnings before interest and taxes to interest cost) separate the two classes better, and the farther we are from the year of the incident, the more important the role that features based on cash flows (operating net cash flows to working capital or total debt) play in clustering the two classes.

Geometrical Description of the Membership Percentage of Each Firm in the Going Concern and Insolvent Classes
As stated, the fuzzy clustering method can determine the membership percentage of each data point in every class. For example, the method may indicate that data point (x) belongs to the going concern class with 80% and to the insolvent class with 20%, while data point (z) belongs to the going concern class with 56% and to the insolvent class with 44%. This matters especially in databases in which the data of the two classes are selected by paired sampling. To aid understanding, the membership percentages of the data in the year in which financial distress occurred are shown geometrically in the following graphs. The horizontal axis shows the firm number and the vertical axis shows the membership percentage of each data point in its class. Financial distress data are on the right side and going concern data are on the left side of this axis. The closer a data point is to the top or bottom of the graph, the greater its membership percentage in its class. As can be seen in Graphs 1-4 above, as the number of features increases in the year of financial distress, the two classes separate significantly and the data belong to their classes with high membership percentages.
The membership percentages of the data in the year before financial distress, shown in Graphs 5-8 below, indicate that as the number of features increases the separation is performed more desirably and the data show greater membership in their classes.
The membership percentages of the data two years before financial distress are shown in Graphs 9-12 below.
These graphs indicate that as the number of features increases the separation is performed more desirably and the data show greater membership in their classes. However, it should be noted that the farther we are from the year of financial distress, the harder it is to separate the two classes, and the use of more variables results in interference among variables; using all variables two years before financial distress causes the data to belong to a class with a lower membership percentage. In other words, the farther we are from the year of financial distress, the more some features lack the efficiency needed for classification, so using all variables in the model would not be correct.

Conclusion
The results of the algorithm test based on fuzzy clustering indicate that, using data from the year of financial distress, one year before and two years before financial distress, the model clusters going concern firms with 96.67%, 85.19% and 77.74% accuracy respectively. In addition, the test for determining effective features shows that in the year of the financial distress incident the features based on profitability (earnings before interest and taxes to interest cost) separate the two classes more desirably, and the farther we are from the year of financial distress, the more important the role that features based on cash flows (operating net cash flow to working capital or total debt) play in clustering the two classes.

Suggestions for Future Researches
To guide students and researchers interested in the subject area of the present study, the following suggestions are provided:
1) Sort the data into three or four classes based on their membership percentages in fuzzy clustering.
2) Use different fuzzy clustering methods and determine the membership percentages of the samples in the going concern and insolvent classes based on different techniques.
3) Compare this method with other techniques such as neural networks or nearest neighbor methods.
4) Use combinations of other variables (other classes of financial ratios) for designing the model.
a) Determine the number of clusters, the number of iterations, the maximum error, and the membership functions of each data point for all clusters.
b) Set k = 1, so that clustering is performed with one feature. Features 1 to 24 are ordered randomly.
c) Determine the centers of the clusters and the covariance matrix using the relevant equations.
d) Determine the membership degrees of the data in the clusters according to the related equations.
e) Repeat b) to d) as many as 1000 times so that the objective function reaches the best local minimum; then the algorithm is stopped.
f) Select the best effective k features based on the best result of the pre-stated criteria.
g) Increase k and repeat from the second step until k = 24 is reached.
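One possible reading of steps b) to g) is a random search that, for each subset size k, tries a fixed number of randomly drawn feature subsets and keeps the best-scoring one. The sketch below follows that reading. The function name `random_subset_search` and the `score` callable are hypothetical; in practice `score` would run the fuzzy clustering on the chosen features and return a quality criterion, which is not specified here. Features are indexed from 0 in the code.

```python
import random


def random_subset_search(n_features, score, trials=1000, seed=0):
    """For each subset size k, keep the best of `trials` random feature subsets.

    score: callable taking a tuple of feature indices and returning a number
           (higher is better); supplied by the caller.
    Returns {k: (best_subset, best_score)} for k = 1 .. n_features.
    """
    rng = random.Random(seed)
    best_by_k = {}
    for k in range(1, n_features + 1):
        best_subset, best_score = None, float("-inf")
        for _ in range(trials):
            # Draw k distinct features at random and evaluate the clustering.
            subset = tuple(sorted(rng.sample(range(n_features), k)))
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
        best_by_k[k] = (best_subset, best_score)
    return best_by_k


# Toy criterion (made up): features 0 and 1 are useful, every feature has a small cost.
def toy_score(subset):
    return (0 in subset) + (1 in subset) - 0.01 * len(subset)


results = random_subset_search(5, toy_score, trials=200)
```

Comparing the best score across values of k then shows where adding features stops helping, which mirrors the way the findings are reported by number of selected features.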

β1: Number of incorrect data in the first group / total number of incorrect data.
β2: Number of incorrect data in the second group / total number of incorrect data.

Copyright © 2012 SciRes. OJAcct
Graph 9. Membership percentage of data with one feature.
Graph 10. Membership percentage of data with four features.
Graph 11. Membership percentage of data with fifteen features.
Graph 12. Membership percentage of data with all features.