Applying Information Technology to Financial Statement Analysis for Market Capitalization Prediction

Determining which attributes can be used to predict the market capitalization of a business firm is a challenging task that may benefit from research combining principles of accounting and finance with information technology. In our approach, information technology in the form of decision trees and genetic algorithms is applied to fundamental financial statement data to support the decision-making process for predicting the direction of a company's value, with value defined as market capitalization. The decision process differs from year to year; however, the amount of variation is crucial to a successful decision-making process. The research question posed is “how much variation occurs between years?” We hypothesize that the amount of variation is smaller than half the number of financial statement attributes that may be employed in the decision-making process. We develop a system that tests the amount of variation between years, measured as the number of generations required to reach a target level of fitness. The hypothesis is tested using data filtered from Compustat’s global database. The results support the research hypothesis and advance us toward answering the research question. The implication of this research is the possibility of improving the decision process when financial statement analysis is applied to the market capitalization and financial valuation of business firms.


Introduction
The purpose of this research is to explore how much the financial statement attributes that are most critical for determining whether the market capitalization of a business firm increases or decreases vary from year to year. We hypothesize that the amount of variation is smaller than half the number of financial statement attributes that may be employed in the decision-making process. This is tested by applying an information technology based approach to determine which attributes are most critical in the decision process for valuing future market capitalization.
Market capitalization is defined as the total outstanding shares of a company multiplied by the stock's price. The market capitalization (or market cap) is considered the public's valuation of a company. Determining which attributes from fundamental financial statement analysis are valuable in predicting the performance of stocks is a critical step in any investment strategy. It has long been held that financial markets are informationally efficient. This is referred to as the Efficient Market Hypothesis [1]. Under this hypothesis, it is not possible to predict market performance or to determine which financial statement attributes are the most critical in the valuation of a business firm. In contrast to the efficient market hypothesis, the adaptive market hypothesis [2,3] states that markets evolve based on competition, natural selection, and adaptation. This theory of evolution may be applied to determining which financial statement attributes are most important from year to year.
Decision trees are graph-like structures that can be converted into rules for the decision-making process. Decision trees are common in many business domains such as accounting, finance, and operations management. A decision tree may be converted into a set of rules by tracing the path between each terminating node of the tree and the root node. The more levels in a tree, the more complex the rule base that may be derived from it. The extracted rules may be used as input for an expert system or decision support system. Computer scientists have developed techniques that incorporate machine learning into decision trees. This allows the decision trees to be trained on a dataset without human intervention. Two of the most popular are the ID3 and C4.5 decision tree algorithms [4]. C4.5 is an extension of the widely used ID3 algorithm. The machine learning algorithm examines the dataset and employs a hill-climbing technique and the concept of information gain to determine which attributes are most important in classifying the dataset.
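To make the notion of information gain concrete, the following is a minimal sketch rather than the implementation used in this study. It computes plain information gain in the style of ID3 for categorical attributes; C4.5 additionally normalizes this quantity into a gain ratio, which is omitted here. The attribute names ("leverage", "margin") and the four toy records are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the records on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy records: "margin" separates the classes, "leverage" does not.
rows = [
    {"leverage": "high", "margin": "low"},
    {"leverage": "high", "margin": "high"},
    {"leverage": "low", "margin": "low"},
    {"leverage": "low", "margin": "high"},
]
labels = ["down", "up", "down", "up"]

gain_margin = information_gain(rows, labels, "margin")
gain_leverage = information_gain(rows, labels, "leverage")
```

In this toy case the attribute with the larger gain ("margin") would be chosen for the first split of the tree.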
Genetic algorithms are computer algorithms that are frequently applied to optimization problems [5]. An organism can be thought of as a set of genes, and a gene may be either a single attribute or a combination of attributes depending on the implementation. A genetic algorithm starts with a population, or collection, of one or more organisms. The algorithm then makes changes to the population's organisms and tests for fitness. Fitness is defined as how well the organism solves a particular problem. Each change of the population's organism(s) results in a new generation.
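The loop described above can be sketched for a population of a single organism. This is an illustrative sketch under stated assumptions, not the algorithm used in this study: the bit-string genes, the hidden goal pattern, the one-gene flip mutation, and the rule of keeping a change only when fitness does not drop are all hypothetical choices.

```python
import random

def evolve(organism, fitness, mutate, target, rng, max_generations=10_000):
    """Mutate a single organism until its fitness reaches `target`.

    Returns the final organism, its fitness, and the generations used.
    """
    best = fitness(organism)
    generations = 0
    while best < target and generations < max_generations:
        candidate = mutate(organism, rng)
        generations += 1            # each change yields a new generation
        score = fitness(candidate)
        if score >= best:           # keep changes that do not hurt fitness
            organism, best = candidate, score
    return organism, best, generations

# Toy problem (hypothetical): genes are bits, and fitness is the share of
# genes that match a hidden goal pattern.
GOAL = [1, 0, 1, 1, 0, 0, 1, 0]

def fitness(genes):
    return sum(g == t for g, t in zip(genes, GOAL)) / len(GOAL)

def flip_one_gene(genes, rng):
    i = rng.randrange(len(genes))
    return genes[:i] + [1 - genes[i]] + genes[i + 1:]

rng = random.Random(0)  # fixed seed so the run is repeatable
final, score, generations = evolve([0] * 8, fitness, flip_one_gene, 1.0, rng)
```

The value of `generations` is the quantity of interest: it counts how many changes were needed before the organism solved the problem.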
There is a body of literature related to predicting future returns. One such example is the use of fundamental financial analysis to predict higher than average returns [6]. Another example is what has come to be known as the Piotroski score [7]. This method identified 9 specific ratios that could be used to predict above average returns in firms with high book-to-market values. This work was extended to employ financial statement analysis to predict returns in high book-to-market firms [8]. Next, an information technology approach was developed in which a genetic algorithm assigned and modified weights for each of the 9 financial ratios; this approach was applied to the Brazilian stock market [9]. While decision trees have been employed in accounting, finance, and operations management, the use of genetic algorithms to increase the accuracy of decision trees was examined in [10]. Research has also applied an information technology approach to financial distress prediction [11].

Methodology
The null and research hypotheses to be tested in this research are:
1) H0: Based on the efficient market hypothesis, the attributes required for valuation will differ from year to year by at least half the total number of attributes.
2) H1: Based on the adaptive market hypothesis, the attributes will naturally evolve and will differ from year to year by less than half the total number of attributes.
The dataset was selected from Compustat's global database. The data was then filtered to include the years 2000 through 2006. Only companies from the United Kingdom (country code GBR) were selected, to avoid anomalies arising from local variations such as currency exchange rates. Only companies that remained active were selected. From this, 66 Compustat [12] attributes were retrieved that could be extracted from financial statements and computed for each stock, with each stock identified by a unique identifier. The attribute that most accurately reflects the performance of a business firm for the purposes of this study is called "pricediv", which is computed as price + dividends. This attribute was chosen as the target because price plus dividends has a large effect on the market cap of a business firm. The datasets contained a minimum of 676 records and a maximum of 1129 records, with an average of 877 records. The differences arose from an increase in publicly traded companies over the range of years (2000-2006).
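As a small illustration of the target attribute, the sketch below computes pricediv as price plus dividends and derives an up/down label from two consecutive years. The function names, the example values, and the labeling rule (up when pricediv rises) are assumptions made for illustration; the text does not spell out how the class label was encoded.

```python
def pricediv(price: float, dividends: float) -> float:
    """Target attribute as described in the text: price + dividends."""
    return price + dividends

def direction(pricediv_k: float, pricediv_k1: float) -> str:
    """Assumed labeling rule: 'up' if the target rose from year k to k + 1."""
    return "up" if pricediv_k1 > pricediv_k else "down"

# Hypothetical firm: (price, dividends) for two consecutive years.
year_k = pricediv(10.0, 0.5)
year_k1 = pricediv(9.0, 0.75)
label = direction(year_k, year_k1)
```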
The C4.5 decision tree algorithm was trained on data from year k to predict the pricediv of year k + 1. For example, attributes from year 2000 were used to predict the pricediv of year 2001. This decision tree was the single organism in the population for a genetic algorithm. The genetic algorithm then randomly mutated this decision tree. The resulting decision tree was then tested against attribute data from year k + 1 to predict the pricediv for year k + 2. For this experiment, fitness is defined as percent classification accuracy. The best possible fitness is taken to be that of a C4.5 decision tree created on data from year k + 1 to predict the pricediv of year k + 2. The genetic algorithm then ran until this target fitness was reached. The number of generations, and therefore of mutations (changes), was recorded in order to test the hypothesis.
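The procedure can be sketched end to end on toy data. The sketch rests on several assumptions that are not part of this study: a one-rule decision stump (attribute index, threshold) stands in for the C4.5 tree, the mutation operator and acceptance rule are hypothetical, and the two four-record "years" are invented so that a different attribute is predictive in each year.

```python
import random

def predict(stump, row):
    attr, threshold = stump
    return "up" if row[attr] > threshold else "down"

def accuracy(stump, rows, labels):
    """Fitness: fraction of records classified correctly."""
    hits = sum(predict(stump, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def train(rows, labels):
    """Stand-in for C4.5: exhaustively pick the most accurate stump."""
    candidates = [(a, r[a]) for a in range(len(rows[0])) for r in rows]
    return max(candidates, key=lambda s: accuracy(s, rows, labels))

def mutate(stump, rows, rng):
    """Hypothetical mutation: maybe switch attribute, then redraw threshold."""
    attr, _ = stump
    if rng.random() < 0.5:
        attr = rng.randrange(len(rows[0]))
    return (attr, rng.choice([r[attr] for r in rows]))

def generations_to_fitness(rows_k, labels_k, rows_k1, labels_k1, rng, cap=1000):
    """Count mutations until the year-k model matches the year-(k+1) model.

    labels_k holds next-year directions for the year-k records, and
    labels_k1 holds next-year directions for the year-(k+1) records.
    """
    organism = train(rows_k, labels_k)
    target = accuracy(train(rows_k1, labels_k1), rows_k1, labels_k1)
    best = accuracy(organism, rows_k1, labels_k1)
    generations = 0
    while best < target and generations < cap:
        candidate = mutate(organism, rows_k1, rng)
        generations += 1
        score = accuracy(candidate, rows_k1, labels_k1)
        if score >= best:
            organism, best = candidate, score
    return generations, best, target

# Toy data: attribute 0 is predictive in year k, attribute 1 in year k + 1.
rows_k = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
rows_k1 = [(4.0, 1.0), (1.0, 2.0), (3.0, 3.0), (2.0, 4.0)]
labels = ["down", "down", "up", "up"]

gens, best, target = generations_to_fitness(
    rows_k, labels, rows_k1, labels, random.Random(0)
)
```

Because the year-k stump relies on an attribute that is no longer predictive in year k + 1, at least one mutation is needed before the target accuracy is reached; the recorded count plays the role of the generation counts used to test the hypothesis.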

Results
The results of the experiment show that fewer than 9 generations were necessary to reach fitness. This is far below the threshold in the null hypothesis, which stated that at least half of the total number of attributes would need to change to reach fitness. To reiterate, fitness in our experiment is the classification accuracy of a decision tree built with the C4.5 machine learning algorithm for the target year. Based on the results, the null hypothesis is rejected and the alternative hypothesis is accepted. Table 1 illustrates the generations required to reach fitness (classification accuracy).
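The comparison behind this decision can be written out as a short check. This is a sketch of one plausible reading of the hypotheses, not a formal statistical test: it assumes the null hypothesis is rejected when every year pair needed fewer generations than half of the 66 attributes. The generation counts shown are hypothetical placeholders and are not the values reported in Table 1.

```python
TOTAL_ATTRIBUTES = 66  # number of Compustat attributes stated in the Methodology

def null_rejected(generations, total_attributes=TOTAL_ATTRIBUTES):
    """Assumed rule: reject H0 when every year pair needed fewer
    generations than half the total number of attributes."""
    threshold = total_attributes / 2  # 33 for 66 attributes
    return all(g < threshold for g in generations)

# Hypothetical placeholder counts (NOT the values from Table 1).
placeholder_generations = [8, 6, 7]
decision = null_rejected(placeholder_generations)
```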
As stated, the results indicate that the level of volatility is less than one may have expected. Several factors may have contributed to this outcome. First, it is possible that many financial attributes do not aid in the classification of stocks as a whole. These attributes may be better suited to classifying a specific industry. Second, it is possible that only certain attributes are actually useful when classifying stocks with a decision tree. Certain financial attributes, such as EBIDA, have long been recognized as good metrics of a company's performance. Third, it is conceivable that classifications are highly influenced by macroeconomic factors such as inflation or International Monetary Fund (IMF) attributes. These macroeconomic factors may influence which attributes are helpful, or may be strongly correlated with certain attributes, thereby causing attributes correlated with the IMF to become valuable in stock classification.
Copyright © 2013 SciRes. OJAcct

Conclusions and Future Directions
Determining which attributes from financial statement analysis are useful for predicting the direction of market capitalization is a daunting task. The efficient market hypothesis would lead us to believe that there is a large variation between years in which attributes are important in predicting market capitalization. We have demonstrated that the variation between years is smaller than half the total number of financial statement attributes available for determining future market capitalization. In fact, there was a variation of less than 9%. This study is limited by the number of attributes applied to this task. Future research will address this limitation. Additionally, the dataset employed in this study was limited to a single country in an established market. Future research will include additional countries as well as emerging markets in order to compare established and emerging markets, as well as countries within the same category. Finally, this study was limited by the number of years incorporated into the datasets, which will be addressed in extensions to this work. The implications of this research apply to a broad audience interested in fundamental financial statement analysis and market capitalization valuation.