A Deep Dive: Does Big Data Improve Maturity in the Developed Capital Markets?

Over this decade, the concept of Big Data has been applied across many industries, but the capital markets have traditionally lagged in adoption. Within the financial services sector, Big Data has gained far more traction in retail banking and insurance, owing to the increasing desire of these financial institutions to profile and analyze their customers in a manner similar to early adopters of Big Data strategy such as Amazon, Baidu or Google. However, Big Data strategies have begun to make an impact on a few selected areas of the capital markets, including social media sentiment analysis on structured and unstructured data for trading, growth in volume, risk analytics, fraud prevention, market surveillance, and the predictability and forecasting of equity prices; these are early signs of the maturity of the capital markets. Technical and theoretical measures have evolved, but these dimensions of the capital markets have so far remained poorly understood. Big Data in the form of structured, semi-structured and unstructured socio-economic and demographic information from consumers' social media and blogs has started to show an impact on the capital markets, which can lead to improved real-time systems and transaction processing, and to improved operational efficiency and maturity. The intent of this paper is threefold. First, it aims to draw clear inferences from past research and to take a holistic view of the work done in the emerging area of Big Data and its implications for capital markets. Second, it performs a deep analysis of how Big Data affects the assumptions underlying Random Walk theory and the Efficient Market Hypothesis. Third, it provides a conclusive theoretical analysis of past research by scholars, which can establish a model to refine the nexus between investors' sentiments and asset prices with advanced Big Data techniques.
The paper is divided into four broad sections. The first section introduces the topic, connecting the dots and setting the context for two different fields: Big Data and its influence on the capital markets. The second section explains the theoretical premises and frameworks needed for this research and studies previous work in this area in depth to establish conclusive references for future study. The third section studies emerging social media and technologies and analyzes previous research from the social media and capital markets perspectives. Finally, the fourth section concludes the findings with recommendations.


Introduction
A capital market is a part of the financial system in which debt or equity-backed securities are bought and sold. The buying and selling of equities are generally carried out by individuals or institutions. As a trend, these trades are generally long term in nature, with a lock-in period of more than a year. The capital market is a much broader landscape that also includes the stock markets used to trade securities. In a paper on Big Data for development, Hilbert [1] stresses that Big Data offers a very cost-effective prospect for improving decision-making in critical development areas such as economic development, health care, security and resource management. The use of Big Data to improve the effectiveness of the decision-making process also contributes to the theory of economic development. Big Data is on the verge of crossing Moore's chasm, as the capital markets have started focusing on unstructured data as a new avenue for innovation, which offers immense opportunities for growth and sustainability.
The concept of "Big" in a retail or scientific environment is different from what is considered big in a capital market context. The capital markets tend to deal largely with structured data sets from a more limited set of defined sources: market data vendors, market infrastructures, and counterparties. Some unstructured datasets have, however, become important to capital markets institutions in areas such as sentiment analysis, market surveillance for profiling certain trends, fraud prevention, and improving the predictability, maturity and activities in the market and within firms. But these were not traditionally the data sets of primary importance to the business. There has always been a gap, or chasm, among innovators, early adopters and laggards in the adoption of discontinuous or disruptive technologies, and that is where Big Data stands today in capital markets.
One of the milestone theoretical premises in finance was published by Fama and Malkiel in the Journal of Finance in 1970 [2], with the title "Efficient Capital Markets [...] patterns for the stock market. Fama [3] suggested three levels of market efficiency. In "Weak-form" efficiency, security prices reflect all information found in the record of past prices and volumes. Prices in the "Semi-strong form" of efficiency reflect not only all information found in the record of past prices and volumes but also all other publicly available information, whereas prices in "Strong-form" efficiency reflect all available public information as well as private information. In another milestone paper, "Market Efficiency, Long-Term Returns, and Behavioral Finance" (1998), Fama [4] observed that, in line with the market efficiency hypothesis that capital market anomalies are chance results, apparent overreaction to information is about as common as underreaction, and post-event continuation of pre-event abnormal returns is about as frequent as post-event reversal. He also noted that, in line with the market efficiency prediction that apparent anomalies can be due to methodology, most long-term return anomalies tend to disappear with reasonable changes in technique.
In recent years, the massive eruption of information flow, technological change and the availability of structured and unstructured data for consumption have generated fresh interest in examining the efficiency and maturity of the capital market from a new perspective. This brings together the theoretical premises of the efficient market hypothesis (EMH) and Random walk theory [5] with the circulation of massive data flows and their consumption through the newer technological landscape of Big Data. In support of this, previous scholars such as Singh [6] concluded that the adoption of new technologies such as Big Data could drive efficiency and promote productivity. Singh [7] and Singh [8] explained that R&D helps to create a vehicle to drive these synergies as technology spillovers. Singh [9] argued that large organizations need to adopt Big Data aligned information technology strategies in order to reduce costs and their reliance on a large manual workforce. One of the strategies for mergers is driven by technical advancements (Singh [10] and Singh [11]), which can be further refined and matured through Big Data strategies.

Foundation
Two basic principles of the financial world form the foundation and basic premise for carrying the study further (Attigeri [12]): 1) profit cannot be generated out of nothing, and 2) there are no opportunities for arbitrage, i.e. there is no potential to generate profit without risk, which is also called the "No Arbitrage principle".
[...] [15], accompanying the development of the mathematical theory of random processes in reference to Einstein's famous work on the random Brownian motion of colliding gas molecules. Kendall had expected to find regular price cycles, but he was surprised to observe that they did not seem to exist, and he conceived the idea of the "Random Walk". The Random walk theory [5] and Attigeri [12] stress that changes in stock prices are independent of each other, so the past movement or trend of a market cannot be used to predict its future movement. It is expressed as $v(n+1) = v(n) + \varepsilon(n)$, where $v(n)$ is the price of a stock at time $n$ and $\varepsilon(n)$ is a random change that is independent of past changes.
There are various misconceptions about the efficient market hypothesis. A very prominent one is that efficient market theory implies that the market has perfect forecasting abilities; in reality, the hypothesis only implies that security prices impound all available information, which does not mean that the market possesses forecasting abilities.
[...] [17] mentioned that there is a contradiction between the quantity theory of money and capital market efficiency. The relationship between the money supply and equity prices refutes the other findings on capital market efficiency. Using the quantity theory of money, Sprinkel [18] found that money supply changes could be used to predict stock prices. However, if the capital market is efficient, past information such as the money supply cannot be used to predict stock prices.
Timmermann and Granger [19] mentioned that the efficient market hypothesis gives rise to forecasting tests in the context of a given past information set. However, there are also important differences arising from the fact that market efficiency tests rely on establishing profitable trading opportunities in "real time". Forecasters continuously search for predictable patterns and affect prices when they attempt to exploit trading opportunities. Stable forecasting patterns are therefore unlikely to persist for long periods of time and will self-destruct when discovered by a large number of investors. This gives rise to non-stationarities in the time series of financial returns and complicates both formal tests of market efficiency and the search for successful forecasting patterns.
Based upon all these arguments and discussion, it seems that the Efficient Market Hypothesis, like all theories, is imperfect and provides a limited description of the stock market. However, at least for the present, it seems to be the better alternative. The hypothesis is too broad and too flexible to be rejected. The EMH has to deal with the predictability of equity prices in the capital/financial market.
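As an illustration of the random walk premise discussed in this section, the following minimal sketch simulates a price series in which each change is an independent random draw, and then checks that past changes carry almost no information about the next change. The additive Gaussian step, the function names and all parameter values are assumptions made for illustration only; they are not taken from the works cited here, and a toy additive model of this kind can even produce negative prices.

```python
import random


def simulate_random_walk(n_steps, start_price=100.0, step_std=1.0, seed=0):
    """Return prices v(0), ..., v(n_steps) with v(n+1) = v(n) + e(n).

    e(n) is an independent Gaussian draw (an illustrative assumption).
    """
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(n_steps):
        prices.append(prices[-1] + rng.gauss(0.0, step_std))
    return prices


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values)
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var


prices = simulate_random_walk(10_000)
# Price changes are the differences between consecutive prices.
changes = [b - a for a, b in zip(prices, prices[1:])]
change_autocorr = lag1_autocorrelation(changes)
price_autocorr = lag1_autocorrelation(prices)

print(f"lag-1 autocorrelation of price changes: {change_autocorr:.3f}")
print(f"lag-1 autocorrelation of price levels:  {price_autocorr:.3f}")
```

In such a simulation the price levels are highly persistent while the changes are close to uncorrelated, which is the sense in which a past trend cannot be used to predict the next movement under the random walk assumption.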

Capital Market and Economy
In the current landscape, the world's economies can be broadly divided into three categories based upon income: low income, middle income and high income. In a similar way, the world's capital markets can be divided into three broad categories based upon their efficiency: weak form, semi-strong form and strong form of efficiency. In general, developed economies like the USA show the characteristics of the strong form of efficiency, whereas developing economies like India show the characteristics of the semi-strong form of efficiency.

Social Media and Emerging Technologies
Big Data strategies and algorithms use the power of high capacity com- [...] Within the financial services sector, Big Data has gained far more traction within the retail banking and insurance sectors due to the increasing desire of these financial institutions to profile their customers in a similar manner to early adopters of Big Data strategy such as Amazon, Baidu or Google. Big Data is defined as "data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data" (Snijders et al.) [26]. However, Big Data strategies have begun to make a significant impact in a few selected areas of the capital markets, including sentiment analysis for trading and growth in volume, risk analytics, fraud prevention and market surveillance. The global financial markets became fragmented due to rapid globalization and technological change (Funk et al.) [27]. In a study on liquidity, Blocher et al. [28] identified three key components of the financial market: 1) fund management for long-term investors, 2) low-frequency trading (LFT) by traditional brokers, and 3) high-frequency trading (HFT) by proprietary financial firms, which use Big Data and aggregated information to set their trading strategy. The capital market industry has varied data sources, which include structured data such as traditional banking transaction data and market data, also called the "system of records". At the same time, it generates a mammoth volume of unstructured data through corporate news, feeds, blogs, microblogs, macro- and microeconomic indicators, and social media updates and content. In recent years, capital markets have gone through unprecedented change, resulting in the generation of massive amounts of high-velocity and heterogeneous data.
Similar trends can be observed in the financial services sector as well, where Big Data has increasingly become the most significant, promising, and differentiating asset for financial services enterprises (Seth et al.) [29]. These massive data troves can be processed through Big Data strategy, tools and techniques, which could be a game changer. Traditional tools cannot process such large datasets; a Big Data based approach, however, can analyze structured and unstructured data and identify logical patterns to help businesses make decisions. This processing is far faster and more agile than what traditional mechanisms can achieve. It provides near real-time information with actionable intelligence that can be used in the decision-making process. The analytical and predictive power of information generated from online big data for capital market activity is supported by numerous studies ranging from stock markets to housing markets.
Big Data strategy could help the capital market to address key use cases in the areas of 1) trading strategy, 2) reporting, 3) compliance, 4) productivity improvement and maturity, and 5) operational simplification. Investment banks [...] limitless possibilities to gain informational advantage over the competition by cleverly analyzing public data sources. Big Data strategies in the capital markets tend to be synonymous with analytical tasks or those related to reporting or governance functions, but in recent years the consumption of text-based, audio and video unstructured data has also been a significant driver for some projects. Regulatory, client, and internal drivers have forced most firms to re-evaluate the core reference data sets on which they base their trading, risk management, prediction, forecasting and operational decisions.
Electronic copy available at: https://ssrn.com/abstract=3402658

Capital Markets and Emerging Technologies
Big Data had been a much misused and misunderstood term within the financial

Big Data and Capital Market Efficiency
In the world of finance, the Efficient Market Hypothesis (EMH) asserts that financial markets are "informationally efficient", which means that current stock prices already reflect all known information (structured or unstructured) and all events and facts that have occurred. Therefore, investors cannot make excess profits from the market if their trading strategies are based on known information, because market prices efficiently collect and aggregate various information and keep changing without delay (Zhang and Skiena) [30]. Enterprise headquarters face a tradeoff between the cost of obtaining accurate private information and the value of that information, which could lead to inefficiency in the capital market. Foster [31] studied the efficiency and inefficiency of the capital market and explained the influence of information on the capital market. Fama [32] made the simple hypothesis that security prices fully reflect all available information, under the precondition that information and trading costs, the costs of getting prices to reflect information, are always zero. It is clear that those who arbitrage make no return from their costly activity if the market is ideally information efficient. Hence, the assumptions that all markets, including that for information, are always in equilibrium and always perfectly arbitraged are inconsistent. With the advent of structured and unstructured information flow in the capital market through social media, the role of information becomes even more relevant. If access to information is not in equilibrium across all the actors in the capital market, the market moves toward arbitrage and becomes less efficient. However, effective usage of Big Data information could reduce the opportunity for arbitrage and make the market more efficient. Fama [32] was also of the opinion that the predictability of stock returns based upon dividend yields is not in itself evidence for or against market efficiency.
In studying the behavior of the capital market, Sharpe [33] noted that, in a state of equilibrium, capital asset prices have adjusted so that the investor, if he follows rational procedures through diversification, is able to attain a desired point along a capital market line. An investor should be able to absorb all the available information and adjust and diversify his portfolio so that he can attain the desired goal along the Capital Market Line (CML).
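For reference, the Capital Market Line is commonly written in textbook form as below. The notation is an assumption introduced here for illustration and is not taken from the source: $R_f$ is the risk-free rate, $E[R_m]$ and $\sigma_m$ are the expected return and standard deviation of the market portfolio, and $E[R_p]$ and $\sigma_p$ are those of an efficient portfolio $p$ on the line.

```latex
E[R_p] = R_f + \frac{E[R_m] - R_f}{\sigma_m}\,\sigma_p
```

Read this way, an investor chooses a "desired point" on the line by choosing how much risk $\sigma_p$ to bear, and the slope gives the extra expected return earned per unit of that risk.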
Before the advent of Big Data, it was difficult to observe investors' views on the capital markets and to understand their decision-making process. With the ubiquity of information technology and the Internet, an increasing number of investors gather information from the Internet for analysis. Ye and Li [34] [35] aggregated the opinion from individual tweets to successfully predict a firm's forthcoming quarterly earnings and announcement returns. In studying the impact of microblogging, Jin et al. [36] observed a significant impact: an increase in relative trading volume as well as decreases in daily expected stock return and firm-level volatility. Along similar lines, while studying the impact of financial news in the media, Nagata and Inui [37] observed an intrinsic impact on the movements of the financial markets. As Big Data spreads, convincing evidence is reported by Moat et al. [38], who infer that online data and information can give new insights into real-world behavior, which could affect collective decision making and can even anticipate future actions in the capital markets. In the field of behavioral economics, Bollen et al. [39] observed that emotions can profoundly affect individual behavior and decision-making, and that this could even extend to predicting economic indicators. Zhang et al. [40] [...] When people are pessimistic or uncertain about the future, they will be more cautious about investing and trading. So, capturing the collective mind, especially people's mood, becomes one possible way to predict stock market movement.
While studying human emotion and its influence on the capital market, Sharpe [33] observed that the human state of emotion could influence decisions, even in the capital market and in investment decisions.
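The idea of aggregating individual posts into a collective mood signal can be sketched as follows. This is only a hypothetical illustration: the per-post sentiment scores, the daily returns, and the helper names `daily_sentiment_index` and `pearson` are all made up here, the toy data are deliberately constructed to line up, and the sketch does not reproduce the method of any study cited in this section.

```python
from collections import defaultdict
from statistics import mean


def daily_sentiment_index(scored_posts):
    """Average per-post sentiment scores (assumed in [-1, 1]) into one value per day."""
    by_day = defaultdict(list)
    for day, score in scored_posts:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical, made-up data: (day, sentiment score) pairs and daily returns.
posts = [(1, 0.6), (1, 0.2), (2, -0.4), (2, -0.2),
         (3, 0.1), (3, 0.3), (4, -0.5), (4, -0.1)]
returns = {2: 0.010, 3: -0.008, 4: 0.004, 5: -0.006}

index = daily_sentiment_index(posts)
# Pair the sentiment on day t with the return on day t + 1.
days = [d for d in index if d + 1 in returns]
lead_corr = pearson([index[d] for d in days], [returns[d + 1] for d in days])

print(index)
print(f"correlation of day-t sentiment with day-(t+1) return: {lead_corr:.2f}")
```

On real data, such a lead correlation would have to be tested out of sample and net of trading costs before it could be read as evidence for or against market efficiency.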

Concluding Remarks
The theoretical premises, starting from Random Walk theory and the Efficient Market Hypothesis, indicate that predicting stock prices, that is, establishing a pattern in the movement of capital markets based upon past movement, is a very complex task. Past studies of the impact of Big Data on the capital markets indicate that information asymmetry can be reduced to a significant level. However, many other macro variables, such as the availability of the internet to all investors, speed, access to relevant information, the cost of information and bias in social media content, can create opportunities for inefficiency in the market. Most scholars acknowledge in their research and analyses that Big Data has started to exert a significant influence on the capital markets and the decision-making process. The capital markets will further evolve and use the continuous flow of information from defined and non-defined sources such as social media and microblogs as part of their Big Data strategy to perform sentiment analysis and improve decision moments, improving maturity and minimizing inefficiency in the capital markets. In this context, fundamental analysis based upon the Big Data flow from the industry draws strong attention to carrying out further research from the perspective of Random Walk theory, the theory of economic development and the Efficient Market Hypothesis.