A Short-Term Stock Exchange Prediction Model Using Box-Jenkins Approach


This paper developed a short-term stock exchange prediction model using the Box-Jenkins approach. In this study, monthly data from Ghana Stock Exchange market report that spans from March 2013 to February 2018 were used to develop the model. ARIMA (0, 2, 1) model was fitted to the data based on the Bayesian Information Criterion (BIC) for model selection. Diagnostic checks showed that the residuals of the fitted model were uncorrelated. The developed model was used for forecasting for a period of six months. The trend of the forecasted values showed a significant increase in the Ghana Stock Exchange performance for the next six months.

Share and Cite:

Boye, P. and Ziggah, Y. (2020) A Short-Term Stock Exchange Prediction Model Using Box-Jenkins Approach. Journal of Applied Mathematics and Physics, 8, 766-779. doi: 10.4236/jamp.2020.85059.

1. Introduction

A stock exchange market is the center of a network of transactions where buyers and sellers of securities meet to provide a clear indication of the market price for each investment. The exchange also plays a key role in the mobilization of capital from shareholders for companies in exchange for shares in ownership to investors in emerging and developed countries. This leads to growth of industry and commerce of the country; and this is a consequence of liberalized and globalized policies adopted by most emerging and developed governments [1] [2] [3] [4].

Even though the stock exchange markets have been classified as the most volatile in the world and are full of anonymity and escapade performances [5], stock investments are one of the various investment options which has become very attractive to both foreign and local investors due to ease of access to the stock market and the expectation of high rate of returns [6]. In a stock market, financial information is one of the key elements among several factors (e.g. financial policy, monetary policy, foreign trade policy and macroeconomic factors) that influence the stock prices and inform the investors whether to invest their savings in a company’s stock or otherwise [6] and [7].

In the stock exchange market, it is known that changes in the stock prices as well as the returns may be attributed to various prevailing risks and events such as economic crisis, natural disasters, movements in international oil prices, inflation effects, foreign exchange rates, changes in government policies, regulations and norms occurring within a country and across the world [8]. Hence, the study of stock market price volatility has been a subject of interest in finance and econometrics. The study of these price changes has become relevant in the context of quantitative analysis, financial time series modelling, volatility assessment and risk analysis [9]. In addition to that, these occurring variations have necessitated the need to investigate the determinants of the stock market performance, analyse the factors causing the variations in the performance indicators, formulate mathematical models that can best fit the performance indicators, explain the underlying behavioural patterns and forecast these indicators using appropriate dataset.

For years, the relationship between financial sector development and real economic activity has been a debatable issue in theoretical and empirical research [10]. Reference [10] argued that well-functioning financial systems encourage technical innovations by reallocating resources to entrepreneurs and thereby promote economic growth. This debate revolves around whether stock price movements are influenced by economic changes or whether stock market performance helps in promoting economic growth. In this regard, the questions under consideration are whether financial sector development is related to economic growth and, if so, in which direction the causal nexus between economic growth and financial development runs.

Due to the importance of accurately forecasting stock exchange prices, various forecasting methods have been applied in the literature. These methods can be grouped into three categories: artificial intelligence, multivariate analysis and time series models. Artificial intelligence methods such as artificial neural networks are advanced computing tools that have recently been applied to time series forecasting. Although they give very good forecasting performance, their results depend on many factors such as the need for a large number of training data points, an extensive training period to reach convergence and the data partition technique used. In the case of multivariate analysis, the forecasting results rely on the independent variable(s) employed in the modelling and on the avoidance of multicollinearity. In analytical time series modelling, a good forecasting result is achieved on condition that the data being analysed is stationary [11] and [12].

References [13] [14] [15] [16] addressed this issue and developed Regression Models (RMs) to determine the relationship between stock market performance and its macroeconomic determinants. However, according to [17], the empirical results are still debatable due to the inconsistency of the macroeconomic determinants employed in the models' formulation. Regarding the difficulty of choosing which macroeconomic determinant(s) to employ in the RMs, [18] argued that stock prices or returns follow a random walk and that it is a difficult task to predict or forecast future returns accurately. Nevertheless, numerous studies in the area of stock returns prediction or forecasting have been dedicated to the use of classical statistical methods (ARIMA), which have dominated the field of financial data as a popular choice for modelling future stock prices [19]. In this regard, this study employs the Box-Jenkins approach as an alternative to the RMs in stock market research.

An example of a stock market which requires attention is the Ghana Stock Exchange (GSE). The GSE plays an important role in the economic development of Ghana and its corporate finance. It is well known that an organised and well-managed stock market stimulates investment opportunities by recognizing and financing productive projects that would lead to real economic activities. Reference [20] affirmed this assertion and showed from their study that there exists a strong positive relationship between stock market development and economic growth.

Since systemic risk in GSE performance hugely affects stock market investments and the country’s economic development, this study seeks to develop a time series model based on Box-Jenkins approach to help capital investors to identify the trend in the GSE and to forecast them appropriately.

Related Works

Reference [21] investigated the dimensionality and expectancy of a naïve investor. The authors used historical dataset of four India midcap companies for training the ARIMA model. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) tests were applied to select the best accurate prediction model. The formulated prediction model was tested on individual stocks and Nifty 50 Index. It was observed that the Nifty Index is the way to go for Naïve investors because of low error and volatility.

Reference [22] studied the relationship between some macroeconomic variables (exchange rate and oil price) and stock price in the following emerging countries: Brazil, China, India and Russia. The monthly data that spans from March 1999 to June 2006 were analyzed using Box-Jenkins approach. Results showed that there was no significant relationship between the oil prices and exchange rate over the stock market of the emerging countries. As a result, weak form of market efficiency exists in those capital markets.

In Thailand, [23] examined the stock market to find out the relation between the following selected macroeconomic variables: money supply, exchange rate, oil prices, industrial production and share price index by performing time series analysis. It was concluded that money supply positively affected stock prices while exchange rate negatively influenced stock prices.

Reference [24] studied the trends, similarities and patterns in the activities and movements of the Indian Stock Market in comparison to its international counterparts. The time period was divided into various eras to test the correlation between the various exchanges, in order to show that the Indian markets had become more integrated with their global counterparts and that their reactions were in tandem with those seen globally.

Reference [25] analyzed the extensive process of building ARIMA models. To identify the optimal model, the authors employed the following criteria: standard error of regression, adjusted R-squared and Bayesian Information Criterion (BIC). Based on these criteria, the best ARIMA model did a satisfactory job in predicting the stock prices of Nokia and Zenith Bank. In addition, the authors made a strong argument for the forecasting potential of ARIMA models in stock analysis, because they could compete reasonably well against the emerging modern forecasting techniques for short-term prediction.

2. Resources and Methods Used

2.1. Resources

The study used two main resources:

1) Monthly data that spans from March 2013 to February 2018 obtained from Ghana Stock Exchange Market Report (Table 1); and

2) R Statistical software.

Table 1. Ghana Stock Exchange (GSE).

Source: Ghana Stock Exchange Monthly Market Report, 2018.

2.2. Methods

2.2.1. Linear Model

In this study, the Ordinary Least Squares (OLS) technique was used to fit a regression equation to the GSE time series data. The essence according to [26] and [27] is to find whether the time series data (i.e. GSE) exhibits linear trends.

Knowledge of the linear trend projection enables the modeller and the user to:

1) Describe historical trend patterns;

2) Project past trend patterns into the future; and

3) Eliminate the trend component from the time series data.

Consider the Simple Linear Regression (SLR) given in Equation (1).

$$y_t = \beta_0 + \beta_1 t \tag{1}$$

where

$y_t$ = Ghana Stock Exchange value.

$\beta_0$ = fixed composite index at $t = 0$.

$\beta_1$ = unknown parameter to be determined from data.

$t$ = monthly duration (time in trend analysis).

From OLS method that minimises the sum of squares errors, Equations (2) and (3) are obtained as follows:

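$$\beta_1 = \frac{n\sum t\,y_t - \sum t \sum y_t}{n\sum t^2 - \left(\sum t\right)^2} \tag{2}$$

$$\beta_0 = \frac{\sum y_t}{n} - \beta_1 \frac{\sum t}{n} \tag{3}$$

where $n$ is the sample size.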
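As an illustration of Equations (2) and (3), the following minimal Python sketch computes the two OLS coefficients directly from the sums. It is not the R code used in the study; the function name `ols_trend` and the choice of the time index t = 1, ..., n are assumptions made for this example only.

```python
def ols_trend(y):
    """Return (beta0, beta1) for the trend line y_t = beta0 + beta1 * t.

    Assumption for this sketch: the time index runs t = 1, 2, ..., n.
    """
    n = len(y)
    t = range(1, n + 1)
    sum_t = sum(t)
    sum_y = sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_t2 = sum(ti * ti for ti in t)
    # Equation (2): slope of the fitted trend line
    beta1 = (n * sum_ty - sum_t * sum_y) / (n * sum_t2 - sum_t ** 2)
    # Equation (3): intercept of the fitted trend line
    beta0 = sum_y / n - beta1 * sum_t / n
    return beta0, beta1


if __name__ == "__main__":
    # Toy series lying exactly on y_t = 3 + 2t (not GSE data)
    print(ols_trend([5, 7, 9, 11, 13]))
```

Applied to the 60 monthly GSE values, a function of this kind would be expected to return the intercept and slope of the fitted trend line, up to rounding and the chosen time origin.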

Hypothesis Testing

The hypothesis for the study is formulated as follows:

H0: β 1 is zero.

H1: β 1 is different from zero.

2.2.2. ARIMA Model

According to [28] and [29], Box-Jenkins Autoregressive Integrated Moving Average model consists of the Autoregressive (AR (p)) model and the Moving Average (MA (q)) model. When these two models are put together, the Autoregressive Moving Average (ARMA (p, q)) model is formed.

ARMA processes form the core of time-series analysis. According to [30] and [31], the first order moving average, abbreviated as MA (1), is the simplest non-degenerate time-series process, defined in Equation (4).

$$y_t = \phi_0 + \phi_1 \varepsilon_{t-1} + \varepsilon_t \tag{4}$$

where $\phi_0$ and $\phi_1$ are unknown model coefficients whose actual values would be determined from data, and $\varepsilon_t$ is a white noise process.

The first order autoregressive process, abbreviated AR (1), has the following dynamics (Equation (5)):

$$y_t = \theta_0 + \theta_1 y_{t-1} + \varepsilon_t \tag{5}$$

where $\theta_0$ and $\theta_1$ are the unknown model coefficients whose actual values would be determined from data, and $\varepsilon_t$ is a white noise process. An Autoregressive Moving Average process with orders P and Q, ARMA (P, Q), has the following dynamics (Equation (6)):

$$y_t = \theta_0 + \sum_{p=1}^{P} \theta_p y_{t-p} + \sum_{q=1}^{Q} \phi_q \varepsilon_{t-q} + \varepsilon_t \tag{6}$$

where:

1) The $\varepsilon_t$ are independent and identically distributed.

2) $\varepsilon_t \sim N(0, \sigma^2)$.
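To make the dynamics of Equations (4) to (6) concrete, the sketch below generates an ARMA (P, Q) path from a given noise sequence. It is illustrative only: the function name `simulate_arma`, the convention that pre-sample values of y and ε are zero, and the inclusion of the contemporaneous shock ε_t in the ARMA recursion (as in Equations (4) and (5)) are assumptions of this example.

```python
import random


def simulate_arma(theta0, thetas, phis, eps):
    """Generate y_t = theta0 + sum_p thetas[p-1] * y_{t-p}
                     + sum_q phis[q-1] * eps_{t-q} + eps_t.

    Assumption for this sketch: values before the start of the sample
    are treated as zero.
    """
    y = []
    for t in range(len(eps)):
        value = theta0 + eps[t]
        # Autoregressive part: lagged values of the series itself
        for p, theta in enumerate(thetas, start=1):
            if t - p >= 0:
                value += theta * y[t - p]
        # Moving average part: lagged white-noise shocks
        for q, phi in enumerate(phis, start=1):
            if t - q >= 0:
                value += phi * eps[t - q]
        y.append(value)
    return y


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(10)]  # eps_t ~ N(0, 1)
    print(simulate_arma(0.0, [0.5], [0.3], noise))       # an ARMA (1, 1) path
```

Passing an empty list for `thetas` gives a pure MA process (Equation (4) when `phis` has one element), and an empty list for `phis` gives a pure AR process (Equation (5) when `thetas` has one element).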

Hypothesis Test

The hypothesis for the study is formulated as follows:

H0: Series is not stationary

H1: Series is stationary

3. Results and Discussion

In formulating the OLS model, descriptive statistics of the data (Table 1) were computed using R statistical software version 3.6.1 (see Table 2). The data size is 60, and the maximum and minimum GSE values are 3337.2 and 2113.58 respectively. The corresponding standard deviation (Equation (7)) is 312.14, which indicates that the GSE values are widely dispersed about the mean. The positive skewness (Equation (8)) (Table 2) implies that the distribution of the data set is skewed to the right (positively skewed): the right tail of the distribution is longer than the left tail, and most of the observations are concentrated on the left side of the distribution curve. Skewness thus provides a measure of the asymmetry of the probability distribution of the GSE data set about its mean value. The kurtosis (Equation (9)), which measures the peakedness of the data distribution, is 3.75; since this exceeds 3, the distribution of the data is leptokurtic.

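$$s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{7}$$

$$g = \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^3}{(n-1)\,s_n^3} \tag{8}$$

$$k = \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^4}{(n-1)\,s_n^4} \tag{9}$$

where

$y_i$ = Ghana Stock Exchange value.

$\bar{y}$ is the mean value of the Ghana Stock Exchange value.

$n$ is the sample data size.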
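The three summary measures can be computed as in the following sketch. It assumes that Equation (7) is the 1/n form of the standard deviation with a square root, and that Equations (8) and (9) divide by (n − 1) times the corresponding power of s_n; the function name `describe` is hypothetical. Statistical packages use slightly different conventions, so values may differ marginally from those reported in Table 2.

```python
import math


def describe(y):
    """Return (s_n, g, k): standard deviation, skewness and kurtosis.

    Assumptions for this sketch: Equation (7) uses divisor n under a
    square root; Equations (8) and (9) use divisor (n - 1) * s_n ** power.
    """
    n = len(y)
    mean = sum(y) / n
    s_n = math.sqrt(sum((v - mean) ** 2 for v in y) / n)           # Equation (7)
    g = sum((v - mean) ** 3 for v in y) / ((n - 1) * s_n ** 3)      # Equation (8)
    k = sum((v - mean) ** 4 for v in y) / ((n - 1) * s_n ** 4)      # Equation (9)
    return s_n, g, k


if __name__ == "__main__":
    # Symmetric toy data (not GSE values): skewness is zero
    print(describe([1, 2, 3, 4, 5]))
```

A positive g corresponds to the right-skewed shape described above, and a k greater than 3 corresponds to a leptokurtic distribution.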

Consequently, from the analysis of the GSE using Equations (1), (2) and (3), the linear model was developed (Equation (10)).

$$y_t = 2035.833 + 2.549\,t \tag{10}$$

Analysis of variance (ANOVA) test was then performed to assess the significance of the coefficients of the developed model (Equation (10)) (see Table 3). From Table 3, the computed F-value is 1.204, and from the standard F table the critical value is F (k − 1, n − k, α) = F (1, 58, 0.05) = 4.01.

Since Fcomputed < Fcritical, the null hypothesis cannot be rejected, and it was concluded that the estimated β1 is not statistically significant at the 5% level of significance. Thus, at the 5% level of significance, there is no evidence of a linear relationship between GSE and time. Hence, a time series analysis model was developed instead of a linear regression model.
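The decision rule of the F-test can be sketched as follows. The function names `f_statistic` and `slope_is_significant` are hypothetical, the time index is assumed to run t = 1, ..., n, and the critical value is taken from a table rather than computed; the toy series in the usage lines is not GSE data.

```python
def f_statistic(y):
    """F = (SSR / 1) / (SSE / (n - 2)) for the simple trend regression.

    Assumption for this sketch: the regressor is the time index t = 1..n.
    """
    n = len(y)
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * t_bar
    # Regression sum of squares (1 degree of freedom)
    ssr = sum((beta0 + beta1 * ti - y_bar) ** 2 for ti in t)
    # Error sum of squares (n - 2 degrees of freedom)
    sse = sum((yi - beta0 - beta1 * ti) ** 2 for ti, yi in zip(t, y))
    return (ssr / 1) / (sse / (n - 2))


def slope_is_significant(f_computed, f_critical):
    """Reject H0 (beta1 = 0) only when the computed F exceeds the critical F."""
    return f_computed > f_critical


if __name__ == "__main__":
    # Values reported in the text: computed F = 1.204, critical F = 4.01
    print(slope_is_significant(1.204, 4.01))  # H0 is not rejected
    print(f_statistic([1.0, 2.1, 2.9, 4.2, 5.0, 5.9]))
```

With the values reported from Table 3, the rule returns `False`, which matches the conclusion that the trend coefficient is not significant at the 5% level.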

Time series plot and Augmented Dickey-Fuller (ADF) nonstationarity test were performed to verify the nonstationarity of the GSE data which could have caused the generation of wrong model parameters if not corrected. Figure 1 shows time series plot for GSE data used to verify the stationarity of the data. It shows sudden changes in trends which attest that it is not stationary.

The Augmented Dickey-Fuller (ADF) stationarity test was then performed on the data. The test gave a p-value of 0.99, which is greater than the α = 5% level of significance. Hence, the null hypothesis that the series is not stationary cannot be rejected.

Graphical plots such as Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) were further carried out to confirm the nonstationarity of the data. This can be seen in Figure 2 and Figure 3. The ACF plot of Figure 2 shows a sine-wave pattern with decaying strong spikes which confirms that the series is not stationary. The PACF of Figure 3 has one significant lag with the rest decaying; which is also an indication of nonstationarity of the data. Since Figure 2 and Figure 3 show that the GSE data is not stationary, it was differenced once (see Figure 4).
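For readers who wish to reproduce the quantities plotted in an ACF graph, a minimal sketch of the sample autocorrelation function is given below. The function names are hypothetical, and the approximate 95% significance bounds of ±1.96/√n are the conventional large-sample bounds, assumed here rather than taken from the study.

```python
import math


def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y)
    return [
        sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n)) / c0
        for lag in range(1, max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% bound for a white-noise series of length n (assumed)."""
    return 1.96 / math.sqrt(n)


if __name__ == "__main__":
    series = [1, 2, 3, 4, 5]          # toy data, not GSE values
    print(sample_acf(series, 2))
    print(significance_bound(60))     # bound for a series of 60 observations
```

Autocorrelations that decay slowly and stay outside the bounds over many lags are the usual visual indication of a nonstationary series.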

Figure 4 shows the first-differenced GSE series, which does not appear to be stationary due to the presence of an upward movement. As a result, the ADF test was performed to confirm the claim.

Table 2. Descriptive statistics test summary results.

Table 3. ANOVA test summary statistic results.

Figure 1. Time series plot for GSE.

Figure 2. ACF plot for GSE.

Figure 3. PACF plot for GSE.

The ADF test shows that the differenced data was not stationary since the p-value = 0.6235 was still greater than α = 0.05 significance level. Therefore, the first differenced data was differenced again (see Figure 5). Figure 5 shows a time series plot for the second differenced GSE data which appears to be stationary since there are no upward trends as the year progresses and the variations of the amplitudes are equal.
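The differencing operation itself is simple, as the following sketch shows; the function name `difference` is hypothetical. Each pass replaces the series by the changes between consecutive observations, so a series of length n has length n − d after d passes.

```python
def difference(series, d=1):
    """Apply first differencing d times: z_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


if __name__ == "__main__":
    quadratic = [t * t for t in range(6)]   # toy series with a curved trend
    print(difference(quadratic, 1))          # still trending
    print(difference(quadratic, 2))          # constant after two differences
```

The toy example mirrors the situation in the text: one difference leaves a trend in the series, while a second difference removes it.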

Figure 6 and Figure 7 show the ACF and PACF plots for the GSE second differenced data. From the ACF plot, the autocorrelation at lag 1 exceeds the significance bounds, but all other autocorrelations are below the significance bounds. The PACF, on the other hand, shows that the partial autocorrelations at lags 1, 2 and 5 exceed the significance bounds and are slowly decreasing in magnitude with increasing number of lags. From these plots, candidate MA and AR terms were identified. Since the ACF plot (Figure 6) of the second differenced series cuts off after the first lag, MA (1) was assumed, which resulted in IMA (2, 1). The PACF plot (Figure 7) of the second differenced series, on the other hand, tailed off after lag 2 and cut off after lag 5. As a result, MA (2) and AR (5) were formed. Consequently, mixed models ARIMA (5, 2, 1) and ARIMA (5, 2, 2) were formed by combining the AR and MA terms.

Figure 4. First difference plot for GSE.

Figure 5. Second difference time plot for GSE.

Figure 6. ACF plot for second differenced GSE.

The ADF test shows that the second differenced data is stationary since it has a p-value of 0.01 which is less than α = 0.05 significance level and that confirms the claim of a stationary time series. Consequently, an ARIMA (p, 2, q) model is probably appropriate for the GSE data.

After the model identification, the Bayesian Information Criterion (BIC) as well as the coefficient of determination, R², were used for the selection of a reliable model. Table 4 shows the ARIMA model selection summary results of the BIC and R² values. The R² is a goodness-of-fit measure of prediction accuracy. From Table 4, ARIMA (0, 2, 1) has the smallest BIC value of 704.5556, with a corresponding R² value of 0.9010, and it was selected as the model that best fits the GSE data. The autoregressive order p is the lag value after which the PACF plot crosses the upper confidence interval for the first time. In our case, the PACF plot of the second differenced GSE graph (Figure 7) did not cross the upper confidence interval at any lag value. As a result, the p value was 0, and the integrated value was 2 since the GSE data was differenced twice. On the other hand, the moving average order q was obtained by using the ACF plot: it is the lag value after which the ACF plot crosses the confidence interval for the first time. From Figure 6, it can be seen clearly that after lag 1 the ACF graph crosses the lower confidence interval for the first time. Consequently, the q value was 1.

Figure 7. PACF plot for second differenced GSE.

Table 4. ARIMA model selection summary results.

ARIMA (0, 2, 1) model explained about 90% of the total variation in the composite index data set.
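Selection by BIC can be sketched as below, using the general definition BIC = k ln(n) − 2 ln(L), where k is the number of estimated parameters, n the number of observations used and L the maximised likelihood. The function names are hypothetical, and the log-likelihoods and parameter counts in the usage lines are invented placeholders, not the values behind Table 4.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_by_bic(candidates, n_obs):
    """Return (best_order, scores) where the best order has the smallest BIC.

    `candidates` maps an ARIMA order (p, d, q) to a tuple
    (log_likelihood, n_params).
    """
    scores = {
        order: bic(log_lik, n_params, n_obs)
        for order, (log_lik, n_params) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Placeholder inputs for illustration only (NOT the study's results)
    candidates = {
        (0, 2, 1): (-350.0, 2),
        (5, 2, 1): (-348.0, 7),
    }
    print(select_by_bic(candidates, n_obs=58))
```

The example shows why a larger model is not automatically preferred: its slightly higher likelihood is outweighed by the penalty on the additional parameters.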

Figure 8 shows the checked ACF residuals for GSE second differenced data. From the plot, almost all the lags are below the significance bounds which is an indication that there is no autocorrelation in the residual. This suggests that all the information in the GSE second differenced data used for the modelling has been accounted for by the model.

Consequently, the ARIMA (0, 2, 1) model (Equation (11)) to be used for forecasting was formulated.

$$y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t + 0.7409\,\varepsilon_{t-1} \tag{11}$$

Equation (11) was used for a six-month forecast of the GSE. Table 5 shows the forecasted GSE values for the next six months using the developed ARIMA (0, 2, 1) model. From Table 5, it can be deduced that the forecasted values show a significant increase from March 2018 to August 2018. This assertion can additionally be confirmed from Figure 9, where a graphical illustration of the forecasted values is presented. In Figure 9, the six-month forecast is shown as a blue line. The dark ash blue shaded area shows 80% to 100% prediction intervals.

Figure 8. ACF residuals plot for GSE.

Table 5. Forecast values summary results for GSE.

Figure 9. Forecast plot for GSE.
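The point forecasts implied by an ARIMA (0, 2, 1) model of the form of Equation (11) can be sketched with the recursion below. This is a simplified illustration, not the estimation routine used in the study: the function names are hypothetical, the residuals are computed conditionally with the first two residuals set to zero (an assumption), and future shocks are replaced by their expected value of zero, so the output would not necessarily reproduce Table 5.

```python
def arima021_residuals(y, theta):
    """Residuals eps_t = y_t - 2*y_{t-1} + y_{t-2} - theta * eps_{t-1}.

    Assumption for this sketch: the first two residuals are set to zero.
    """
    eps = [0.0, 0.0]
    for t in range(2, len(y)):
        eps.append(y[t] - 2 * y[t - 1] + y[t - 2] - theta * eps[t - 1])
    return eps


def arima021_forecast(y, theta, horizon):
    """Point forecasts for `horizon` steps ahead; requires len(y) >= 2."""
    eps_last = arima021_residuals(y, theta)[-1]
    history = list(y[-2:])
    forecasts = []
    for h in range(1, horizon + 1):
        # Only the one-step forecast involves an observed residual;
        # later shocks are unknown and enter with expectation zero.
        shock = theta * eps_last if h == 1 else 0.0
        nxt = 2 * history[-1] - history[-2] + shock
        forecasts.append(nxt)
        history.append(nxt)
    return forecasts


if __name__ == "__main__":
    toy = [10, 12, 14, 16]                      # toy series, not GSE data
    print(arima021_forecast(toy, 0.7409, 3))     # continues the linear path
```

Because the model is integrated of order two, forecasts beyond the first step extend the most recent slope linearly, which is consistent with the steadily increasing forecast path described above.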

4. Conclusions and Recommendation

In this paper, ARIMA (0, 2, 1) model has been developed from the observed GSE monthly market report data over a period of five consecutive years to predict future stock exchange prices or returns. In developing the ARIMA (0, 2, 1) model, nonstationarity which existed in the GSE sample data and could have caused wrong statistical inferences was resolved by differencing the data twice to ensure that the data is stationary. A confirmatory test to verify the stationarity of the GSE data was also carried out using the widely known Augmented Dickey-Fuller (ADF) test.

Diagnostic check was performed by using ACF residuals plot for GSE second differenced data to ensure that there is no autocorrelation in the residuals. This suggests that all the information in the GSE second differenced data was used for the model development.

ACF and PACF plots were used to identify candidate ARIMA models. After the model identification, the Bayesian Information Criterion (BIC) as well as the coefficient of determination, R², were used for the selection of a reliable model. The developed ARIMA model explained about 90% of the total variation in the composite index. The developed ARIMA (0, 2, 1) model was used for forecasting for a period of six months, and the trend of the forecasted values showed a significant increase in the GSE. In conclusion, the ARIMA (0, 2, 1) is a good model that can be relied upon by companies and investors to predict future stock prices or returns in the short term.


Acknowledgements

The authors are thankful to the Ghana Stock Exchange for providing the necessary data for this study.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.


References

[1] Aurangzeb (2012) Factors Affecting Performance of Stock Market: Evidence from South Asia Countries. International Journal of Academic Research in Business and Social Sciences, 2, 1-15.
[2] Jagongo, A. and Mutswenje, V.M. (2014) A Survey of the Factors Influencing Investment Decisions: The Case of Individual Investors at the NSE. International Journal of Humanities and Social Sciences, 4, 92-102.
[3] Sedighi, M., Jahangirnia, H., Gharakhani, M. and Fard, S.F. (2019) A Novel Hybrid Model for Stock Price Forecasting Based on Metaheuristic and Support Vector Machine. MDPI, 4, 1-28.
[4] Sassar, T.S. (2019) Analysis of the Effect of Sharia Stock Trading Activity Factors and Macroeconomic Factors on the Performance of Sharia Stocks in the Capital Market in Indonesia. International Journal of Tax Economics Management, 12, 11-28.
[5] Sadaqat, S., Akhtar, M.F. and Ali, K. (2011) An Analysis on the Performance of IPO: A Study on the Karachi Stock Exchange of Pakistan. International Journal of Business and Social Sciences, 2, 275-285.
[6] Anwaar, M. (2016) Impact of Firm’s Performance on Stock Returns (Evidence from Listed Companies of FTSE-100 Index London, U.K.). Global Journal of Management and Business Research: D Accounting and Auditing, 16, 31-39.
[7] Shah, D., Isah, H. and Zulkernine, F. (2019) Stock Market Analysis: A Review and Taxonomy of Prediction Techniques. International Journal of Financial Studies, 7, 1-22.
[8] Reddy, Y.V. and Narayan, P. (2016) Literature on Stock Returns: A Content Analysis. Amity Journal of Finance, 1, 194-207.
[9] Kallah-Dagadu, G. (2013) Modelling Ghana Stock Exchange Indices and Exchange Rates with Stable Distributions. Mphil Dissertation, University of Ghana, Ghana.
[10] Reddy, P.S. and Gupta, R. (2011) An Empirical Analysis of Stock Market Performance and Economic Growth: Evidence from India. International Research Journal of Finance and Economics, No. 73, 133-149.
[11] Boye, P. and Agyarko, K. (2020) Nonstationary Data Prediction Model Using Grey Time Series Method. Ghana Journal of Technology, 4, 16-25.
[12] Ziggah, Y.Y., Youjian, H., Yu, X. and Basomm, L.P. (2016) Capability of Artificial Neural Network for Forward Conversion of Geodetic Coordinates (φ, λ, h) to Cartesian Coordinates (X, Y, Z). Mathematical Geosciences, 48, 687-721.
[13] Olukayode, M.E. and Akinwande, A.A. (2010) Determinants of Stock Market Performance in Nigeria: Long-Run Analysis. Journal of Management and Organizational Behaviour, 1, 1-16.
[14] Sahoo, P.K. and Charlapally, K. (2015) Stock Price Prediction Using Regression Analysis. International Journal of Science and Engineering Research, 6, 1655-1659.
[15] Sharma, M. (2014) Survey on Stock Market Prediction and Performance Analysis. International Journal of Advanced Research in Computer Engineering and Technology, 3, 131-135.
[16] Saleh, M., Jahur, S.M., Nasrul, Q. and Aktaruzzaman, M.K. (2014) Determinants of Stock Market Performance in Bangladesh. Indonesian Management and Accounting Research, 13, 16-28.
[17] Dinh, S.T., Thi Mai, B.H. and Van, N.B. (2017) Determinants of Stock Market Development: The Case of Developing Countries and Vietnam. Journal of Economic Development, 24, 32-53.
[18] Mehmood, M.S., Mehmood, S. and Mujtaba, B.G. (2012) Stock Market Prices Follow the Random Walks: Evidence from the Efficiency of Karachi Stock Exchange. European Journal of Economics, Finance and Administrative Sciences, 51, 71-79.
[19] Vlasenko, A., Vlasenko, N., Vynokurova, O., Bodyanskiy, Y. and Peleshko, D. (2019) A Novel Ensemble Neuro-Fuzzy Model for Financial Time Series Forecasting. MDPI, 4, 1-11.
[20] Shahbaz, M., Ahmed, N. and Ali, L. (2008) Stock Market Development and Economic Growth: ARDL Causality in Pakistan. International Research Journal of Finance and Economics, 14, 182-195.
[21] Devi, B.U., Sundar, D. and Alli, P. (2013) An Effective Time Series Analysis for Stock Trend Prediction Using ARIMA Model for Nifty Midcap-50. International Journal of Data Mining and Knowledge Management Process, 3, 65.
[22] Gay, R.D. (2016) Effect of Macroeconomic Variables on Stock Market Returns for Four Emerging Economies: Brazil, Russia, India and China. International Business and Economics Research Journal, 15, 119-126.
[23] Brahmasrene, T. and Jiranyakul, K. (2007) Cointegration and Causality between Stock Index and Macroeconomic Variables in an Emerging Market. Academy of Accounting and Financial Studies Journal, 11, 17-30.
[24] Mukherjee, D. (2007) Comparative Analysis of Indian Stock Market with International Markets. Great Lakes Institute of Management, Chennai, 1, 39-71.
[25] Ariyo, A.A., Adewumi, O.A. and Ayo, C.K. (2014) Stock Price Prediction Using the ARIMA Model. UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, Cambridge, UK, 105-111.
[26] Shumway, R.H. and Stoffer, D.S. (2011) Time Series Analysis and Its Applications. 3rd Edition, Springer, New York, 1-162.
[27] Enders, W. (2015) Applied Econometric Time Series. 4th Edition, Wiley, Hoboken, NJ, 7-70.
[28] Boye, P., Mireku-Gyimah, D. and Sadiq, A. (2019) Time Series Analysis Model for Estimating Housing Unit Price. Ghana Journal of Technology, 3, 35-41.
[29] Gebhard, K., Jürgen, W. and Uwe, H. (2013) Introduction to Modern Time Series Analysis. 2nd Edition, Springer, Heidelberg, 10-89.
[30] Hyndman, R.J. and Athanasopoulos, G. (2018) Forecasting: Principles and Practice. 2nd Edition, Springer, Berlin, 1-504.
[31] Abdallah, F.D.M. (2019) Role of Time Series Analysis in Forecasting Egg Production Depending on ARIMA Model. Journal of Applied Mathematics, 9, 1-5.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.