Forecasting Volatility Based on a New Combined HAR-Type Model with Long Memory and Switching Regime: Empirical Evidence from Equity Realized Volatility

Abstract

This paper proposes a new combined model that accounts for short memory, long memory, heterogeneity, and regime switching to model realized volatility and forecast future volatility. We use the daily realized volatility series of the SPX to estimate the model parameters on the in-sample and full-sample periods, and to forecast daily out-of-sample volatility. The estimation results show a significant impact of long memory, regime switching, heterogeneity and the jump component. The out-of-sample forecast evaluation, based on loss functions and the MCS, indicates that MS-LM-HAR outperforms the other fifteen models. Our findings suggest that incorporating long memory and regime switching into HAR-type models can significantly improve the forecast performance of realized volatility models.

Share and Cite:

Huang, Y. , Wan, Z. , Li, H. and Luo, Y. (2024) Forecasting Volatility Based on a New Combined HAR-Type Model with Long Memory and Switching Regime: Empirical Evidence from Equity Realized Volatility. Journal of Mathematical Finance, 14, 103-123. doi: 10.4236/jmf.2024.141005.

1. Introduction

Financial markets, such as the S & P 500 index, are inherently volatile, as shown in Figure 1. As a measure of total risk, volatility plays a vital role in derivatives pricing, portfolio construction and asset allocation, quantitative investing strategies and risk assessment. Volatility forecasting is therefore crucial to these applications. With advances in computing, high-frequency trading data such as intraday close prices have become available. Realized volatility (RV), proposed by [1] and calculated from intraday returns such as five-minute returns, is widely used as a proxy for daily volatility. With realized volatility in hand, we can directly build econometric volatility models and forecast future volatility. Over the past decades, many RV-based econometric models have been developed. The earliest and simplest is the autoregressive (AR) model, which captures short memory in volatility dynamics. Although the AR model is easy to estimate and to forecast with using standard software packages, few studies use it directly to forecast volatility. In fact, an ARMA-type model based on RV may perform well in forecasting exercises [2] .

Figure 1. Daily price and returns of the S & P 500 index.

To improve forecast accuracy, part of the literature has extended the AR model to capture important properties of volatility. The earliest extension incorporates long memory through fractional differencing of the AR model, as in the AutoRegressive Fractionally Integrated Moving Average (ARFIMA) model [3] , a popular model for capturing long memory in financial time series. Early simulation results demonstrate the superiority of the ARFIMA model over ARMA models in forecasting long memory time series [4] . However, the empirical results in [5] suggest that the AR model can provide better forecasting performance than a fractionally integrated process. An empirical study in [6] shows that the ARFIMA model with exogenous variables (ARFIMAX) has better forecast accuracy for realized volatility. The work of [7] shows that methods based on fractional integration are superior to alternatives that do not account for long memory. The literature therefore reaches no consistent conclusion about which model is better.

Long memory behavior observed in a time series may be true long memory, described by a fractionally integrated process, or spurious long memory, which may be induced by regime switching in volatility. There is still no consensus on whether the ARFIMA model, which captures long memory, can be reliably distinguished from the Markov switching (MS) model, which captures regime switching [8] . As an alternative to the ARFIMA model, volatility can be modelled by letting the parameters of the AR model change according to transition probabilities, or by combining fractional integration with regime switching. The Markov switching AR model (MS-AR) is an enhanced AR model with time-varying parameters. Some empirical results show that the MS-AR model outperforms ARFIMA-based models in a forecast evaluation [5] .

Alternatively, [9] proposed the Heterogeneous AutoRegressive (HAR) model to accommodate multiscale dynamics and long memory. The standard HAR model has a simple autoregressive structure for RV with economically meaningful fixed averages of lagged RV (over 1, 5 and 22 days, representing daily, weekly and monthly horizons). It is thus a natural extension of the AR model aimed at improving volatility forecasts. Some empirical results show that HAR-type models have an advantage over the fractionally integrated (FI) model, the fractionally integrated generalized autoregressive conditional heteroscedastic (FIGARCH) model and the fractional stochastic volatility (FSV) model when forecasting at short horizons [10] .

In recent years, extensions of HAR-type models with regime switching (MS-HAR) have been explored in the literature [11] [12] [13] [14] . Most of these studies indicate that MS-HAR-type models are significantly better than the benchmark HAR model. Moreover, a different extension that incorporates exogenous factors into the HAR model, such as the jump component [11] [15] , investor attention [13] , and trade tensions [16] , can also achieve superior forecasting performance when the volatility level is low.

Although HAR-type models account for long memory to some extent, this is not enough to fully capture long memory in volatility. A long memory parameter is often important in addition to the HAR terms [17] . According to the simulation results of [17] , both long memory and HAR terms are needed in volatility models to capture the dynamic structure of volatility, and the combination of the HAR and ARFIMA models (LM-HAR) is a good approximation for describing long memory.

Many studies have empirically compared the forecasting performance of volatility models that feature short memory, long memory, heterogeneity or regime switching individually. However, few studies combine these properties into a hybrid realized volatility model and examine its estimation and forecasting performance. There is still no consensus on whether combining long memory, heterogeneity and regime switching in one framework improves out-of-sample forecasting ability. We therefore ask whether it is necessary and valuable to incorporate these properties into the modelling and forecasting of realized volatility, and provide corresponding empirical results. On the one hand, we confirm existing findings, for example that accounting for long memory can significantly improve volatility forecasting performance [7] . On the other hand, we report different findings, for example that introducing the jump component brings no significant improvement, in contrast to the conclusion in [15] . Moreover, this paper provides new empirical findings on the combined model, which has not yet been explored in the literature.

First, we incorporate long memory, heterogeneity and regime switching one at a time into the simple AR model to construct the LM-AR, HAR and MS-AR models. Second, we combine these properties with each other to construct the LM-HAR, MS-LM-AR and MS-HAR models. Finally, all three properties are incorporated into the AR model to construct a new hybrid model called MS-LM-HAR. Additionally, the lagged daily jump component is added to all of the above models to test whether jumps have a significant impact on volatility and whether the improvement in out-of-sample forecast performance is robust. The constructed AR-based models are shown in Figure 2.

To evaluate the models, we use the daily realized volatility series of the S & P 500 stock index (SPX) to estimate sixteen models and adopt the recursive method to obtain out-of-sample volatility forecasts. The main findings are as follows. First, the estimated model parameters are significant, which means that long memory, Markov switching and heterogeneity may all affect the realized volatility process. Moreover, according to the out-of-sample forecast evaluation, the MS-LM-HAR model performs best compared with the other fifteen models. Second, although the jump component may have a significant impact on the realized volatility process, it brings almost no further improvement in forecasting future volatility.

The academic contributions of this paper are as follows. First, this is the first study to accommodate long memory and regime switching in the HAR framework and to propose a new combined MS-LM-HAR model. Second, we demonstrate the forecast superiority of MS-LM-HAR over the MS-HAR, LM-HAR and HAR models using empirical evidence from S & P 500 realized volatility. Third, this paper demonstrates that introducing the daily jump component into HAR-type models does not improve out-of-sample volatility forecast accuracy. Finally, this paper provides new findings on the significance of the important components of the combined realized volatility model and on its volatility forecasting performance.

Figure 2. AR-based volatility models in this paper.

The structure of our paper includes four sections. Section 2 provides the individual and combined specifications of econometric models for realized volatility including AR, LM-AR, MS-AR, MS-LM-AR, HAR, LM-HAR, MS-HAR, MS-LM-HAR, AR-J, LM-AR-J, MS-AR-J, MS-LM-AR-J, HAR-J, LM-HAR-J, MS-HAR-J and MS-LM-HAR-J. Section 3 reports the empirical findings including the data sample, the results of descriptive analysis, the in-sample and full-sample estimation results, and the out-of-sample forecasting performance for sixteen models. Loss function and MCS method are used to evaluate the forecast performance. The final section presents conclusions and some remarks.

2. Specifications of New Combined Volatility Models

2.1. The Original HAR Model

With intraday high-frequency data now available, daily volatility can be observed via several realized proxies, including the realized variance [1] , the realized bi-power variation [18] , the realized range-based volatility [19] , and the median-based volatility [20] . The realized variance (hereafter RV) measures volatility over a given period using high-frequency trading data. Specifically, the squared intraday returns are summed to obtain the realized daily volatility:

$$RV_t = \sum_{j=1}^{M} r_{t,j}^2 \tag{1}$$

where $RV_t$ denotes the realized variance on day $t$, $r_{t,j} = p_{t,j} - p_{t,j-1}$ denotes the $j$-th intraday return based on log-prices on day $t$, $M = [1/\Delta]$ denotes the integer part of $1/\Delta$, and $\Delta$ denotes the sampling frequency.
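As a minimal sketch of Eq. (1), the snippet below computes the realized variance of one trading day from a sequence of intraday prices. The function name `realized_variance` and the sample prices are illustrative assumptions, not part of the original study.

```python
import numpy as np

def realized_variance(intraday_prices):
    """Sum of squared intraday log returns for one trading day, as in Eq. (1)."""
    log_p = np.log(np.asarray(intraday_prices, dtype=float))
    r = np.diff(log_p)  # r_{t,j} = p_{t,j} - p_{t,j-1} on log-prices
    return float(np.sum(r ** 2))

# Hypothetical 5-minute closes for part of one day (illustrative numbers only).
prices = [100.0, 100.5, 100.2, 100.9, 100.7]
rv = realized_variance(prices)
```

With 5-minute sampling over a full trading day, the same function would simply receive a longer price vector.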

Based on heterogeneous market hypothesis (HMH), [9] introduced the heterogeneous autoregressive model to model realized volatility, which is called HAR model. The specification of HAR model is expressed as follows,

$$RV_{t+h} = \phi_0 + \phi_D RV_t + \phi_W \overline{RV}_t^W + \phi_M \overline{RV}_t^M + \varepsilon_{t+h} \tag{2}$$

where $RV_{t+h}$ denotes the RV series on day $t+h$, $RV_t$ denotes the past daily RV on day $t$, $\overline{RV}_t^W = \frac{1}{5}\sum_{i=0}^{4} RV_{t-i}$ denotes the average of the past weekly RV series between day $t-4$ and day $t$, $\overline{RV}_t^M = \frac{1}{22}\sum_{i=0}^{21} RV_{t-i}$ denotes the average of the past monthly RV series between day $t-21$ and day $t$, and $\varepsilon_{t+h}$ is the disturbance term. The horizons $h$ = 1, 5, and 22 correspond to the cumulative volatilities of 1 day, 5 days, and 22 days.
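The HAR regression in Eq. (2) can be sketched in a few lines. The example below builds the daily, weekly and monthly regressors and fits the coefficients by ordinary least squares. The helper names `har_design` and `fit_har` are hypothetical, and plain OLS is an assumption made for illustration; the paper itself reports quasi maximum likelihood estimates.

```python
import numpy as np

def har_design(rv, h=1):
    """Build (X, y) for Eq. (2): regress RV_{t+h} on daily, weekly, monthly RV."""
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(21, len(rv) - h):
        daily = rv[t]
        weekly = rv[t - 4 : t + 1].mean()     # average over days t-4..t
        monthly = rv[t - 21 : t + 1].mean()   # average over days t-21..t
        rows.append([1.0, daily, weekly, monthly])
        targets.append(rv[t + h])
    return np.array(rows), np.array(targets)

def fit_har(rv, h=1):
    """OLS estimates in the order [phi_0, phi_D, phi_W, phi_M]."""
    X, y = har_design(rv, h)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Dropping the weekly and monthly columns from the design matrix gives the AR specification as a special case.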

When the weekly RV term and monthly RV term are deleted from the HAR model, the HAR model is simplified to the AR model with the lag order h, that is,

$$RV_{t+h} = \mu + \phi_D RV_t + \varepsilon_{t+h} \tag{3}$$

We introduce the jump component as follows (shown here without the weekly and monthly terms):

$$RV_{t+h} = \mu + \phi_D RV_t + \phi_J J_{D,t} + \varepsilon_{t+h} \tag{4}$$

where $J_{D,t}$ is the daily jump component of realized volatility on day $t$, which is calculated as

$$J_{D,t} = RV_t - BPV_t \tag{5}$$

where $RV_t$ is the realized variance on day $t$ and $BPV_t$ is the bipower variation on day $t$, which can be estimated by the following realized proxy [18] :

$$RBV_t = \sum_{j=1}^{M-1} |r_{t,j}|\,|r_{t,j+1}| \tag{6}$$
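Eqs. (5) and (6) can be illustrated with a short sketch that takes one day of intraday returns and returns the bipower proxy and the jump component. The function names are assumptions for illustration. The code follows the equations as printed, so it applies no scaling constant to the bipower sum and does not truncate the jump at zero; some implementations in the literature do both.

```python
import numpy as np

def bipower_variation(returns):
    """Realized bipower proxy of Eq. (6): sum of |r_j| * |r_{j+1}|."""
    a = np.abs(np.asarray(returns, dtype=float))
    return float(np.sum(a[:-1] * a[1:]))

def daily_jump(returns):
    """Jump component of Eq. (5): realized variance minus bipower variation."""
    r = np.asarray(returns, dtype=float)
    rv = float(np.sum(r ** 2))
    return rv - bipower_variation(r)
```

Because no truncation is applied, `daily_jump` can return negative values, which is consistent with the later discussion of negative jumps.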

2.2. The Long-Memory HAR (LM-HAR) Model

The HAR model is a simpler model than ARFIMA model, and can be estimated through OLS method. To be suitable to more general situations, some versions of the basic HAR model have been extended in the literatures, for example, jumps [21] [22] , leverage effect [23] , implied volatility [24] , unit root [25] . The HAR model has been shown to have more accurate forecast in out-of-sample period than ARMA and ARFIMA model [9] [25] .

Some empirical results demonstrate that there is significant long memory in realized volatility series [26] . Therefore, the ARFIMA model is widely used to describe long memory for realized volatility by using a fractional difference operator [27] [28] [29] . The specification of the LM-AR(p) model for realized volatility is expressed as the following specification:

$$\Phi(L)(1-L)^d RV_t = \mu + \varepsilon_{t+p} \tag{7}$$

where $RV_t$ denotes the daily realized volatility, $t = 1, \ldots, T$, $d \in (0, 1)$ denotes the fractional differencing parameter, $\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ is the autoregressive lag operator polynomial of order $p$, $\mu$ is a constant, $\varepsilon_t$ is white noise, $\varepsilon_t \sim \text{i.i.d.}(0, \sigma_\varepsilon^2)$, $\sigma_\varepsilon^2 < \infty$, and $(1-L)^d$ denotes the fractional difference operator, which for any real number $d$ can be expanded via the binomial expansion based on the Gamma function:

$$(1-L)^d = 1 - dL + \frac{d(d-1)}{2!}L^2 - \frac{d(d-1)(d-2)}{3!}L^3 + \cdots = \sum_{k=0}^{\infty} \binom{d}{k}(-1)^k L^k = \sum_{k=0}^{\infty} \frac{\Gamma(k-d)}{\Gamma(k+1)\Gamma(-d)} L^k \tag{8}$$

where $\Gamma(\cdot)$ is the Gamma function, $\frac{\Gamma(k+a)}{\Gamma(k+b)} \approx k^{a-b}$, $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt$, and $\lambda_k = \binom{d}{k}(-1)^k$.
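The weights $\lambda_k$ in Eq. (8) satisfy the recursion $\lambda_0 = 1$, $\lambda_k = \lambda_{k-1}(k-1-d)/k$, which avoids evaluating Gamma functions directly. The sketch below computes the weights and applies a truncated version of $(1-L)^d$ to a series. The function names and the truncation to the available history are assumptions made for illustration.

```python
import math
import numpy as np

def frac_diff_weights(d, n_terms):
    """First n_terms coefficients lambda_k of (1 - L)^d from Eq. (8)."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

def frac_diff(x, d):
    """Apply (1 - L)^d to x, truncating the infinite sum at the start of the sample."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, len(x))
    out = np.empty(len(x))
    for t in range(len(x)):
        history = x[t::-1]                 # x_t, x_{t-1}, ..., x_0
        out[t] = np.dot(w[: t + 1], history)
    return out

# Cross-check one weight against the Gamma-function form in Eq. (8).
d, k = 0.4, 3
gamma_form = math.gamma(k - d) / (math.gamma(k + 1) * math.gamma(-d))
```

For $d = 1$ the weights reduce to $(1, -1, 0, \ldots)$, so `frac_diff` returns ordinary first differences after the first observation.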

The specification of the LM-HAR model for realized volatility is expressed as the following specification:

$$\Phi(L)(1-L)^d RV_{t+h} = \mu + \phi_W \overline{RV}_t^W + \phi_M \overline{RV}_t^M + \varepsilon_t \tag{9}$$

where $RV_t$ denotes the daily realized volatility, $t = 1, \ldots, T$, $d \in (0, 1)$ denotes the fractional differencing parameter, $\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ is the autoregressive lag operator polynomial of order $p$, $\mu$ is a constant, and $\varepsilon_t$ is white noise, $\varepsilon_t \sim \text{i.i.d.}(0, \sigma_\varepsilon^2)$, $\sigma_\varepsilon^2 < \infty$. The combined LM-HAR model in (9) is obtained by incorporating long memory, represented by the fractional difference operator $(1-L)^d$, into (2). When $\Phi(L) = 1 - \phi_D L^h$ and $d = 0$, the LM-HAR model reduces to the HAR model in (2). When the weekly and monthly RV terms are dropped, the LM-HAR model reduces to the LM-AR model in (7).

2.3. The Markov Switching HAR (MS-HAR) Model

From an estimation point of view, the Markov switching (hereafter MS) model should behave like a short memory process. However, previous studies show that the MS model can also generate the apparent long memory observed in many fields. In particular, spectral estimators such as the local Whittle type may identify an MS process as long memory [30] . Regime switching dynamics in the MS model are driven by a state variable, which is generally assumed to be a stationary Markov chain with a transition probability matrix. Thus, the MS model may exhibit some important features, such as long memory. Empirically, the MS model has been applied to describe quantitatively the different dynamic behavior in each state, for example for the bubble series of crude oil prices [31] .

The form of the autoregressive model with Markov switching regime (hereafter MS-AR) can be specified as follows:

$$y_t = \mu_{S_t} + \sum_{k=1}^{p} \varphi_{k,S_t}\, y_{t-k} + \sigma_{S_t}\varepsilon_t \tag{10}$$

where $y_t$ is the original series at time $t$ and $S_t$ denotes the state variable at time $t$ for $y_t$. In regime $S_t$ at time $t$, $\mu_{S_t}$ denotes the intercept, $\varphi_{k,S_t}$ denotes the autoregressive coefficient, and $\sigma_{S_t}$ denotes the standard deviation. $\varepsilon_t$ denotes the residual series with $\varepsilon_t \sim \mathrm{IIDN}(0, 1)$.

This paper sets the number of states to 2, that is, $S_t = 1$ for the first state and $S_t = 2$ for the second state. For the original series $\{y_t\}_{t=1}^{T}$, we obtain a state series $\{S_t\}_{t=1}^{T}$, which is assumed to be a stationary and irreducible Markov process. Let $p_{ij} = P\{S_{t+1} = j \mid S_t = i\}$ denote the probability of switching from state $i$ at time $t$ to state $j$ at time $t+1$, where $i, j = 1, 2$. Then the state transition probability matrix for the series $\{y_t\}_{t=1}^{T}$ can be expressed as:

$$P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \tag{11}$$

Therefore, the intercept, autoregressive coefficients and variance switch between states according to the state variable. In detail, $\mu_1$ and $\mu_2$ denote the intercepts of states 1 and 2, respectively, $\varphi_{k,1}$ and $\varphi_{k,2}$ denote the autoregressive coefficients of states 1 and 2, respectively, and $\sigma_1^2$ and $\sigma_2^2$ denote the variances of states 1 and 2, respectively.
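To make the two-state mechanism in Eqs. (10) and (11) concrete, the following sketch computes the ergodic state probabilities and expected regime durations implied by a transition matrix, and simulates an MS-AR(1) path. All function names and parameter values are hypothetical, and states are indexed 0 and 1 in the code rather than 1 and 2.

```python
import numpy as np

def stationary_probs(p11, p22):
    """Ergodic probabilities of a two-state chain with staying probabilities p11, p22."""
    pi1 = (1.0 - p22) / (2.0 - p11 - p22)
    return np.array([pi1, 1.0 - pi1])

def expected_duration(p_stay):
    """Expected number of periods spent in a regime once entered."""
    return 1.0 / (1.0 - p_stay)

def simulate_ms_ar1(n, mu, phi, sigma, P, seed=0):
    """Simulate Eq. (10) with p = 1; mu, phi, sigma are indexed by state."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    states = np.zeros(n, dtype=int)
    y = np.zeros(n)
    for t in range(1, n):
        # Draw the next state from the row of P that belongs to the current state.
        states[t] = rng.choice(2, p=P[states[t - 1]])
        s = states[t]
        y[t] = mu[s] + phi[s] * y[t - 1] + sigma[s] * rng.standard_normal()
    return y, states
```

For example, staying probabilities of 0.9 and 0.8 imply that the chain spends about two thirds of the time in the first state, with an expected duration of 10 periods per visit.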

For realized volatility in our study, with a Markov switching autoregressive coefficient, two states and one lag, the MS-HAR model is specified as follows,

$$RV_{t+h} = \phi_0 + \phi_{D,S_t} RV_t + \phi_W \overline{RV}_t^W + \phi_M \overline{RV}_t^M + \varepsilon_{t+h} \tag{12}$$

where $S_t$ denotes the state variable at time $t$ for $RV_t$. When the weekly and monthly RV terms are dropped from the MS-HAR model, it reduces to the MS-AR model with the following specification:

$$RV_{t+h} = \phi_0 + \phi_{D,S_t} RV_t + \varepsilon_{t+h} \tag{13}$$

2.4. The Markov Regime Long Memory HAR (MS-LM-HAR) Model

Inspired by the existing literature, we propose a Markov switching long memory HAR (MS-LM-HAR) model to capture long memory and regime switching in realized volatility. To capture long memory, we integrate the fractional difference operator $(1-L)^d$ into the MS-HAR model in (12). The corresponding MS-LM-HAR specification can be expressed as:

$$(1 - \phi_{D,S_t} L^h)(1-L)^d RV_{t+h} = \mu + \phi_W \overline{RV}_t^W + \phi_M \overline{RV}_t^M + \varepsilon_{t+h} \tag{14}$$

where $\phi_{D,S_t}$ denotes the autoregressive coefficient of $RV_t$, which is Markov switching with two states. When the weekly and monthly RV terms are dropped from the MS-LM-HAR model, it reduces to the MS-LM-AR model with the following specification:

$$(1 - \phi_{D,S_t} L^h)(1-L)^d RV_{t+h} = \mu + \varepsilon_{t+h} \tag{15}$$

In short, the combined MS-LM-HAR model in (14) is constructed in three steps. The first step uses the realized variance RV to build the standard HAR model in (2); the second step adds the fractional difference operator $(1-L)^d$ to the HAR model to form the LM-HAR model in (9); and the third step makes the autoregressive coefficient of $RV_t$ Markov switching with two states to form the combined MS-LM-HAR model in (14).
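As a worked step, the lag polynomial in (14) can be expanded to show how the model nests the simpler specifications. This sketch assumes the usual convention that $L^h$ shifts the fractionally differenced series back by $h$ days, and the shorthand $\widetilde{RV}_t$ is introduced here only for illustration.

```latex
% Define the fractionally differenced series
\widetilde{RV}_t = (1-L)^d RV_t .
% Applying (1 - \phi_{D,S_t} L^h) to \widetilde{RV}_{t+h} and rearranging (14) gives
\widetilde{RV}_{t+h} = \mu + \phi_{D,S_t}\,\widetilde{RV}_t
    + \phi_W \overline{RV}_t^{W} + \phi_M \overline{RV}_t^{M} + \varepsilon_{t+h} .
% Special cases:
%   d = 0                       => \widetilde{RV}_t = RV_t, which is the MS-HAR model (12) with \mu = \phi_0
%   \phi_{D,1} = \phi_{D,2}     => a single regime, which is the LM-HAR model (9) with \Phi(L) = 1 - \phi_D L^h
%   \phi_W = \phi_M = 0         => the MS-LM-AR model (15)
```

Read this way, the combined model is a regime-switching HAR regression on the fractionally differenced series.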

The proposed combined model can simultaneously capture long memory, heterogeneity and regime switching, which compensates for some shortcomings of the traditional AR model and can improve the forecast performance for realized volatility.

3. Empirical Results

3.1. Data Sample and Statistical Analysis

The 5-minute close price of the Standard & Poor’s 500 index (hereafter SPX) is chosen as the data sample. The daily close price and the realized variance, computed as the sum of squared 5-minute close returns from the Thomson Reuters DataScope Tick History database, are downloaded from the Oxford-Man Institute (https://oxford-man.ox.ac.uk/). To calculate the SPX daily realized volatility, the 5-minute intraday return is first calculated as a log return,

$$r_t = \log(P_t / P_{t-1}) \tag{16}$$

where $P_t$ denotes close price at day $t$, and log(∙) is the logarithm function.

Adopting the methodology of [1] , the SPX daily realized volatility used in this paper is taken as the daily volatility proxy calculated as follows,

$$RV_t = \sum_{j=1}^{M} r_{t,j}^2 \tag{17}$$

where $r_{t,j}$ denotes the $j$-th close return defined in Equation (16) and $M$ denotes the number of 5-minute close returns on day $t$. To avoid the impact of COVID-19 on the volatility process, the sample of SPX daily realized volatility ranges from January 05, 2000 to December 31, 2019.

The full sample of daily returns and realized volatility for SPX is shown in the left and right panels of Figure 3, respectively. Descriptive statistics for the daily return and realized volatility of SPX in all sample periods are provided in Table 1. The skewness, kurtosis, JB and KS statistics and the stable index α of the stable distribution show that all daily return and volatility series are leptokurtic. The significant Ljung-Box (LB) and ADF statistics indicate that the daily return and volatility exhibit short memory and are not unit root processes. However, the estimates of the fractional differencing parameter (d) and the Hurst exponent (H) show that the daily return does not have long memory, whereas the daily volatility has significant long memory and is most likely nonstationary. Therefore, all volatility series exhibit important properties including long memory, time variation, right skewness, fat tails, and short memory.

Figure 3. Daily realized volatility of S & P 500 index.

3.2. Estimated Results of Volatility Models

The in-sample and full-sample realized volatility series are used separately to estimate the following 16 realized volatility models: AR, LM-AR, MS-AR, MS-LM-AR, HAR, LM-HAR, MS-HAR, MS-LM-HAR, AR-J, LM-AR-J, MS-AR-J, MS-LM-AR-J, HAR-J, LM-HAR-J, MS-HAR-J and MS-LM-HAR-J. Parameter estimation poses a new challenge for the combined model. The quasi maximum likelihood method, which assumes normally distributed innovations, is used to estimate the model parameters. Table 2 and Table 3 provide the estimation results for the in-sample and full-sample periods, respectively.

Table 1. Statistical analysis of return and realized volatility for SPX.

Notes: Full-sample is from January 1, 2000 to December 31, 2019, in-sample is from January 1, 2000 to December 31, 2018, out-of-sample is from January 1, 2019 to December 31, 2019. JB, KS, ADF and Q(m) denote the Jarque-Bera, Kolmogorov-Smirnov, Augmented Dickey-Fuller and Ljung-Box (at order m) statistics. d denotes the fractional differencing order estimated by ELW [32] , H is the Hurst index estimated by R/S_Com [26] , α is the stable parameter of the stable distribution. *, ** and *** indicate significance at the 10%, 5% and 1% nominal levels for JB, KS, Q (5), Q (10), and ADF, respectively. Only the estimates are shown for N, Min, Max, μ, σ, Skewness, Kurtosis, H, d and α.

Table 2. In-sample estimated results of HAR-type models for SPX daily realized volatility.

Notes: The results are for RV5 * 100 defined in Eq. (17), estimated and calculated with TSM 4.50 and Python. The order of the AR part is p = 1. We use the quasi maximum likelihood method to estimate the model parameters. *, ** and *** indicate significance at the 10%, 5% and 1% levels, respectively, except for LLF and P11/P22. Standard errors of the estimated parameters are given in parentheses.

Table 3. Full-sample estimated results of HAR-type models for SPX daily realized volatility.

The results yield the following empirical findings. Firstly, the realized volatility series exhibit distinct long memory. All tests of whether the fractional differencing order differs from zero are significant at the 1% level, and the minimum and maximum estimates are 0.2696 and 0.7501, respectively. Moreover, the estimate for long memory models without Markov switching is larger than that for the same specification with Markov switching. For example, the estimated fractional differencing order for LM-AR is 0.6107 and 0.6106 for in-sample and full-sample, respectively, whereas the corresponding estimate for MS-LM-AR is 0.6031 and 0.5822. This indicates that allowing for Markov switching in the long memory model reduces the estimated degree of long memory, although the estimates remain significant.

Secondly, there is distinct regime switching in the realized volatility series. The estimation results show that all autoregressive coefficients are significant. For example, the estimated values of $\phi_D$ in the MS-HAR model are 0.2614 and 3.6831 for states 1 and 2, respectively. The full-sample estimates show similar results. Therefore, regime switching should be considered in the realized volatility model.

Thirdly, the estimated coefficients of the weekly and monthly terms in HAR, HAR-J, MS-HAR and MS-HAR-J are almost all significant. This further demonstrates that the realized volatility series have distinct long memory. Furthermore, introducing long memory into the HAR model changes the regression coefficients of the weekly and monthly terms. For example, $\phi_W = 0.4093$ and $\phi_M = 0.2262$ for the HAR model are significant, whereas $\phi_5 = 0.1027$ and $\phi_{22} = 0.0751$ for the LM-HAR model. The results indicate that heterogeneity and long memory carry common information about the volatility series.

Fourthly, the impact of jumps on realized volatility is significantly negative at the 1% level. These findings agree with those in the literature, for example [33] . For the in-sample period, the coefficients of the daily jump in the models with jump components, namely AR-J, LM-AR-J, MS-AR-J, MS-LM-AR-J, HAR-J, LM-HAR-J, MS-HAR-J and MS-LM-HAR-J, are −0.9500, −0.6443, −0.6177, −0.3872, −0.7430, −0.7669, −0.4909 and −0.5414, respectively. Similarly, all daily jump coefficients for the full sample are negative. The negative coefficient suggests that past daily negative jumps can result in higher future volatility than positive jumps, which is called the leverage effect of jumps.

3.3. Loss Function Comparison Results of Out-of-Sample Volatility Forecast

Although in-sample estimation provides useful information about the relationship between variables, a model that overfits in-sample may forecast poorly. Out-of-sample forecasting, which measures the ability of a model to forecast future volatility, is therefore a more informative way to evaluate model performance and deserves more attention in applications. This section evaluates out-of-sample volatility forecast performance.

To obtain out-of-sample volatility forecasts, we use the recursive method. Specifically, we take the period from January 1, 2000 to December 31, 2018 as the first estimation sample and produce the first forecasts at the four horizons of 1, 5, 10 and 22 steps ahead. The model, with the same settings, is then re-estimated on a second estimation sample, expanded by adding the first out-of-sample observation, and the second forecast is obtained. Estimation and forecasting are repeated until all out-of-sample forecasts have been obtained. The total number of out-of-sample volatility forecasts in this paper is 249.
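The recursive (expanding-window) scheme can be sketched as follows. To keep the example short, a simple AR(1) fitted by OLS stands in for the sixteen models, and only the one-step-ahead horizon is shown; both are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np

def fit_ar1(y):
    """OLS estimates (mu, phi) of y_t = mu + phi * y_{t-1} + e_t."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return beta

def recursive_forecasts(series, n_out):
    """One-step-ahead forecasts for the last n_out points with an expanding sample."""
    series = np.asarray(series, dtype=float)
    n_in = len(series) - n_out
    preds = np.empty(n_out)
    for i in range(n_out):
        window = series[: n_in + i]          # estimation sample grows by one each step
        mu, phi = fit_ar1(window)            # re-estimate with the same settings
        preds[i] = mu + phi * window[-1]     # forecast the next observation
    return preds
```

In the setting described above, `n_out` would be 249 and the initial window would end on December 31, 2018.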

After all the forecasts have been obtained, loss function can be calculated to directly assess the performance of out-of-sample volatility forecast for the competing models. According to [34] and [35] , we choose popular robust loss functions, that is, the mean squared error (hereafter MSE) and the mean absolute error (hereafter MAE), which are respectively defined as,

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(\sigma_t^2 - \hat{\sigma}_t^2\right)^2 \tag{18}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|\sigma_t^2 - \hat{\sigma}_t^2\right| \tag{19}$$

where $n$ is the number of volatility forecasts, $\hat{\sigma}_t^2$ denotes the variance forecast for day $t$, and $\sigma_t^2$ denotes the true variance on day $t$ or a conditionally unbiased volatility proxy. Since the true latent volatility is not observed, it is difficult to evaluate the out-of-sample forecast performance of volatility models directly. A practical solution is to substitute a volatility proxy for the true variance and to evaluate each model by comparing its forecast series with the proxy series.
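Eqs. (18) and (19) translate directly into code. In this sketch the argument `proxy` stands for the realized volatility series used in place of the unobserved true variance; the function and argument names are illustrative.

```python
import numpy as np

def mse(proxy, forecast):
    """Mean squared error of Eq. (18)."""
    proxy = np.asarray(proxy, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((proxy - forecast) ** 2))

def mae(proxy, forecast):
    """Mean absolute error of Eq. (19)."""
    proxy = np.asarray(proxy, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(proxy - forecast)))
```

Because MSE squares each error, a single large miss on a high-volatility day weighs far more heavily in MSE than in MAE, which is one reason the two criteria can rank models differently.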

Table 4 reports the MSE and MAE loss functions of the out-of-sample volatility forecasts for the 16 models. The results verify that long memory models, especially LM-AR, significantly outperform the short memory model and the Markov switching model. Moreover, introducing the daily jump into the models does not improve out-of-sample forecast accuracy, and combining long memory, Markov switching and the daily jump does not improve out-of-sample forecasts. A possible reason is that these components share common information or interact with each other. Some empirical results show that long memory and structural breaks (jumps) can induce each other: long memory induced by a short memory process contaminated by structural breaks or jumps is spurious long memory, and a structural break induced by long memory is a spurious structural break.

3.4. MCS Test Results of Out-of-Sample Volatility Forecast

According to previous studies, evaluations based on loss functions may not be robust, and the forecast performance of volatility models cannot be judged using a single loss function alone. It is therefore necessary to re-evaluate performance with other methods. Several evaluation procedures have been proposed for this purpose, such as the MSFE t-statistic introduced in [36] and [37] , the superior predictive ability (hereafter SPA) test proposed by [34] , and the model confidence set (hereafter MCS) constructed by [38] . Since the recently developed MCS method is attractive and efficient among the current methods, this paper uses it.

Table 4. Loss function calculated results of out-of-sample forecast for SPX daily realized volatility.

The MCS test compares several volatility models and selects the set of best models from a given initial model set according to a given optimality criterion, without requiring a benchmark to be specified. The MCS procedure comprises three steps:

Step 1, set M = M0, where M0 denotes the initial model set including all the specifications of the competing models.

Step 2, at significance level $\alpha$, use the MCS statistic to test the null hypothesis that all models have equal predictive ability (EPA), that is, $H_{0,N}: E[d_{kl,t}] = 0$. The statistic is defined as

$$T_{MCS} = \max_{k,l \in N} \left| \frac{\bar{d}_k}{\sqrt{\widehat{Var}(\bar{d}_k)}} \right| \tag{20}$$

where $\bar{d}_k = \frac{1}{n-1}\sum_{l \in N} \bar{d}_{kl}$ denotes the sample loss of model $k$ relative to the average loss of all models in the model set, $l = 1, 2, \ldots, N$, $\bar{d}_{kl} = n^{-1}\sum_{t=1}^{n} d_{kl,t}$ is the average relative loss between models $k$ and $l$, $d_{kl,t} \equiv L_{k,t} - L_{l,t}$ is the loss differential between models $k$ and $l$ at time $t$, and $\widehat{Var}(\bar{d}_k)$ is the bootstrapped estimate of the variance of $\bar{d}_k$.

Step 3, if the test does not reject the null hypothesis, set the superior set of models (hereafter SSM) to $\hat{M}^*_{1-\alpha} = M$; otherwise, remove the worst model from the model set M and repeat Step 2. This step finally yields the model confidence set $\hat{M}^*_{1-\alpha}$.
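The three steps can be sketched in code as follows. This is a simplified illustration, not the implementation used in the paper: it assumes a moving-block bootstrap with a fixed block length, measures each model's loss relative to the average over the current set (a constant rescaling of $\bar{d}_k$ that leaves the t-statistic unchanged), and removes the model with the largest signed t-statistic when the null is rejected. The function name `mcs_sketch` and all default values are hypothetical.

```python
import numpy as np

def mcs_sketch(losses, alpha=0.20, n_boot=500, block=5, seed=0):
    """losses: (n, m) array of per-period losses. Returns indices of retained models."""
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, dtype=float)
    n, m = losses.shape
    # Moving-block bootstrap: each resample concatenates random blocks of rows.
    n_blocks = int(np.ceil(n / block))
    starts = rng.integers(0, n - block + 1, size=(n_boot, n_blocks))
    idx = (starts[:, :, None] + np.arange(block)).reshape(n_boot, -1)[:, :n]
    keep = list(range(m))                        # Step 1: start from the full set
    while len(keep) > 1:
        L = losses[:, keep]
        d_bar = L.mean(axis=0) - L.mean()        # loss relative to the set average
        boot_mean = L[idx].mean(axis=1)          # shape (n_boot, len(keep))
        boot_d = boot_mean - boot_mean.mean(axis=1, keepdims=True)
        se = np.sqrt(((boot_d - d_bar) ** 2).mean(axis=0))
        t_stat = d_bar / se
        t_obs = np.abs(t_stat).max()             # Step 2: statistic as in Eq. (20)
        t_boot = np.abs((boot_d - d_bar) / se).max(axis=1)
        p_value = float((t_boot >= t_obs).mean())
        if p_value >= alpha:                     # Step 3: EPA not rejected, stop
            break
        keep.pop(int(np.argmax(t_stat)))         # otherwise drop the worst model
    return keep
```

Here `losses` would hold, for instance, the squared forecast errors of the sixteen models over the 249 out-of-sample days, one column per model.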

The MCS test results for the 16 volatility models are presented in Table 5. For MSE, at significance level $\alpha = 0.20$, the model confidence set contains the following models: MS-LM-AR, MS-HAR, MS-LM-HAR and MS-LM-AR-J for 1-step-ahead forecasts; HAR and MS-HAR for 5 steps; MS-LM-AR, HAR, MS-HAR and MS-LM-HAR for 10 steps; and MS-LM-HAR and MS-LM-HAR-J for 22 steps. Similarly, for MAE, at significance level $\alpha = 0.20$, the model confidence set contains the following models: MS-AR, MS-LM-AR, MS-HAR, MS-LM-HAR, MS-AR-J, MS-LM-AR-J and MS-LM-HAR-J for 1 step; MS-HAR and MS-LM-HAR for 5 steps; HAR, MS-HAR and MS-LM-HAR for 10 steps; and MS-LM-HAR and MS-LM-HAR-J for 22 steps.

According to the MCS test results, we obtain four main findings. Firstly, the LM-AR, HAR, and MS-AR models each perform better than AR. MS-AR-J is better than AR-J, while LM-AR-J and HAR-J perform almost the same as AR-J. The AR-based models with long memory or switching regime generally outperform the ones without the corresponding property.

Table 5. MCS test results of volatility models for the S & P 500 index.

Secondly, MS-HAR is better than HAR, and MS-HAR-J is better than HAR-J. However, LM-HAR-J performs almost the same as HAR-J, and LM-HAR is not better than HAR. Combining switching regime with the HAR model clearly improves forecast performance, but combining the LM-AR and HAR models into LM-HAR should be done with caution and merits further research.

Thirdly, and more importantly, counting how often each model appears in the model confidence sets, MS-LM-HAR is included 3 times for MSE and 4 times for MAE, for a total of 7 times, which is the highest frequency among all models. Therefore, MS-LM-HAR is the best of the sixteen volatility models, followed by MS-HAR.

Finally, AR-based models with the jump component are not better than the ones without it. In detail, AR-J performs almost the same as AR, HAR-J almost the same as HAR, and MS-LM-HAR-J is not better than MS-LM-HAR. Including the previous day's jump therefore does not improve the accuracy of the out-of-sample forecast, although its estimated coefficient, shown in Table 2 and Table 3, is significant.
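
The frequency count behind the third finding can be checked directly from the model confidence sets listed above for MSE and MAE. The short Python sketch below only tallies those memberships; the variable names (`mcs_sets`, `counts`, `by_loss`) are hypothetical.

```python
from collections import Counter

# Model confidence sets at alpha = 0.20, keyed by (loss function, forecast horizon).
mcs_sets = {
    ("MSE", 1): ["MS-LM-AR", "MS-HAR", "MS-LM-HAR", "MS-LM-AR-J"],
    ("MSE", 5): ["HAR", "MS-HAR"],
    ("MSE", 10): ["MS-LM-AR", "HAR", "MS-HAR", "MS-LM-HAR"],
    ("MSE", 22): ["MS-LM-HAR", "MS-LM-HAR-J"],
    ("MAE", 1): ["MS-AR", "MS-LM-AR", "MS-HAR", "MS-LM-HAR",
                 "MS-AR-J", "MS-LM-AR-J", "MS-LM-HAR-J"],
    ("MAE", 5): ["MS-HAR", "MS-LM-HAR"],
    ("MAE", 10): ["HAR", "MS-HAR", "MS-LM-HAR"],
    ("MAE", 22): ["MS-LM-HAR", "MS-LM-HAR-J"],
}

# Total number of confidence sets each model belongs to.
counts = Counter(m for models in mcs_sets.values() for m in models)

# The same count split by loss function.
by_loss = Counter((loss, m) for (loss, _), models in mcs_sets.items() for m in models)

for model, c in counts.most_common(2):
    print(model, c)
# MS-LM-HAR 7
# MS-HAR 6
print(by_loss[("MSE", "MS-LM-HAR")], by_loss[("MAE", "MS-LM-HAR")])
# 3 4
```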

Therefore, we conclude from the above findings that combining long memory, heterogeneity, and regime switching with the AR model can significantly improve volatility forecast performance. Incorporating long memory and switching regime into HAR-type models is necessary for modeling and forecasting volatility, whereas the lagged daily jump component is not.

The MS-LM-HAR model yields higher forecasting accuracy than the other models, suggesting that accounting for long memory, switching regime, and heterogeneity can significantly improve the out-of-sample forecasting accuracy of realized stock volatility.

Figure 4. The daily realized volatility out-of-sample forecast of S & P 500 index based on the combined MS-LM-HAR model.

4. Conclusions and Remarks

This paper proposes a new combined model of short memory, long memory, heterogeneity, switching regime, and jump, called MS-LM-HAR-J, to describe the stylized facts of volatility series. Using the realized volatility of SPX, we divide the full sample (January 1, 2000 to December 31, 2019) into an in-sample period (January 1, 2000 to December 31, 2018) and an out-of-sample period (January 1, 2019 to December 31, 2019). The quasi-maximum likelihood estimator is used to estimate the parameters of the sixteen models, with the estimation sample moving forward by one observation at a time, and the out-of-sample volatility is forecast 1, 5, 10, and 22 steps ahead.
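
The rolling estimation scheme described above can be illustrated with a simplified Python sketch. Several assumptions are made here that differ from the paper: only a plain HAR specification (daily, weekly, and monthly averages over 1, 5, and 22 days) is fitted, ordinary least squares replaces quasi-maximum likelihood, only 1-step-ahead forecasts are produced, and the data are synthetic. The function names `har_design` and `rolling_har_forecast` are hypothetical.

```python
import numpy as np

def har_design(rv):
    """Build HAR regressors [1, daily, weekly, monthly] and next-day targets."""
    t = np.arange(21, len(rv) - 1)
    daily = rv[t]
    weekly = np.array([rv[i - 4:i + 1].mean() for i in t])     # 5-day average
    monthly = np.array([rv[i - 21:i + 1].mean() for i in t])   # 22-day average
    X = np.column_stack([np.ones(len(t)), daily, weekly, monthly])
    return X, rv[t + 1]

def rolling_har_forecast(rv, window):
    """1-step-ahead forecasts from a fixed-length window moved forward one day at a time."""
    forecasts = []
    for end in range(window, len(rv)):
        sample = rv[end - window:end]          # estimation sample ending at day end - 1
        X, y = har_design(sample)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS fit (assumption, not QML)
        x_new = np.array([1.0, sample[-1], sample[-5:].mean(), sample[-22:].mean()])
        forecasts.append(x_new @ beta)         # forecast of rv[end]
    return np.array(forecasts)

# Synthetic positive series standing in for daily realized volatility.
rng = np.random.default_rng(0)
rv = np.abs(rng.standard_normal(300)) + 0.5
fc = rolling_har_forecast(rv, window=250)
print(fc.shape)  # (50,)
```

Each forecast uses only observations available before the forecast date, so the out-of-sample evaluation is not contaminated by future data.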

The in-sample estimation results indicate that the impact of long memory, jump component, and switching regime on realized volatility is significant. The out-of-sample forecast evaluation demonstrates that incorporating long memory and switching regime into HAR-type models can significantly improve forecast performance, whereas the lagged daily jump component does not. Moreover, MS-LM-HAR is the best model in the out-of-sample evaluation, followed by MS-HAR.

However, whether the LM-AR and HAR models should be combined into LM-HAR remains an open question, to be examined with more samples and other proxies of realized volatility.

Overall, MS-LM-HAR is the best model according to the loss function comparison and the MCS evaluation.

The findings of this paper have important implications for portfolio investors, risk managers, and policy makers seeking to minimize risks, maximize returns, and stabilize the markets.

Several topics merit further research, such as robustness checks with more samples and other proxies of realized volatility, and extensions with a GARCH term and more factors. Moreover, multivariate long memory and switching regime models are a promising direction for future research in this field.

Acknowledgments

This work was supported by the research grants from the Natural Science Foundation of Guangdong Province (No. 2022A1515011478) and the Humanities and Social Science Foundation of Ministry of Education of China (Research on the hybrid forecast model of financial volatility based on big data and deep learning: theory and application, No. 22YJAZH032).

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Andersen, T.G. and Bollerslev, T. (1998) Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts. International Economic Review, 39, 885-905. https://doi.org/10.2307/2527343
[2] Andersen, T.G., Bollerslev, T. and Meddahi, N. (2004) Analytical Evaluation of Volatility Forecasts. International Economic Review, 45, 1079-1110. https://doi.org/10.1111/j.0020-6598.2004.00298.x
[3] Hosking, J.R.M. (1981) Fractional Differencing. Biometrika, 68, 165-176. https://doi.org/10.2307/2335817
[4] Brodsky, J. and Hurvich, C.M. (1999) Multi-Step Forecasting for Long-Memory Processes. Journal of Forecasting, 18, 59-75. https://doi.org/10.1002/(SICI)1099-131X(199901)18:1<59::AID-FOR711>3.0.CO;2-V
[5] Wang, C.S.-H., Bauwens, L. and Hsiao, C. (2013) Forecasting a Long Memory Process Subject to Structural Breaks. Journal of Econometrics, 177, 171-184. https://doi.org/10.1016/j.jeconom.2013.04.006
[6] Zhou, W., Pan, J. and Wu, X. (2019) Forecasting the Realized Volatility of CSI 300. Physica A: Statistical Mechanics and Its Applications, 531, Article ID: 121799. https://doi.org/10.1016/j.physa.2019.121799
[7] Hassler, U. and Pohle, M.-O. (2023) Forecasting under Long Memory. Journal of Financial Econometrics, 21, 742-778. https://doi.org/10.1093/jjfinec/nbab017
[8] Shi, Y. (2015) Can We Distinguish Regime Switching from Long Memory? A Simulation Evidence. Applied Economics Letters, 22, 318-323. https://doi.org/10.1080/13504851.2014.941526
[9] Corsi, F. (2009) A Simple Approximate Long-Memory Model of Realized Volatility. Journal of Financial Econometrics, 7, 174-196. https://doi.org/10.1093/jjfinec/nbp001
[10] Alfeus, M. and Nikitopoulos, C.S. (2022) Forecasting Volatility in Commodity Markets with Long-Memory Models. Journal of Commodity Markets, 28, Article ID: 100248. https://doi.org/10.1016/j.jcomm.2022.100248
[11] Wang, J., Ma, F., Liang, C. and Chen, Z. (2022) Volatility Forecasting Revisited Using Markov-Switching with Time-Varying Probability Transition. International Journal of Finance & Economics, 27, 1387-1400. https://doi.org/10.1002/ijfe.2221
[12] Alizadeh, A.H., Huang, C.-Y. and Marsh, I.W. (2021) Modelling the Volatility of TOCOM Energy Futures: A Regime Switching Realised Volatility Approach. Energy Economics, 93, Article ID: 104434. https://doi.org/10.1016/j.eneco.2019.06.019
[13] Liu, Y., Niu, Z., Suleman, M.T., Yin, L. and Zhang, H. (2022) Forecasting the Volatility of Crude Oil Futures: The Role of Oil Investor Attention and Its Regime Switching Characteristics under a High-Frequency Framework. Energy, 238, Article ID: 121779. https://doi.org/10.1016/j.energy.2021.121779
[14] Luo, J., Klein, T., Ji, Q. and Hou, C. (2022) Forecasting Realized Volatility of Agricultural Commodity Futures with Infinite Hidden Markov HAR Models. International Journal of Forecasting, 38, 51-73. https://doi.org/10.1016/j.ijforecast.2019.08.007
[15] Li, X. and Ma, X. (2023) Jumps and Gold Futures Volatility Prediction. Finance Research Letters, 58, Article ID: 104492. https://doi.org/10.1016/j.frl.2023.104492
[16] Bouri, E., Gkillas, K., Gupta, R. and Pierdzioch, C. (2021) Forecasting Realized Volatility of Bitcoin: The Role of the Trade War. Computational Economics, 57, 29-53. https://doi.org/10.1007/s10614-020-10022-4
[17] Baillie, R.T., Calonaci, F., Cho, D. and Rho, S. (2019) Long Memory, Realized Volatility and Heterogeneous Autoregressive Models. Journal of Time Series Analysis, 40, 609-628. https://doi.org/10.1111/jtsa.12470
[18] Barndorff-Nielsen, O.E. and Shephard, N. (2004) Power and Bipower Variation with Stochastic Volatility and Jumps. Journal of Financial Econometrics, 2, 1-37. https://doi.org/10.1093/jjfinec/nbh001
[19] Christensen, K. and Podolskij, M. (2007) Realized Range-Based Estimation of Integrated Variance. Journal of Econometrics, 141, 323-349. https://doi.org/10.1016/j.jeconom.2006.06.012
[20] Andersen, T.G., Dobrev, D. and Schaumburg, E. (2012) Jump-Robust Volatility Estimation Using Nearest Neighbor Truncation. Journal of Econometrics, 169, 75-93. https://doi.org/10.1016/j.jeconom.2012.01.011
[21] Andersen, T.G., Bollerslev, T. and Diebold, F.X. (2007) Roughing It Up: Including Jump Components in the Measurement, Modeling, and Forecasting of Return Volatility. The Review of Economics and Statistics, 89, 701-720. https://doi.org/10.1162/rest.89.4.701
[22] Corsi, F., Pirino, D. and Renò, R. (2010) Threshold Bipower Variation and the Impact of Jumps on Volatility Forecasting. Journal of Econometrics, 159, 276-288. https://doi.org/10.1016/j.jeconom.2010.07.008
[23] McAleer, M. and Medeiros, M.C. (2008) Realized Volatility: A Review. Econometric Reviews, 27, 10-45. https://doi.org/10.1080/07474930701853509
[24] Busch, T., Christensen, B.J. and Nielsen, M.Ø. (2011) The Role of Implied Volatility in Forecasting Future Realized Volatility and Jumps in Foreign Exchange, Stock, and Bond Markets. Journal of Econometrics, 160, 48-57. https://doi.org/10.1016/j.jeconom.2010.03.014
[25] Cho, S. and Shin, D.W. (2016) An Integrated Heteroscedastic Autoregressive Model for Forecasting Realized Volatilities. Journal of the Korean Statistical Society, 45, 371-380. https://doi.org/10.1016/j.jkss.2015.12.004
[26] Luo, Y. and Huang, Y. (2018) A New Combined Approach on Hurst Exponent Estimate and Its Applications in Realized Volatility. Physica A: Statistical Mechanics and Its Applications, 492, 1364-1372. https://doi.org/10.1016/j.physa.2017.11.063
[27] Raggi, D. and Bordignon, S. (2012) Long Memory and Nonlinearities in Realized Volatility: A Markov Switching Approach. Computational Statistics & Data Analysis, 56, 3730-3742. https://doi.org/10.1016/j.csda.2010.12.008
[28] Andersen, T.G., Bollerslev, T., Diebold, F.X. and Ebens, H. (2001) The Distribution of Realized Stock Return Volatility. Journal of Financial Economics, 61, 43-76. https://doi.org/10.1016/S0304-405X(01)00055-1
[29] Giot, P. and Laurent, S. (2004) Modelling Daily Value-at-Risk Using Realized Volatility and ARCH Type Models. Journal of Empirical Finance, 11, 379-398. https://doi.org/10.1016/j.jempfin.2003.04.003
[30] Diebold, F.X. and Inoue, A. (2001) Long Memory and Regime Switching. Journal of Econometrics, 105, 131-159. https://doi.org/10.1016/S0304-4076(01)00073-2
[31] Zhang, Y.-J. and Wang, J. (2015) Exploring the WTI Crude Oil Price Bubble Process Using the Markov Regime Switching Model. Physica A: Statistical Mechanics and Its Applications, 421, 377-387. https://doi.org/10.1016/j.physa.2014.11.051
[32] Shimotsu, K. and Phillips, P.C.B. (2005) Exact Local Whittle Estimation of Fractional Integration. The Annals of Statistics, 33, 1890-1933. http://www.jstor.org/stable/3448627
[33] Wang, Y., Ma, F., Wei, Y. and Wu, C. (2016) Forecasting Realized Volatility in a Changing World: A Dynamic Model Averaging Approach. Journal of Banking & Finance, 64, 136-149. https://doi.org/10.1016/j.jbankfin.2015.12.010
[34] Hansen, P.R. and Lunde, A. (2005) A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1, 1)? Journal of Applied Econometrics, 20, 873-889. https://doi.org/10.1002/jae.800
[35] Patton, A.J. (2011) Volatility Forecast Comparison Using Imperfect Volatility Proxies. Journal of Econometrics, 160, 246-256. https://doi.org/10.1016/j.jeconom.2010.03.034
[36] Diebold, F.X. and Mariano, R.S. (1995) Comparing Predictive Accuracy. Journal of Business & Economic Statistics, 13, 253-263. https://doi.org/10.2307/1392185
[37] West, K.D. (1996) Asymptotic Inference about Predictive Ability. Econometrica, 64, 1067-1084. https://doi.org/10.2307/2171956
[38] Hansen, P.R., Lunde, A. and Nason, J.M. (2011) The Model Confidence Set. Econometrica, 79, 453-497. https://doi.org/10.3982/ECTA5771

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.