Forecasting Realized Volatility Using Subsample Averaging

When the observed price process is the true underlying price process plus microstructure noise, it is known that realized volatility (RV) estimates will be overwhelmed by the noise when the sampling frequency approaches infinity. Therefore, it may be optimal to sample less frequently, and averaging the less frequently sampled subsamples can improve estimation for quadratic variation. In this paper, we extend this idea to forecasting daily realized volatility. While subsample averaging has been proposed and used in estimating RV, this paper is the first that uses subsample averaging for forecasting RV. The subsample averaging method we examine incorporates the high frequency data in different levels of systematic sampling. It first pools the high frequency data into several subsamples, then generates forecasts from each subsample, and then combines these forecasts. We find that in daily S&P 500 return realized volatility forecasts, subsample averaging generates better forecasts than those using only one subsample.


Introduction
The rich dynamics in ultra-high-frequency financial data may be captured to improve estimation of quadratic variation (integrated variance) or forecasting volatility.There is a considerable amount of literature addressing the estimation issue, whereas little work has been done on forecasting.This paper contributes to the forecasting issue in the high-frequency data literature by examining whether it pays to incorporate the intraday data and, more importantly, how to incorporate the high-frequency information to achieve better performance for forecasting daily return volatility.Further, we seek to link the estimation and forecasting aspects by adopting and tailoring methods proposed for estimation of quadratic variation to forecasting.
The existing literature on high-frequency data, especially that devoted to the realized volatility issue, can be grouped roughly into two areas: estimating quadratic variation using realized volatility (RV) and forecasting volatility using RV.References [1,2] establish that RV, defined as the sum of squared intraday returns of small intervals, is an asymptotically unbiased estimator of the unobserved quadratic variation as the interval length approaches zero.
However, in the presence of market microstructure noise, this helpful property of RV is contaminated.More recent works investigating this issue include [3][4][5][6][7][8].When the observed price process is the true underlying price process plus microstructure noise, it is shown that RV will be overwhelmed by the noise and explodes when the sampling frequency approaches infinity.Therefore, it may be optimal to sample less frequently than is the case in the absence of noise.References [6,8] establish through a subsampling scheme improved estimators for quadratic variation.The original subsampling idea can be traced back to [9], where for the first time an unbiased datadriven estimator of volatility and a subsample averaging volatility estimator are proposed.The bias-adjusted estimator of [6] based on the subsample averaging method is able to eventually push the estimation bias to zero.Reference [8] shows that subsampling is highly advantageous for RV estimators based on discontinuous kernels.tility estimation, volatility forecasting using high-frequency information has been addressed as well.Reference [10] represents an approach to forecasting volatility using RV.The model they propose is a fractional integrated AR model, ARFI(p,d), for logarithmic RV obtained from foreign exchange rates data of 30-minute frequency, and demonstrates the superior predictive power of their model.Noticeably, the sampling frequency used in [1] is, however, 5 minutes when they analyze the distributional properties of RV.
Under the presence of microstructure noise, it may be optimal to sample less frequently than is the case in the absence of noise.As shown by [6,8,9], subsampling schemes improve estimators for quadratic variation.When it comes to forecasting RV, if the highest frequency returns are not necessarily the optimal frequency, the subsampling of the available highest frequency data will lead to several systematically sampled subsamples.In this paper we attempt to answer two questions: (1) can we use these subsamples to improve the out-of-sample forecasting?and (2) how do we incorporate the subsample information when producing forecasts of daily realized volatility?We find that as long as the available highestfrequency data are not at the optimal frequency, we can improve prediction of RV out-of-sample through simple averaging of forecasts produced from subsamples.
The paper is organized as follows: Section 2 describes the data and subsamples.Section 3 discusses two models for forecasting daily realized volatility by subsample averaging.Section 4 presents their out-of-sample relative performances.Section 5 concludes.

Data and Subsampling
The data we use consists of S&P 500 index values at 5-minute intervals recorded between 9:30 a.m. and 4:00 p.m. (total 390 minutes a day) from June 9, 1997 to May 30, 2003, a total of 1501 days and 117,078 observations 1 .In cleaning the data, those periods of market closings are treated as no variation in index values, thus there exist 78 ticks each trading day.We can construct data-driven volatility, for instance, realized volatility, from this 5-minute high-frequency data.
Let be the interval in minutes over which the returns are computed.The -minute return is the log-difference of two consecutive index values over where t denotes the logarithm of the S&P 500 index value.Even if the interval can be as small as 5 min-utes given the data available to us, we choose larger intervals such as , in order to construct subsamples.When we can construct three sets (subsamples) of  -minute return series.When 30   we can construct six sets (subsamples) of minute return series.When we can construct 12 sets (subsamples) of 60-minute return series by sampling every 12th observation of 5-minute returns.

 
The RV is calculated by the sum of squared  -minute returns within a day.Define realized volatility of day as t is the number of  -minute returns per day.
To construct a subsample, we consider an interval 5.

 
The subsampled return data has the longer time interval  than 5 minutes, within which 15-minute returns.As there are three 5minute intervals within 15 minutes, we have K = 15/5 = 3 subsamples, producing three subsample RV forecasts for a day.

Volatility Forecast Models
Based on the subsample averaging methodology in [6] and [9], we demonstrate the benefit of subsample averaging in out-of-sample forecasting of daily volatility.We aim to check if the predictive ability of an RV forecasting model can be improved by averaging forecasts generated using the same model but from different subsamples.
In comparing the predictive ability of the subsample averages with the benchmark forecast without subsample averaging, the key issues are: (1) the volatility proxy to compare forecasts with; (2) the loss function for forecast evaluation; and (3) the forecasting model.For the daily volatility proxy we use the daily RV computed from 5minute returns.For the loss function for forecast evaluation we use the mean squared forecast errors (MSFE).Regarding the forecasting model, we consider two models, ARFI model of [10] and HAR model of [11].conditional realized volatility, we estimate the fractionally integrated autoregressive model, ARFI(p,d) for for each of Specifically, the model can be written as follows: 15,30, 60.
where is the pth order AR lag polynomial.We fix and   and the long-memory parameter by using the method of [10,12].
, d The presence of microstructure noise prevents us from sampling too frequently (with too small) when calculating RV.See [3].If t (with ) series are less noisy and more accurately estimated than where are indices of subsamples, 1, , k  then using these likely leads to a better forecast

than
. However, abandoning the finer information    5 1 T RV  seems not very sensible.Thus, a possible way to balance this is the subsample averaging approach, which uses all subsamples, meaning no intra-day information is tossed out, and then taking the average over to produce a final forecast.At the same time, the contaminating effect of microstructure noise in the stage of computing RV is moderated through this subsampling procedure (using instead of ), and hence it is possible to achieve better performance in forecasting RV.

   
The subsample averaging method is as follows.First, for a given we estimate the ARFI(p,d) model for each of subsamples generated by systematically sampling the original 5-minute index data every steps: where t X denotes kth daily logarithm of RV in day computed using the kth subsample.Following [10] we use ARFI(p,d) model on the natural logarithm transfor-t mation of RV, and then exponent- . Running the model on each subsample we obtain K number of RV forecasts , Next, we compute their to obtain the combined forecast (denoted Subsample-Mean in Table 1).Similarly, "Subsample-Median" is to take the median of , and Subsample-Median forecast with the benchmark forecast (denoted Benchmark in Table 1).The benchmark is computed using only one subsample  the last subsample and abandoning the other 1 K  subsample information.

HAR Model
The heterogeneous autoregressive (HAR) model of the realized volatility in [11] is inspired by the heterogeneous market hypothesis and is able to reproduce memory persistence (although not formally a long memory model) as well as many other major features of financial data.The square-root of daily RV is assumed to have an AR-type process on past RVs over different intervals of aggregation (daily, weekly, and monthly): is the weekly aggregated RV, and 1 20 is the monthly aggregation.As illustrated in Section 3.1 above, we consider 15,30, 60.

 
The subsample averaging method for the HAR model works in exactly the same way as the ARFI model.First, we estimate the HAR model for each of subsamples generated by systematically sampling the original 5-minute index data every steps.We take subsamples of the apply the HAR model to each subsample to generate the subsample forecast for the next day , and then use the simple average or median to obtain the subsample averaging forecasts.The only difference here as compared to the ARFI case is the model used to produce forecasts.Note that here we use the square root of RV, as in [11], thus MSFE values of will appear in different scale than those of ARFI(p,d), when we show out-of-sample results in the next section.

Results
Evaluating the volatility forecast involves the selection of a volatility proxy given that the true underlying volatility is latent and subject to a researcher's own definition.
Recent papers addressing this include [13,14].Generally, the suggested volatility proxy is the realized volatility in some particular form, coupled with an MSFE evaluation criterion.
Our empirical work involves the comparison of forecasting performances of subsample averaging (mean and median) of the forecasts using all -minute return subsamples with the benchmark model using only one -minute return subsample.We consider three cases (A, B, and C) of subsampling for in Tables 1  and 2.
In Table 1 where the ARFI(p,d) model is used to forecast RV, we find that subsample averaging (mean and median) improves upon the benchmark in all three cases for Subsample-Mean and in two cases for Subsample-Median, quite substantially for Case A. In Table 2 where the HAR model is used, Subsample-Mean and Subsample-Median are substantially better than the benchmark for all three subsampling cases.

Conclusion
We propose a forecasting methodology that uses subsample averaging to forecast daily realized volatility.[10] where p = 1 and d is estimated by [12] method dynamically for each case.The out-of-sample size is P = 500 Days, and the in-sample size is R = 1000 Days.We report the out-of-sample MSFE with three different subsampling cases.∆-minute returns have m observations per day.Subsample-Mean or Subsample-Median is taken over the K subsample forecasts.The benchmark uses only the last subsample forecast.Bolded term indicates the better-than-benchmark case.Notes: This table compares the performance of the HAR model for the square root of the daily realized volatility (RV) forecast without and with subsampling average.We use the HAR model proposed by [11], which is applied to the square root of RV and estimated dynamically for each case.The out-of-sample size is P = 500 Days, and the in-sample size is R = 1000 Days.We report the out-of-sample MSFE with three different subsampling cases.∆-minute returns have m observations per day.Subsample-Mean or Subsample-Median is taken over the K subsample forecasts.The benchmark uses only the last subsample forecast.Bolded term indicates the better-thanbenchmark case.
While subsample averaging has been proposed and used in estimating RV, this paper is the first to use subsample averaging for forecasting RV.In this paper, we show that subsample averaging, which was originally suggested to overcome the bias in estimating quadratic variation under the presence of market microstructure noise, can also help forecast RV out-of-sample.In an application of S&P 500 index daily volatility forecasting, using two classical forecasting models for RV, we find that the subsample averaging forecast generally and substantially improves upon forecasts using only one subsample without averaging over all subsamples.We expect that subsample averaging method can enhance even more the forecast ability of RV when much higher-frequency data (15-seconds, for instance) are available.