
It is now widely recognized that the statistical property of long memory may be due to reasons other than the data generating process being fractionally integrated. We propose a new procedure aimed at distinguishing between a null hypothesis of unifractal fractionally integrated processes and an alternative hypothesis of other processes which display the long memory property. The procedure is based on a pair of empirical, but consistently defined, statistics, namely the number of breaks reported by Atheoretical Regression Trees (ART) and the range of the Empirical Fluctuation Process (EFP) in the CUSUM test. The new procedure establishes through simulation the bivariate distribution of the number of breaks reported by ART and the CUSUM range for simulated fractionally integrated series. This bivariate distribution is then used to construct an empirical test which rejects the null hypothesis for a candidate series if its pair of statistics lies on the periphery of the bivariate distribution determined from simulation under the null. We apply these methods to the realized volatility series of 16 stocks in the Dow Jones Industrial Average and show that the rejection rate of the null is higher than if either statistic were used as a univariate test.

It is now widely assumed as a stylized fact that many financial and economic time series exhibit the statistical property of long memory, see, for example, [

The econometrics literature has paid considerable attention to the problem of distinguishing true long memory from structural change in economic or financial time series which exhibit the long memory property because economic and/or financial arguments can be advanced as to why the series studied may have such breaks. For example, [

However, the fractional integration and structural change models yield significantly different pricing for financial assets; see, for example, [

In light of the substantial literature on long memory vs structural breaks, it would be easy to see our paper as continuing this line of inquiry. However, our purpose is different. The survey of [

[

Nevertheless, efforts have been made to establish tests or procedures which can reliably determine the difference between a true long memory process and ones for which the long memory property arises from some other data generating process. In this literature some significant papers are [

This paper addresses two issues. First, what are the statistical properties of structural break tests when applied to true long memory series? Second, can structural break tests be used to distinguish between true long memory and other processes which give rise to the long memory property? This paper adds to the literature in two ways. We extend the work of [

Secondly, by combining two well established empirical structural break statistics into a bivariate procedure, we show that this bivariate procedure rejects the null hypothesis at a higher rate than either univariate test on its own. The two empirical statistics used are the number of breaks reported by ART and the range of the Cumulative Summation (CUSUM) test introduced by [

We apply the new insights to 16 realized volatility series of stocks in the Dow Jones Industrial Average and show that for 15 of the 16 series the bivariate statistic is not consistent with a true long memory data generating process. For comparison purposes we present the results for univariate tests using ART or CUSUM alone and the results of the recently proposed [

Stated simply, the essence of our new procedure is to use structural break methods to identify the number of breaks, or the range of the empirical fluctuations process (defined in Section 3 below), in a series and compare that number or range with what we would expect to observe, given the values of d, if the null hypothesis of a fractionally integrated process were generating the data. If the null hypothesis is rejected, all we can conclude is that some alternative process, not that required by the null, is the DGP.

The remainder of this paper is set out as follows. Section 2 gives a brief overview of fractionally integrated series. Section 3 presents the methods used. Section 4 presents the results for distribution of breaks in FI(d) series for univariate ART. Section 5 describes the example data set. Section 6 presents an application to stock market realized volatilities. Section 7 contains the discussion and Section 8 concludes.

A number of models have been proposed to account for the extraordinary persistence of the correlations across time found in long memory series. Here we consider fractionally integrated series and series with structural breaks in the mean.

The most widely used models for long memory series are the Fractional Gaussian Noises (FGNs), a continuous time process, and their discrete time counterparts, the fractionally integrated series of order d. FGNs were introduced into applied statistics by [

Definition 1 A real-valued stochastic process

where

Definition 2 A real-valued process

It follows from Definition 1 that H is constant for all subseries of an H self-similar process.

The stationary increments of an H-self-similar series with

where

Thus for

Independently [

Denoting the backshift operator by B, for a non-integer value of d the operator

where

where c is a constant which depends on d and on any AR(p) and MA(q) components used in the model. For

The two parameters H and d are related by the simple formula d = H − 1/2.

Both FGNs and FI(d)s have been extensively applied as models for long memory time series and their theoretical properties studied. See the volumes by [
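To make the FI(d) model concrete, the following minimal sketch simulates an approximate FI(d) series by truncating the infinite moving average expansion of (1 − B)^(−d), whose weights satisfy ψ_0 = 1 and ψ_k = ψ_(k−1)(k − 1 + d)/k. It is an illustration only and is not the simulator used for the results in this paper; the function names `fi_ma_weights` and `simulate_fi`, the Gaussian innovations and the burn-in length are all assumptions made for the example.

```python
import random


def fi_ma_weights(d, n_weights):
    """MA(infinity) weights of (1 - B)^(-d): psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k."""
    psi = [1.0]
    for k in range(1, n_weights):
        psi.append(psi[-1] * (k - 1 + d) / k)
    return psi


def simulate_fi(d, n, burn_in=500, seed=None):
    """Simulate n observations of an approximate FI(d) series (assumes 0 <= d < 0.5).

    The infinite moving average is truncated at n + burn_in terms, so the
    output only approximates a true FI(d) process.
    """
    rng = random.Random(seed)
    total = n + burn_in
    psi = fi_ma_weights(d, total)
    # Standard normal innovations (an assumption of this sketch).
    eps = [rng.gauss(0.0, 1.0) for _ in range(total)]
    series = []
    for t in range(burn_in, total):
        # x_t = sum_{k=0}^{t} psi_k * eps_{t-k}
        series.append(sum(psi[k] * eps[t - k] for k in range(t + 1)))
    return series


# Example: 200 observations with d = 0.4, a value typical of the series studied later.
x = simulate_fi(d=0.4, n=200, seed=1)
```

The direct summation costs O(n²) operations, which is acceptable for short illustrative series; long series or many replications would call for a faster method.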

[

Models of this type which have been proposed typically have stochastic shifts in the mean, but overall are mean reverting about some long term average. The most popular of these are the breaks models which we define as follows:

where

It is important to note that Equation (2) is just a way to represent a sequence of different models (i.e. models subjected to structural breaks). This model only deals with breaks in mean. It can be generalized for any kind of break. In series with a structural break the noise process,

The rationale behind our approach is that if we were to test a true long memory process for breaks we would observe some, but they would, in fact, all be spurious. However, we can estimate by simulation how many spurious breaks we might observe under the null, and this number will vary with d (or H). If the number we observe differs significantly from that expected, we conclude that the process is unlikely to be a true long memory process, as our expectation of the number of (spurious) breaks will have been exceeded. Unlike other authors, we do not conclude that the DGP must be one of breaks, as there may be other alternatives to true long memory that exhibit long memory-like properties; we simply conclude “not the null”. For this procedure to have any power, we need to establish the distribution of breaks for the individual and bivariate breaks tests, which we do below.

To obtain the null distribution of reported breaks for fractionally integrated series we simulated FI(d) series using farimaSim from the package fSeries [

We obtained a bivariate distribution of the number of breaks reported by ART [

In the CUSUM test the residuals are standardized by dividing by the estimated series standard deviation and the cumulative summation of the residuals is plotted against time. That is

where

This type of method can be usefully thought of as a parametric bootstrap. An important characteristic for our procedure is that the bivariate distributions in later examples show low correlation between the two statistics (ART and CUSUM). This indicates that the information from the two is complementary, so that the combined test can be expected to perform better than either univariate test on its own.
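The CUSUM range described above can be illustrated with a short sketch. It assumes a constant-mean model, so the residuals are deviations from the sample mean, and it scales the cumulative sums by the estimated standard deviation times the square root of the series length, in the style of an OLS-based CUSUM process. Whether this matches the exact variant computed for the results in this paper is an assumption; `cusum_range` is a hypothetical helper name.

```python
import math


def cusum_range(y):
    """Range (max minus min) of a CUSUM fluctuation process for a constant-mean model.

    Assumes the series is not constant, otherwise the standard deviation is zero.
    """
    n = len(y)
    mean = sum(y) / n
    resid = [v - mean for v in y]
    # Residual standard deviation with one estimated parameter (the mean).
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 1))
    path = [0.0]  # the process starts at zero
    running = 0.0
    for r in resid:
        running += r
        path.append(running / (sigma * math.sqrt(n)))
    return max(path) - min(path)


# A series with one level shift produces a large excursion of the process.
shifted = [1.0] * 4 + [3.0] * 4
r = cusum_range(shifted)
```

A series whose mean shifts pushes the cumulative sum far from zero before it returns, so the range grows with the size and duration of the departure from a constant mean.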

The procedure is as follows and uses several different packages for the R [

1) Estimate d for the full series. Unless otherwise stated all estimates of d were obtained with the estimator of [

2) Through simulation, obtain the bivariate probability distribution of number of breaks reported by ART and the CUSUM range for the null distribution of an FI(d) series with d as estimated in the previous step and the same number of observations as the series under test.

a) Simulate a large number of FI(d) series (we used 1000 replications) with appropriate d value and length. FI(d) series were simulated with the function farimaSim in the R package fSeries of [

b) Use ART to break the series into “regimes” and record the number of reported breaks.

c) Obtain the CUSUM range using the efp function in the R package strucchange of [

d) Plot the bivariate distribution.

3) Apply ART to the full series to obtain the reported number of break points.

4) Use the efp to obtain the CUSUM range of the full series.

5) Overplot the bivariate statistic of reported number of breaks with CUSUM range on the previously obtained null distribution.

6) Assess whether the bivariate statistic for the series is consistent with the null distribution obtained by simulation. This can be done either visually or by generating contours of significance for the bivariate distribution. Contours of significance can be obtained either directly from the simulated data or, if desired, after kernel smoothing.
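Step 6 can also be given a rough numerical form. The sketch below is one possible way to quantify how peripheral an observed pair is; it is not the contour construction used in this paper. It assumes that elliptical contours are adequate, measures distance from the centre of the simulated cloud with the squared Mahalanobis distance, and reports the share of simulated pairs at least as far out as the observed pair. The names `mahalanobis_sq` and `empirical_bivariate_pvalue` are hypothetical.

```python
def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from mean under a 2x2 covariance."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (_, c) = cov
    det = a * c - b * b
    # Quadratic form with the inverse of [[a, b], [b, c]].
    return (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def empirical_bivariate_pvalue(simulated, observed):
    """Share of simulated (breaks, range) pairs at least as extreme as the observed pair."""
    n = len(simulated)
    mx = sum(p[0] for p in simulated) / n
    my = sum(p[1] for p in simulated) / n
    sxx = sum((p[0] - mx) ** 2 for p in simulated) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in simulated) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in simulated) / (n - 1)
    cov = ((sxx, sxy), (sxy, syy))
    d_obs = mahalanobis_sq(observed, (mx, my), cov)
    extreme = sum(1 for p in simulated if mahalanobis_sq(p, (mx, my), cov) >= d_obs)
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n + 1)


# Toy null cloud (made-up numbers, not simulation output) and an outlying pair.
cloud = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
p_far = empirical_bivariate_pvalue(cloud, (10.0, 10.0))
```

With 1000 simulated pairs the smallest value this estimate can take is 1/1001, so it cannot resolve significance levels much below 0.001. Because the number of breaks is discrete and the joint distribution need not be elliptical, contours taken from the simulated data or from a kernel-smoothed density may be preferable in practice.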

For comparison purposes we provide the p-values for the univariate ART and CUSUM tests.
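The break count that ART contributes can be illustrated by treating ART as a regression tree whose only predictor is the time index, so that each split of the tree is a candidate break in the mean. The sketch below grows such a tree by recursive least-squares splitting. The stopping rule (a minimum segment size and a minimum relative gain) is an assumption standing in for the pruning used by a full regression tree implementation, and `tree_breaks` is a hypothetical name.

```python
def _sse(prefix, prefix_sq, i, j):
    """Sum of squared deviations from the mean of y[i:j], computed from prefix sums."""
    n = j - i
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / n


def tree_breaks(y, min_size=5, min_gain=0.05):
    """Break points found by recursively splitting a series on its time index.

    A split is kept only if it reduces the squared error by more than
    min_gain times the total squared error of the whole series.
    Returns sorted indices b such that a new regime starts at y[b].
    """
    n = len(y)
    prefix, prefix_sq = [0.0], [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    total = _sse(prefix, prefix_sq, 0, n)
    breaks = []
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < 2 * min_size:
            continue  # too short to hold two regimes
        parent = _sse(prefix, prefix_sq, i, j)
        best_gain, best_k = 0.0, None
        for k in range(i + min_size, j - min_size + 1):
            gain = parent - _sse(prefix, prefix_sq, i, k) - _sse(prefix, prefix_sq, k, j)
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is not None and best_gain > min_gain * total:
            breaks.append(best_k)
            stack.extend([(i, best_k), (best_k, j)])
    return sorted(breaks)


# A single level shift after 20 observations yields one reported break.
step = [0.0] * 20 + [5.0] * 20
found = tree_breaks(step)
```

The number of reported breaks is `len(tree_breaks(y))`. The count depends on the stopping rule, which is why the null distribution has to be simulated with exactly the same settings as are applied to the series under test.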

[

We obtain the mean number of breaks under the null hypothesis of the series being an FI(d) process. If the number of reported breaks in a series under test exceeds the 95% (or other significance) level based on the Poisson distribution then we reject the null of an FI(d) process. It should be noted that [

The method and results presented here are intended to extend his work to obtain a way of gaining a reasonable estimate of the expected number of reported breaks for a wide range of d values and series lengths in fractionally integrated series.
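The Poisson-based rejection rule described above can be written in a few lines. The sketch assumes a one-sided test in which the p-value is the probability of observing strictly more breaks than were reported; whether the inequality should be strict is an assumption of this example, and the function names are hypothetical.

```python
import math


def poisson_upper_tail(k, mean):
    """P(X > k) for X distributed as Poisson(mean)."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k + 1):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return 1.0 - cdf


def rejects_null(reported_breaks, expected_breaks, alpha=0.05):
    """Reject the FI(d) null when the upper-tail probability falls below alpha."""
    return poisson_upper_tail(reported_breaks, expected_breaks) < alpha


# Example: 11 reported breaks when 6.8 are expected under the null.
decision = rejects_null(11, 6.8)
```

Under this convention, 11 reported breaks against an expected 6.8 leads to rejection at the 5% level, whereas 7 reported breaks against an expected 7.5 does not.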

For reasons of space we report a representative selection of results. The remainder are available on request from the authors.

The distribution of the number of breaks reported by ART for series with 4000 data points is presented in

The mean number of reported breaks per series for various series lengths and d values is presented in

We fitted a function to the empirical data to obtain formulas for calculating the mean and various tail probabilities. The approximations are calculated by computing the following variables in the order given:

where

Each fit has been generated by minimizing a function measuring the error between the fitted function f and the known values

The two columns on the left of

Parameter | 2.5% | 5% | mean | 90% | 95% | 97.5% | 99% |
---|---|---|---|---|---|---|---|
x_{1} | 0.1003 | 0.8494 | −0.6283 | −0.3475 | −0.2594 | −0.3988 | −0.6808 |
x_{2} | −0.0084 | 0.416 | 0.5883 | 0.3915 | 0.3876 | 0.3558 | 0.3014 |
x_{3} | 0.8031 | 1.0362 | −0.0613 | −0.5026 | −0.5384 | −0.7014 | −0.9815 |
x_{4} | 10,408 | 17,154 | 24,804 | 23,792 | 23,874 | 21,503 | 19,892 |
x_{5} | −1505 | 5468 | 3188 | 3195 | 3350 | 3233 | 2672 |
x_{6} | 415.1 | −2677 | −503.4 | −375.1 | −369.8 | −326.6 | −213.3 |
x_{7} | 31.179 | −0.1075 | −0.1248 | −0.0977 | −0.0975 | −0.0969 | −0.0937 |
x_{8} | 1.66e−4 | 6.8e−5 | −4.690e−6 | −5.61e−6 | −4.06e−6 | −4.1e−6 | −3.85e−6 |
x_{9} | 0.2767 | −0.5988 | −0.6830 | −0.6064 | −0.6062 | −0.3101 | −0.3032 |
x_{10} | 0.0544 | 0.0606 | 0.0606 | 0.0606 | 0.0601 | 0.0601 | 0.0550 |
x_{11} | 0.2366 | −0.1209 | −0.1245 | −0.1313 | −0.1306 | −0.1439 | −0.1705 |
x_{12} | 1.9432e−7 | 6.67e−9 | 4.6e−10 | 6.4e−10 | 2.8e−10 | 7.6e−10 | 1.4e−10 |

where

The third column of

and the approximation is given by F, not f. The residual sum of squares for the optimal fit was 0.4436.

The four columns on the right of

The two lower quantiles given by the first two columns allow two sided tests to be performed. When the lower limit of the two sided test is zero, the lower limit effectively says nothing, as the number of breaks cannot be negative. In such cases a one sided test should be used in place of the two sided test.

The data set analysed in Section 6 below comprised the realized volatilities of 16 Dow Jones Industrial Average (DJIA) index stocks and was provided by [

five-minute grids, which is a consistent estimator of the daily volatility. A fuller explanation of the dataset and how the realized volatilities were calculated can be found in [

We applied the bivariate ART vs CUSUM range procedure described in Section 3 to the 16 series in the data set. For reasons of space we present only a representative selection of results; the remainder are available on request from the authors. Four results from the new computational procedure are presented, for series with d estimates of 0.36 (GM), 0.40 (JNJ), 0.42 (PFE), and 0.44 (IBM), in panel (a) of Figures 4-7. For comparison purposes we plot the corresponding univariate results for the reported number of breaks and the CUSUM range in panels (b) and (c) respectively. These four results include one example for which the null can be rejected by univariate ART (JNJ), one for which it can be rejected by univariate CUSUM (GM), and two (IBM, PFE) for which the null is not rejected by either univariate test but is rejected by the bivariate procedure.

In Figures 4-7 the vertical axis is the number of breaks reported by ART. When considered as a discrete univariate distribution the vertical axis is simply the test of [

The horizontal axis in panel (a) of Figures 4-7 is the CUSUM range from the well-known CUSUM test. When taken alone some stocks such as GM (see

the exception was Walmart (WMT).

To summarise, the results of univariate ART and CUSUM tests are presented in columns “ART Test p-value” and “CUSUM Test p-value” respectively in

[

correlations and well understood asymptotic properties which allowed them to theoretically derive critical values for varying levels of statistical significance. The results of their test are presented in the column labelled “ORT” of

As discussed in Section 1, the problem of distinguishing between models with true long memory and other models which display apparent long memory properties is difficult. This paper’s primary contribution is a procedure based on a bivariate distribution which, in the 16 series examined, indicates that the realized volatilities of 15 of the series are not FI(d). Secondarily, we have extended the work of [

In Section 4 the change of behaviour seen in

processes exhibiting the long memory property at least two approaches are required. Tests or procedures involving ART, either alone or in conjunction with other established statistics, would only be useful when d was sufficiently high, and the series sufficiently short, that a reasonable number of breaks would be expected to be reported. When d was sufficiently low, or the series sufficiently long, that no breaks would be expected to be reported, some alternative method would need to be used. For financial data with a typical d value of about 0.40 and several thousand observations ART should be useful.

Series | d Est | Reported Breaks | Expected Breaks | ART Test p-value | CUSUM Test p-value | ORT |
---|---|---|---|---|---|---|
AA | 0.42 | 7 | 7.5 | 0.48 | ^{*}0.00 | 2.68 |
AIG | 0.40 | 6 | 6.8 | 0.52 | ^{*}0.00 | 4.02 |
BA | 0.40 | 10 | 6.8 | 0.09 | ^{*}0.01 | 1.63 |
CAT | 0.41 | 7 | 7.1 | 0.42 | ^{*}0.01 | 5.32 |
GE | 0.44 | 8 | 8.0 | 0.41 | ^{*}0.01 | 4.98 |
GM | 0.36 | 7 | 5.5 | 0.19 | ^{*}0.01 | 5.20 |
HP | 0.44 | 6 | 8.0 | 0.69 | ^{*}0.00 | 0.50 |
IBM | 0.44 | 11 | 8.0 | 0.11 | 0.06 | 1.13 |
INTC | 0.46 | 6 | 8.6 | 0.75 | ^{*}0.00 | 2.20 |
JNJ | 0.40 | 11 | 6.8 | ^{*}0.05 | 0.16 | 3.62 |
KO | 0.42 | 11 | 7.5 | 0.08 | ^{*}0.00 | 4.40 |
MRK | 0.39 | 7 | 6.4 | 0.31 | ^{*}0.05 | 3.89 |
MSFT | 0.46 | 10 | 8.6 | 0.25 | ^{*}0.03 | ^{*}11.24 |
PFE | 0.42 | 10 | 7.5 | 0.14 | 0.07 | 3.88 |
WMT | 0.42 | 10 | 7.5 | 0.14 | 0.31 | 4.15 |
XON | 0.44 | 6 | 8.0 | 0.69 | ^{*}0.00 | 1.26 |

The results reported by [

With the exception of the [

The results of looking at the data with a bivariate breaks vs CUSUM range distribution, as in Figures 4-7, are promising and, we believe, point the way for future progress in this area. On these bivariate distributions the data point for each of the four real time series was clearly in the extreme tails of the distribution. Indeed, all four of the results presented here appear to be significant at close to the 0.001 level. For 15 of the 16 series, the null hypothesis of an FI(d) series with d as estimated for the full series was rejected.

A summary of the results for all four tests and procedures is presented in

Many other authors have expressed reservations about the reality of the long memory property apparently exhibited by many financial and economic time series. We have proposed a new method based on a bivariate check of the data which compares the real data with properties of the distribution obtained for simulated series. The particular properties we concentrate on are the number of breaks observed in the real data and their EFP range compared to what we would expect if the true DGP were a fractionally integrated process. The use of bivariate distributions to distinguish between true fractionally integrated series and other series displaying the long memory property appears to be a very promising avenue of future research. In the first application to realized volatilities this methodology rejected the null hypothesis of true fractionally integrated series for 15 of the 16 series. This is a higher rate of rejection than that obtained by using either of the two statistics which form the bivariate distribution separately in its univariate form.

Test | No. of rejections of null |
---|---|
Univariate ART | 1 |
Univariate CUSUM | 12 |
Bivariate ART vs CUSUM | 15 |
ORT [ | 1 |

There are unresolved statistical issues which merit further research. In the bivariate approach we have estimated d but then proceeded as if the d value was known a priori. Clearly the bivariate distribution is dependent on d and further work needs to be done to establish the usefulness of the approach. Also tests of other models which display the long memory property need to be carried out.

Finally, it must be stressed that this new procedure uses the number of breaks to ascertain whether the series is likely to be generated by a fractionally integrated process. In rejecting the null we do not conclude that the alternative is actually a breaks model. There may be other alternatives to the null; our procedure simply tests for fractional integration.

We would like to thank the participants in the MODSIM09 conference and two anonymous referees for helpful and constructive feedback.

William Rea, Chris Price, Les Oxley, Marco Reale, Jennifer Brown (2016) A New Procedure to Test for Fractional Integration. Open Journal of Statistics, 06, 651-666. doi: 10.4236/ojs.2016.64055