HAC-Robust Measurement of the Duration of a Trendless Subsample in a Global Climate Time Series

The IPCC has drawn attention to an apparent 
leveling-off of globally-averaged temperatures over the past 15 years or so. 
Measuring the duration of the hiatus has implications for determining if the 
underlying trend has changed, and for evaluating climate models. Here, I 
propose a method for estimating the duration of the hiatus that is robust to 
unknown forms of heteroskedasticity and autocorrelation (HAC) in the 
temperature series and to cherry-picking of endpoints. For the specific case of 
global average temperatures I also add the requirement of spatial consistency between 
hemispheres. The method makes use of the Vogelsang-Franses (2005) HAC-robust 
trend variance estimator which is valid as long as the underlying series is 
trend stationary, which is the case for the data used herein. Application of 
the method shows that there is now a trendless interval of 19 years' duration at 
the end of the HadCRUT4 surface temperature series, and of 16 to 26 years in the 
lower troposphere. Use of a simple AR1 trend model suggests a shorter hiatus of 
14 to 20 years, but is likely unreliable.


Introduction
The 2013 report of the Intergovernmental Panel on Climate Change drew attention to a pause, or "hiatus", in what appears otherwise to be a long upward trend in globally-averaged temperatures ([1], Ch. 9). Figure 1 shows the HadCRUT4 combined land and ocean surface temperature series over the 1850-2014 interval [2]. The data are shown in the form of ˚C "anomalies", or deviations from local averages. The underlying trend is clearly upward over the 20th century, but a leveling off at the end is visible. Figure 2 shows the lower troposphere (LT) series from the University of Alabama-Huntsville method (denoted UAH, [3]) and Figure 3 shows the LT series using the RSS method of [4]. These observations run from January 1979 to April 2014 and show a similar leveling-off over the ending segment.
The IPCC does not estimate the duration of the hiatus, but it is typically regarded as having extended for 15 to 20 years. While the HadCRUT4 record clearly shows numerous pauses and dips amid the overall upward trend, the ending hiatus is of particular note because climate models project continued warming over the period. Since 1990, atmospheric carbon dioxide levels have risen from 354 ppm to just under 400 ppm, a 13% increase. [1] reported that, of 114 model simulations over the 15-year interval 1998 to 2012, 111 predicted warming. [5] showed a similar mismatch in comparisons over a twenty-year time scale, with most models predicting 0.2˚C-0.4˚C/decade warming. Hence there is a need to address two questions: 1) How should the duration of the hiatus be measured? 2) Is it long enough to indicate a potential inconsistency between observations and models? This paper focuses solely on the first question. For an approach to the second, see [6].
Several considerations need to be addressed in order for a duration measure to be robust. Consider a basic trend model

y_t = a + b·t + e_t,    (1)

in which y_t is assumed to be trend-stationary (in other words, does not contain a unit root), a and b are coefficients to be estimated, t is a linear trend spanning 1, …, T, and e_t is a covariance-stationary residual which may be autocorrelated or persistent to an unknown degree. A series with a positive trend is one for which

b − c_α > 0,    (2)

where c_α is the half-width of the (1 − 2α) confidence interval around b. An end-of-sample pause in the upward trend is defined using the duration parameter J, chosen such that

b_J − c_α ≤ 0 ≤ b_J + c_α,    (3)

where b_J is the trend coefficient estimated over the final J observations of the series; in other words, J is the maximum-length terminal subset of y_t such that the (1 − 2α) confidence interval around the trend term contains zero. Throughout this paper we use α = 0.025, and hence a 95% confidence interval. Framed this way, two criteria can be set down to ensure robustness.
a) The confidence interval must be heteroskedasticity and autocorrelation (HAC) robust. While the IPCC itself typically uses a simple AR1 process to model residuals, there is empirical evidence that the error processes exhibit higher-order autocorrelation and long-term persistence [6]-[8]. Tests of the series used herein all showed evidence of higher than first-order autocorrelation. Hence the estimator must be valid for longer memory structures.
b) The longer T is, the more likely it becomes that a single spurious value of J satisfies Equation (3) even if, for all other values of J, the trend is significant. To avoid cherry-picking J, if it really marks the beginning of a pause in the trend rather than being simply an artefact of the pseudo-cyclical behavior of persistent residuals, we require that condition (3) be observed at every point in the interval J, …, T. A further condition can be added for the specific case studied here, namely globally-averaged surface temperatures: there ought to be spatial consistency, namely

c) conditions (3), (a) and (b) ought to hold jointly in both the Northern and Southern Hemispheres, since a pause in only one but not both would not be truly global, even if it caused the average of the two series to have an insignificant trend.

Combining these requirements, J_MAX is defined as the largest J such that condition (3) holds at every point in the interval J, …, T, the confidence interval c_α is HAC-robust, and the value of J_MAX computed for both the Northern and Southern Hemispheres equals or exceeds that for the globe as a whole.
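The search over terminal subsamples implied by conditions (3) and (b) can be sketched as follows. This is an illustrative implementation only: the function names are mine, and the simple i.i.d.-error OLS interval used here is a placeholder for the HAC-robust interval developed in the next section, which can be swapped in via the ci_func argument.

```python
import numpy as np

def ols_trend_ci(y, crit=1.96):
    """Placeholder trend confidence interval assuming i.i.d. errors.
    The paper's method substitutes a HAC-robust interval instead."""
    T = len(y)
    t = np.arange(1.0, T + 1.0)
    X = np.column_stack([np.ones(T), t])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    s2 = (e @ e) / (T - 2)                       # residual variance
    se_b = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))
    return beta[1] - crit * se_b, beta[1] + crit * se_b

def j_max(y, obs_per_year=12, m=5, ci_func=ols_trend_ci):
    """Largest duration J (in years) such that the trend CI over the
    final J' years contains zero for EVERY J' = m, ..., J, implementing
    conditions (3) and (b); durations shorter than m are ignored."""
    T = len(y)
    best = 0
    for J in range(m, T // obs_per_year + 1):
        lo, hi = ci_func(y[T - J * obs_per_year:])
        if lo <= 0.0 <= hi:
            best = J      # trend insignificant over the final J years
        else:
            break         # condition (b) fails; stop extending J
    return best
```

A perfectly flat series yields the maximum possible J, while a series with an unbroken strong trend yields zero.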
I apply this definition to the HadCRUT4, UAH and RSS series, obtaining hiatus durations of 19, 16 and 26 years respectively. Were one to use an AR1 method, the estimated durations would be 14, 14 and 20 years respectively.

The VF Test and HAC-Robust Confidence Intervals
To address condition (a) I apply the HAC-robust trend confidence interval estimator of [9] (hereafter VF). HAC methods are very common in econometrics and have recently been applied in climatology, see [6]. The VF approach falls in the class of non-parametric HAC estimators, but avoids the problem of bandwidth selection by setting it equal to the sample size T. The resulting estimator exhibits very little size distortion in comparison to other HAC estimators using limited bandwidth, while maintaining comparable power levels. However, the estimator converges to a non-standard distribution so the critical values must either be simulated or computed using the bootstrap method. In the case of straightforward trend estimation, as in the current application, critical values are available in VF.
The trend coefficient is estimated on monthly data using OLS. The full derivation of the variance-covariance estimator, including asymptotics, is available in VF, and I will only explain the computational steps here. Since we will apply the test not only to the global series but also jointly to hemispheric trends, I will present it for the multivariate case.
A multivariate trend model is written

y_it = a_i + b_i·t + e_it,    (4)

where i = 1, …, N indexes the time series for which trends are to be measured and tested. Denote by b = (b_1, …, b_N)′ the vector of trend coefficients and by e_t = (e_1t, …, e_Nt)′ the vector of residuals. We are interested in testing null hypotheses of the form

H_0: Rb = r    (5)

against alternatives H_1: Rb ≠ r, where R and r are known restriction matrices of dimension q × N and q × 1 respectively, q denoting the number of restrictions being tested. The matrix R is assumed to have full row rank. Robust tests of H_0 need to account for correlation across time, correlation across series, and conditional heteroskedasticity, as summarized by the long-run variance of e_t, defined as Ω = Γ_0 + Σ_{j=1}^{∞} (Γ_j + Γ_j′), where Γ_j = E(e_t e′_{t−j}) is the matrix autocovariance function of e_t. The VF statistic is constructed using the following estimator of Ω:

Ω̂ = Σ_{j=−(T−1)}^{T−1} (1 − |j|/T) Γ̂_j.    (6)

This is the Bartlett kernel nonparametric estimator of Ω using a bandwidth (truncation lag) equal to the sample size. The VF statistic for testing H_0: Rb = r is given by

VF = (Rb̂ − r)′ [R (Ω̂ / Σ_t d̃_t²) R′]^(−1) (Rb̂ − r),    (7)

where d̃_t are the residuals from regressing the trend t on a constant. The estimator (6) was originally proposed by [10], although in the different but computationally identical form

Ω̂ = (2/T²) Σ_{t=1}^{T} S_t S_t′,    (8)

where S_t = Σ_{j=1}^{t} ê_j. See [11] for proof of the equivalence between (6) and (8).
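A minimal univariate sketch of the two computationally identical forms of the estimator follows (function names are mine). The two calculations coincide whenever the residuals sum to zero, which OLS residuals from a regression containing an intercept always do.

```python
import numpy as np

def bartlett_lrv(e):
    """Bartlett-kernel long-run variance estimate with the bandwidth
    (truncation lag) set equal to the sample size T (univariate case)."""
    T = len(e)
    lrv = e @ e / T                       # lag-0 autocovariance
    for j in range(1, T):
        gamma_j = e[j:] @ e[:-j] / T      # lag-j autocovariance
        lrv += 2.0 * (1.0 - j / T) * gamma_j
    return lrv

def partial_sum_lrv(e):
    """Equivalent partial-sum form, (2/T^2) * sum_t S_t^2, where
    S_t is the cumulative sum of the residuals up to time t."""
    S = np.cumsum(e)
    return 2.0 * (S @ S) / len(e) ** 2
```

On any mean-zero residual vector the two functions return the same number up to floating-point error, which makes the partial-sum form the cheaper way to compute the estimator in practice.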
In the univariate case (q = 1), the square root of (7) can be interpreted as a t-type statistic, and critical values are tabulated in VF. By treating the denominator as the robust standard error se(b̂_i) = sqrt(Ω̂_ii / Σ_t d̃_t²), a HAC-robust 95% confidence interval can be constructed as

b̂_i ± c_0.025 · se(b̂_i),    (10)

where the 0.025 critical value c_0.025 is, from VF, 6.482. The confidence interval given by Equation (10) is robust to all forms of autocorrelation up to, but not including, a unit root. All the data series used herein were tested using a Dickey-Fuller test, and in every case the unit root hypothesis was rejected.
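The resulting interval can be sketched compactly as below. This is my illustrative function, not production code; it assumes the robust standard error takes the form sqrt(Ω̂ / Σ d̃²) described above, with the long-run variance computed in its partial-sum form.

```python
import numpy as np

VF_CRIT_95 = 6.482  # 0.025 critical value tabulated in VF

def vf_trend_ci(y, crit=VF_CRIT_95):
    """HAC-robust 95% confidence interval for the OLS trend slope of y,
    using the Bartlett bandwidth-equal-to-T long-run variance."""
    T = len(y)
    t = np.arange(1.0, T + 1.0)
    X = np.column_stack([np.ones(T), t])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    S = np.cumsum(e)
    omega = 2.0 * (S @ S) / T ** 2        # long-run variance (partial sums)
    d = t - t.mean()                      # residuals of trend on a constant
    se_b = np.sqrt(omega / (d @ d))       # robust standard error of slope
    return beta[1] - crit * se_b, beta[1] + crit * se_b
```

For a noise-free linear trend the interval collapses onto the true slope; with stationary residuals added, the interval widens but still covers it.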
The AR1 case is handled differently, using the Bartlett approximation of [12], in which the lag-one AR coefficient r is estimated using the Cochrane-Orcutt iterative method and the standard errors are then scaled up by the factor sqrt((1 + r)/(1 − r)). This yields a valid approximation to the long-run variance only if the error terms are AR1. However, the data series used herein are known to exhibit higher-order autocorrelation (see, e.g., [6]-[8]), so the AR1 approach will likely lead to over-rejections of the null, which in this case implies under-estimation of J_MAX.

Table 1 presents the surface results. The data are monthly, but we only consider annual increments for J, and only durations of more than m = 5 years, since failure to reject the null of no trend in the vicinity of the end of the sample could, for such a short interval, arise simply from the small size of the subsample. The bolded line at J = 19, for the year 1995, shows the maximum value of J for which the lower bound of the confidence interval is negative, i.e. for which condition (3) is satisfied. Looking up the column from there, it can be seen that condition (b) is also satisfied. In column 7, the number shown is the VF score on the test that the separate NH and SH trends are jointly zero. The 5% critical value for the q = 2 case is 40.68, which is only exceeded at J = 23; therefore the additional condition (c) is also met. Hence, for the surface HadCRUT4 data, J_MAX = 19. Figure 4 shows these results in graphical form. It is clear that the confidence intervals widen at the end of the sample as J increases and the number of degrees of freedom declines, but not before first shrinking somewhat, indicating that at the computed value of J_MAX the number of degrees of freedom is not the limiting factor on the width of the confidence interval. Table 2 and Table 3 present the UAH and RSS series, respectively.
For UAH, the bolded line at J = 16, corresponding to the year 1998, shows the maximum value of J for which the lower bound of the confidence interval is negative. Looking up the column from there, it can be seen that condition (b) is also satisfied. In column 7, the VF score on the joint NH and SH trend test only exceeds 40.68 at J = 19; therefore the additional condition (c) is also met. Hence, for the lower troposphere UAH data, J_MAX = 16. Figure 5 shows the UAH results in graphical form. For the RSS results, a comparable examination yields J_MAX = 26, corresponding to a start date of 1988. The results are shown in Figure 6.
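The AR1 adjustment used for the comparison results can be sketched as follows. The function name is mine, and the one-step lag-one autocorrelation below is a simplified stand-in for the Cochrane-Orcutt iterative estimate used in the paper; only the inflation factor itself is the point of the sketch.

```python
import numpy as np

def ar1_se_inflation(e):
    """AR1 (Bartlett approximation) inflation factor for the trend
    standard error, sqrt((1 + r) / (1 - r)), where r is the lag-one
    autocorrelation of the residuals e. Valid only if e is truly AR1;
    with higher-order persistence it understates the needed correction."""
    r = (e[1:] @ e[:-1]) / (e @ e)        # lag-one autocorrelation
    return np.sqrt((1.0 + r) / (1.0 - r))
```

For an AR1 series with coefficient 0.6 the factor is approximately sqrt(1.6/0.4) = 2, i.e. the naive OLS standard error roughly doubles.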

Conclusion
I propose a robust definition for the length of the pause in the warming trend over the closing subsample of surface and lower tropospheric data sets. The length term J_MAX is defined as the maximum duration J for which a valid (HAC-robust) trend confidence interval contains zero for every subsample beginning at any point between T − J and T − m and extending to the end of the sample, where m is the shortest duration of interest. This definition was applied to surface and lower tropospheric temperature series, adding the requirement that the Southern and Northern Hemisphere data must yield an identical or larger value of J_MAX. In the surface data we compute a hiatus length of 19 years, and in the lower tropospheric data we compute a hiatus length of 16 years in the UAH series and 26 years in the RSS series. J_MAX estimates based on an AR1 estimator are lower, but likely incorrect since higher-order autocorrelation exists in the data. Overall, this analysis confirms the point raised in the IPCC report [1] regarding the existence of the hiatus and adds more precision to the understanding of its length.