Using HAC Estimators for Intervention Analysis

The purpose of this article is to present an alternative method for intervention analysis of time series data that is simpler to use than the traditional method of fitting an explanatory Autoregressive Integrated Moving Average (ARIMA) model. Time series regression analysis is commonly used to test the effect of an event on a time series. As an alternative, an econometric modeling method is proposed that uses a heteroskedasticity and autocorrelation consistent (HAC) estimator of the covariance matrix instead of fitting an ARIMA model. A parametric bootstrap is used to compare the two approaches for intervention analysis. The results of this study suggest that the ARIMA-based time series regression method and the HAC method give very similar results, and hence the simpler HAC method can be used for intervention analysis in place of the more complicated ARIMA modeling. The alternative method presented here is expected to be very helpful in gaming and hospitality research.


Introduction
An intervention model or interrupted time series model [1] is an Autoregressive Integrated Moving Average (ARIMA) model of a time series in which at least one of the predictors is a dummy variable for an event, which is thought of as an interruption in a pure ARIMA process. An ARIMA model in which differencing is not used is also called an ARMA model.
The use of intervention analysis or interrupted time series analysis is very common in the hospitality and tourism literature. Bonham and Gangnes [2] used intervention analysis to show that the 5% Hawaii hotel room tax introduced in 1987 did not significantly impact Hawaii hotel room revenues. Min's [3] intervention analysis of inbound tourism data showed that both the earthquake of September 21, 1999 and the Severe Acute Respiratory Syndrome (SARS) outbreak of 2003 had significant negative impacts on Taiwan's inbound tourism. A thorough review of the time series forecasting literature is provided by De Gooijer and Hyndman [4].
How to cite this paper: Singh, A.K. and Dalpatadu, R.J. (2020) Using HAC Estimators for Intervention Analysis.
Ahlgren et al. [5] used an ARIMA model to assess the impact of a higher gaming tax rate in the state of Illinois on gaming volume, and concluded that the gaming volume experienced a significant decrease when the tax increase took effect.
Ahlgren et al. [6] used an ARIMA model to assess the impact of a higher gaming tax rate in the state of Illinois on marketing expenditure by a major Illinois riverboat operator. Toma et al. [7] used a seasonal ARIMA model to show that the book "Midnight in the Garden of Good and Evil" set in Savannah, Georgia had a significant positive effect on hotel tax receipts, while both 9/11 and hurricane Floyd had a significant negative effect. Goh and Law [8] used intervention analysis to show that relaxation of issuing out-bound visitor visas, the Asian financial crisis, the handover, and the bird flu epidemic had significant and expected impacts on Hong Kong tourism demand. Eisendrath et al. [9] used intervention analysis on Las Vegas Strip gaming volume to show that 9/11 had a significant negative impact lasting five months. Suh et al. [10] used this approach to investigate the effects of cash revenue generated from non-comped diners (CASHREV), amount spent by the casino in comped-meals (COMPREV) on gaming volume, using major holidays and Motorcycle Rally as intervention predictors; their ARIMA model showed that both CASHREV and COMPREV were significant predictors of slot coin-in, and Motorcycle Rally had a significant and negative impact on slot coin-in. Zheng et al. [11] used ARIMA intervention analysis to study the impact of recession on restaurant stock performance. D'Amuri and Marcucci [12] used ARIMA modeling to assess the impact of an index of Google job-search intensity on the monthly US unemployment rate. Intervention analysis has been used in other disciplines as well. Su and Deng [13] used time series regression with intervention term to predict the yield of Yu Ebao. Huang [14] has used intervention analysis to show that government intervention improved a firm's investment efficiency. The purpose of the present article is to introduce a method from econometric modeling that is simpler to use than the ARIMA method for intervention analysis.

Problem Statement and Methodology
The general form of an autoregressive moving average ARMA(p, q) model is

$$Y_t = c + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},$$

where $\varepsilon_t$ is a white noise process [15].
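As an illustration of this recursion, the sketch below simulates an ARMA(p, q) series directly from the defining equation; the coefficient values are made up for illustration and are not taken from the paper:

```python
import numpy as np

def simulate_arma(phi, theta, c=0.0, n=200, sigma=1.0, seed=0):
    """Simulate Y_t = c + sum_i phi_i*Y_{t-i} + e_t + sum_j theta_j*e_{t-j}."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    burn = 100                       # discard a burn-in so the start-up transient fades
    e = rng.normal(0.0, sigma, n + burn)
    y = np.zeros(n + burn)
    for t in range(max(p, q), n + burn):
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(p))
        ma_part = sum(theta[j] * e[t - 1 - j] for j in range(q))
        y[t] = c + ar_part + e[t] + ma_part
    return y[burn:]

series = simulate_arma(phi=[0.6], theta=[0.3], n=200)
```

Setting `phi=[0.6]` and `theta=[0.3]` gives an ARMA(1, 1) process; longer coefficient lists give higher-order models.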
The following steps are used in fitting an intervention model to a time series Y_t of the response variable as a function of predictor(s) X_t and intervention variable(s) I_t:
(1) The time series Y_t is plotted to assess the presence of a trend with time. A polynomial function of time t is typically used to model the trend.
(2) A multiple linear regression (MLR) model is fitted to the data such that the variance inflation factor (VIF) values of all predictors are not too high; VIF values above 5 suggest the presence of multicollinearity [16], and VIF values above 10 indicate that the MLR model suffers from serious multicollinearity [17]. One typically drops the predictor with the highest VIF value, one at a time, until a reasonable MLR model is obtained.
(3) The MLR model assumes that the errors are independent and normally distributed with zero mean and a common error variance, and the standard t-test is used to conduct significance tests for the coefficients of the predictors. In time series data, however, the residuals are typically autocorrelated, and hence the use of the t-test is not valid. Plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF) are examined to determine the order of the ARIMA(p, d, q) × (P, D, Q) model; here p is the order of the non-seasonal autoregressive term, d the order of non-seasonal differencing, q the order of the non-seasonal moving average term, and P, D, Q are the corresponding seasonal terms.
(4) Once the ARIMA model has been identified, a time series regression model is fitted to the data with all predictors and the ARIMA terms in the model, and the residuals from this model are tested for zero autocorrelations up to h lags; the Ljung-Box test [18] is commonly used for this purpose. Hyndman and Athanasopoulos [19] recommend using h = 10 lags in the Ljung-Box test. After a time series regression model with uncorrelated residuals is found, the significance of all predictors is tested.
The last (4th) step at times proves to be quite challenging; an alternative approach for testing the significance of all predictors and intervention variables is proposed and investigated in this study.
One of the assumptions of multiple linear regression (MLR) is that the variance of the response variable is the same across the range of predictor values; when this is not the case, we say that heteroscedasticity (or heteroskedasticity) is present [20]. When the error terms (residuals) from an MLR model are autocorrelated (i.e., not independent), the standard estimate of the covariance matrix of the estimated coefficients needs to be corrected for both heteroscedasticity and the presence of autocorrelation. The resulting estimators are referred to as heteroskedasticity and autocorrelation consistent (HAC) estimators [21].
This approach uses three different heteroskedasticity and autocorrelation consistent (HAC) estimators of the covariance matrix used in econometric modeling [22], namely HAC, Kern-HAC, and Newey-West, in place of Steps (3) and (4) above. Examples from the hospitality literature as well as a synthetic time series are used to demonstrate the effectiveness of this alternative approach, and a parametric bootstrap of the time series is used to compare the results of the standard ARIMA-based intervention model with the results of significance testing using HAC estimators. In Step (2) above, one typically keeps only the significant predictors in the MLR model; in this paper, to keep things simple, all predictors are kept in the model as long as their VIF values are less than 5.

Comparison of P-Values from ARIMA Model and HAC Estimators
Three different time series data sets are used to compare the ARIMA method for intervention analysis and significance tests using HAC estimators of the covariance matrix of the estimated coefficients of the MLR. The time series in the first two examples are real data sets from the gaming literature, the first one modeled by an ARIMA(3,0,0) process with a cubic trend, and the second by an ARIMA(0,0,2) process with a cubic trend. For the third example, a synthetic monthly time series of length 84 was used. The following steps are used for this comparison: 1) fit the ARIMA-based time series regression model and the MLR model with HAC estimators to the series; 2) generate 1000 parametric bootstrap samples from the fitted ARIMA model; 3) refit both models to each bootstrap sample and record the P-values of the terms of interest; and 4) compare the distributions of the bootstrap P-values from the two methods.

Examples from Tourism and Hospitality Literature
Example 1: Impact of 9/11 on Las Vegas Strip Gaming Revenue
Figure 1 is a plot of the ACF and PACF values for lags 1 to 20; this graph suggests an ARIMA(3,0,0) model for the residuals [22]. The final time series regression model is shown in Table 2; it can be seen from this table that 1) the months of February, June, and December are statistically significant, each with lower average slot coin-in than the other ten months, 2) the terrorist attack of September 2001 (predictor D9_11) had a significant and negative impact on slot coin-in, and 3) the ARIMA terms AR2 and AR3 are statistically significant. Figure 2 shows the ACF of the residuals from the ARIMA(3,0,0) time series regression model of Table 2, along with the P-values of the Ljung-Box (LB) test for 10 lags. The ACF plot shows that the residuals from the ARIMA(3,0,0) time series regression model are not autocorrelated, which is confirmed by the LB test since the P-values at lags 1 through 10 are all above 0.05. Table 3 shows the results of t-tests using the HAC estimator of the covariance matrix; these results are very similar to the ones obtained from the ARIMA model (Table 2). Figure 3 shows plots of the Las Vegas Strip coin-in series and five bootstrap samples generated from it using the estimated ARIMA(3,0,0) model given in Table 2. The bootstrap samples are seen to have the same general pattern as the Las Vegas Strip coin-in time series. Figure 4 shows the histogram of 1000 P-values for D9_11 obtained from 1000 bootstrap runs. Since the D9_11 term is highly significant (see Table 2 and Table 3), the P-values from the bootstrap samples are expected to be small. The term D11 (the dummy column for November) is not significant (see Table 2 and Table 3), and hence the 1000 bootstrap P-values for D11 are expected to exceed 0.05. Figure 4 and Figure 5 clearly show that all of the intervention analysis methods yield similar results.
Example 2: Impact of the Illinois Gaming Tax Increase on Marketing Expenditure
Ahlgren et al.
[6] used secondary data for the period January 2000 to December 2006, provided by a major Illinois riverboat operator, to assess the impact of a gaming tax rate increase in the state of Illinois on marketing expenditure by the riverboat operator. A time series regression model was fitted to marketing expenditure; the predictors were a cubic trend, eleven dummy columns for the months of February through December (see Example 1), a dummy column DTax for the Illinois tax increase, which was 1 for months 43 through 67 and 0 for all other months, and an interaction term between the linear trend term and the DTax column. Table 4 shows the MLR model fitted to the marketing expenditure data. The cubic trend is significant, along with the month of July, the intervention event DTax, and the interaction term. Figure 6 shows plots of the ACF and PACF of the residuals from the MLR model of Table 4. The behavior of the autocorrelation functions suggests an ARIMA(0,0,1) process, but the residuals turned out to be autocorrelated. An ARIMA(0,0,2) process provided a good fit to the marketing expenditure data, as can be seen from Figure 7. Table 5 shows the fitted ARIMA model. The intervention term DTax is highly significant, and the quadratic trend component Zt2 is not significant (P-value = 0.91). The t-tests using HAC estimators yield similar results (see Table 6). Figure 8 shows the marketing expenditure time series (top left) and five bootstrap samples generated from it using the estimated ARIMA(0,0,2) model given in Table 5. Figure 9 shows the histograms of 1000 bootstrap P-values for the ARIMA and HAC methods for the statistically significant term DTax; Figure 10 shows the same for the insignificant term Zt2. Both of these figures show that the ARIMA and HAC methods provide similar results.

Example 3: Synthetic Time Series
The advantages of working with a simulated (synthetic) time series are that the truth is known and hence the estimates can be compared to the corresponding true parameter values.
The synthetic time series was generated from an ARIMA(2,0,2) process with an intervention term DE and a dummy variable DNov for November; the true parameter values used in the generation are listed in Tables 7-9. The ACF and PACF plots of the residuals suggest an ARIMA(2,0,2) model, and Figure 12 shows that the ARIMA(2,0,2) model yields uncorrelated residuals, consistent with the process from which the series was generated. Tables 7-9 show the fitted MLR model, the ARIMA(2,0,2) model, and the significance test results from the HAC estimators, respectively. It can be seen from these tables that all of the estimated coefficients are close to their corresponding true values. The intervention term DE is seen to be highly significant, and DNov is not significant. Figure 13 shows the synthetic time series and five bootstrap samples generated from the synthetic series and the estimated ARIMA(2,0,2) model of Table 8.
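A synthetic series of this kind can be generated in statsmodels as sketched below; the lag-polynomial coefficients and effect sizes here are illustrative placeholders, not the paper's true values (those appear in Tables 7-9):

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample

rng = np.random.default_rng(4)
n = 84  # seven years of monthly observations, matching the length used here

# Illustrative ARMA(2,2) lag polynomials (zero-lag coefficient first,
# AR coefficients negated, per statsmodels' convention)
ar = np.array([1.0, -0.5, 0.2])
ma = np.array([1.0, 0.4, 0.3])
noise = arma_generate_sample(ar, ma, nsample=n, burnin=100,
                             distrvs=rng.standard_normal)

month = np.arange(n) % 12
DE = (np.arange(n) >= 48).astype(float)   # intervention dummy
DNov = (month == 10).astype(float)        # November dummy (true effect = 0)
y = 100.0 - 5.0 * DE + noise              # DNov deliberately has no effect
```

Because DNov is given a true coefficient of zero, significance tests on it gauge each method's false-positive behavior, while the nonzero DE effect gauges power.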
The histograms of 1000 bootstrap P-values for the intervention term DE and the dummy variable for November (DNov) are shown in Figure 14 and Figure 15, respectively. Both of these figures again show that the ARIMA and HAC methods provide similar results.

Conclusion
The results of this study suggest that the simpler HAC method can be used for intervention analysis instead of the ARIMA model, especially in situations where finding the right ARIMA model turns out to be challenging.