The purpose of this article is to present an alternative method for intervention analysis of time series data that is simpler to use than the traditional method of fitting an explanatory Autoregressive Integrated Moving Average (ARIMA) model. Time series regression analysis is commonly used to test the effect of an event on a time series. An econometric modeling method, which uses a heteroskedasticity and autocorrelation consistent (HAC) estimator of the covariance matrix instead of fitting an ARIMA model, is proposed as an alternative. A parametric bootstrap is used to compare the two approaches for intervention analysis. The results of this study suggest that the time series regression (ARIMA) method and the HAC method give very similar results for intervention analysis, and hence the proposed HAC method can be used for intervention analysis instead of the more complicated method of ARIMA modeling. The alternative method presented here is expected to be particularly useful in gaming and hospitality research.

An intervention model or interrupted time series model [

The use of intervention analysis or interrupted time series analysis is very common in hospitality and tourism literature. Bonham and Gangnes [

The general form of an autoregressive moving average (ARMA) model is

$$Y_t = \delta + \sum_{i=1}^{p} \phi_i Y_{t-i} + a_t - \sum_{i=1}^{q} \theta_i a_{t-i}$$

where

$Y_t$ = the response variable of interest.

$\delta$ = the intercept.

$\phi_i$ = the autoregressive (AR) term coefficients.

$\theta_i$ = the moving average (MA) term coefficients.

$a_t$ = the random shocks.

The above model is referred to as an ARIMA(p, d = 0, q) model or an ARMA(p, q) model [
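To make the sign conventions of the ARMA equation concrete, the following minimal Python sketch generates $Y_t$ from a given sequence of shocks $a_t$. The function name `simulate_arma` is illustrative, and pre-sample values of $Y$ and $a$ are assumed to be zero for simplicity.

```python
import random


def simulate_arma(delta, phi, theta, shocks):
    """Return Y_1..Y_n from Y_t = delta + sum(phi_i * Y_{t-i}) + a_t - sum(theta_i * a_{t-i}).

    Pre-sample values of Y and a are treated as zero (a simplifying assumption).
    """
    y = []
    for t, a_t in enumerate(shocks):
        # Autoregressive part: weighted sum of the earlier responses that exist.
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        # Moving-average part: weighted sum of earlier shocks, entering with a minus sign.
        ma_part = sum(theta[i] * shocks[t - 1 - i] for i in range(len(theta)) if t - 1 - i >= 0)
        y.append(delta + ar_part + a_t - ma_part)
    return y


# Example: an ARMA(2,1) series driven by standard normal shocks.
rng = random.Random(1)
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]
series = simulate_arma(0.0, [0.5, -0.3], [0.2], shocks)
```

Because the $\theta_i$ terms are subtracted here, software that adds MA terms (R's `arima()`, for example) reports MA coefficients with the opposite sign.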

The following steps are used to fit an intervention model to a time series Y_{t} of the response variable as a function of predictor(s) X_{t} and intervention variable(s) I_{t}:

(1) The time series Y_{t} is plotted to assess whether a trend over time is present. A polynomial function of time t is typically used to model the trend.

(2) A multiple linear regression (MLR) model is fitted to the data, such that the variance inflation factor (VIF) values of all predictors are not too high; values of VIF above 5 suggest the presence of multicollinearity [

(3) The MLR model assumes that the errors are independent and normally distributed with zero mean and a common error variance, and the standard t-test is used to test the significance of the coefficients of the predictors. In time series data, however, the residuals are typically auto-correlated, so the t-test is not valid. Plots of the auto-correlation function (ACF) and partial auto-correlation function (PACF) of the residuals are examined to determine the order of the ARIMA(p,d,q) × (P,D,Q) model; here p is the order of the non-seasonal auto-regressive term, d the order of non-seasonal differencing, q the order of the non-seasonal moving average term, and P, D, Q are the corresponding seasonal terms.

(4) Once the ARIMA model has been identified, a time series regression model is fitted to the data with all predictors and the ARIMA terms in the model, and the residuals from this model are tested for zero auto-correlations up to h lags; the Ljung-Box test [
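Step (2) screens predictors by their VIF. The minimal pure-Python sketch below uses the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix; the helper names are hypothetical. It also shows why centring the time index helps when a polynomial trend is used.

```python
import math


def _corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(predictors):
    """VIF of each predictor: diagonal of the inverse of the correlation matrix."""
    k = len(predictors)
    corr = [[1.0 if i == j else _corr(predictors[i], predictors[j]) for j in range(k)]
            for i in range(k)]
    inv = _invert(corr)
    return [inv[j][j] for j in range(k)]


t = list(range(1, 102))
print(vif([t, [v ** 2 for v in t]]))   # raw time and its square
z = [v - 51 for v in t]                # time centred at its mean
print(vif([z, [v ** 2 for v in z]]))   # centred time and its square
```

For t = 1, ..., 101, the raw pair has VIF values of roughly 16, well above the threshold of 5, while the centred pair has VIF values of 1 because a variable symmetric about zero is uncorrelated with its square.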

The last (4th) step can prove quite challenging in practice, so an alternative approach for testing the significance of all predictors and intervention variables is proposed and investigated in this study.

One of the assumptions of multiple linear regression (MLR) is that the variance of the response variable is the same across the range of predictor values; when this is not the case, we say that heteroscedasticity (or heteroskedasticity) is present [

This approach uses three different heteroskedasticity and autocorrelation consistent (HAC) estimators of the covariance matrix, used in econometric modeling [
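For reference, HAC estimators share a "sandwich" form. The version below uses Bartlett weights, as in the Newey-West estimator, with $x_t$ the $k \times 1$ vector of MLR predictors at time $t$, $X$ the $T \times k$ predictor matrix, $\hat{u}_t$ the MLR residual, and $L$ a chosen truncation lag. The HAC, Kern-HAC and Newey-West variants compared in this paper differ in kernel, bandwidth and adjustment choices, so this is a generic reference form rather than the exact computation behind each column of the results tables.

$$\hat{V}(\hat{\beta}) = (X^\top X)^{-1}\,\hat{\Omega}\,(X^\top X)^{-1}$$

$$\hat{\Omega} = \sum_{t=1}^{T} \hat{u}_t^{2}\, x_t x_t^\top + \sum_{l=1}^{L} w_l \sum_{t=l+1}^{T} \hat{u}_t \hat{u}_{t-l}\left(x_t x_{t-l}^\top + x_{t-l} x_t^\top\right), \qquad w_l = 1 - \frac{l}{L+1}$$

The HAC standard error of $\hat{\beta}_j$ is the square root of the $j$-th diagonal element of $\hat{V}(\hat{\beta})$, and the test statistic is $\hat{\beta}_j$ divided by that standard error.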

Three different time series data sets are used to compare the ARIMA method for intervention analysis and significance tests using HAC estimators of the covariance matrix of the estimated coefficients of the MLR. The time series in the first two examples are real data sets from gaming literature, the first one modeled by an ARIMA(3,0,0) process with a cubic trend, and the second by an ARIMA(0,0,2) process with a cubic trend. For the third example, a synthetic monthly time series of length 84 was used.

The following steps are used for this comparison:

(a) The time series Y_{t} is plotted to assess the presence of a trend.

(b) An MLR model is fitted to Y_{t} as a function of the predictors (including a dummy column for the intervention term) and a polynomial trend function; predictors with high VIF values are removed.

(c) ACF and PACF of the residuals from MLR are plotted for identification of ARIMA terms p and q.

(d) An ARIMA(p,0,q) time series regression model is fitted, with p and q determined in Step (c) above. The P-values for each predictor term in the ARIMA model are calculated.

(e) The ACF of the residuals from the ARIMA model of Step (d) is plotted to show that these residuals are not auto-correlated, followed by the Ljung-Box (L-B) test for 10 lags. If the P-values from the L-B test exceed 0.05 at each lag, the residuals are deemed not auto-correlated, which supports the validity of the P-values computed in Step (d).

(f) P-values for each term in the MLR model of Step (b) are computed using the HAC estimators of the covariance matrix of the estimated coefficients.

(g) The P-values computed in Steps (d) and (f) are compared.
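The residual check in the steps above relies on the Ljung-Box (L-B) test. Its statistic is $Q(h) = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ sample autocorrelation, compared with a chi-square distribution with $h$ degrees of freedom, commonly reduced by $p + q$ when the residuals come from a fitted ARMA model (the `fitdf` argument of R's `Box.test` plays this role). The stdlib-only Python sketch below uses illustrative function names; `chi2_sf` evaluates the chi-square tail through the series expansion of the regularized lower incomplete gamma function.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag, using deviations from the series mean."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box(residuals, h):
    """Ljung-Box statistic Q(h) = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    r = acf(residuals, h)
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, h + 1))


def chi2_sf(q, df):
    """Upper-tail chi-square probability, via the regularized lower gamma series."""
    a, x = df / 2.0, q / 2.0
    if x <= 0.0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def ljung_box_pvalues(residuals, max_lag=10, fitdf=0):
    """L-B P-value at each lag h = 1..max_lag, using df = h - fitdf (nan if df <= 0)."""
    out = []
    for h in range(1, max_lag + 1):
        df = h - fitdf
        out.append(chi2_sf(ljung_box(residuals, h), df) if df > 0 else float("nan"))
    return out
```

Under the decision rule described above, the residuals would be deemed not auto-correlated when every value returned by `ljung_box_pvalues(residuals, 10)` exceeds 0.05.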

The package sandwich of the statistical software environment R [
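The computations in this paper use R's sandwich package. Purely as a language-neutral illustration of the Newey-West variant, the Python sketch below computes OLS coefficients and HAC standard errors with Bartlett weights and a user-chosen truncation lag; the function names are hypothetical. It applies no prewhitening, automatic bandwidth selection, or small-sample adjustment, any of which sandwich's functions may apply by default, so its numbers need not match the tables.

```python
import math


def _invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_newey_west(X, y, max_lag):
    """OLS coefficients with Newey-West (Bartlett-kernel) HAC standard errors.

    X is a list of rows; include a leading 1.0 in each row for the intercept.
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    bread = _invert(xtx)
    beta = [sum(bread[i][j] * xty[j] for j in range(k)) for i in range(k)]
    u = [y[t] - sum(X[t][j] * beta[j] for j in range(k)) for t in range(n)]
    # "Meat": the lag-0 term plus Bartlett-weighted cross-products of lagged scores.
    meat = [[sum(u[t] ** 2 * X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
            for i in range(k)]
    for lag in range(1, max_lag + 1):
        w = 1.0 - lag / (max_lag + 1.0)
        for i in range(k):
            for j in range(k):
                meat[i][j] += w * sum(
                    u[t] * u[t - lag] * (X[t][i] * X[t - lag][j] + X[t - lag][i] * X[t][j])
                    for t in range(lag, n))
    # Sandwich: bread * meat * bread; standard errors are square roots of the diagonal.
    left = [[sum(bread[i][m] * meat[m][j] for m in range(k)) for j in range(k)] for i in range(k)]
    cov = [[sum(left[i][m] * bread[m][j] for m in range(k)) for j in range(k)] for i in range(k)]
    return beta, [math.sqrt(cov[i][i]) for i in range(k)]


# Example: intercept-only model on a short series, truncation lag 1.
beta, se = ols_newey_west([[1.0]] * 4, [1.0, 2.0, 3.0, 4.0], max_lag=1)
print(beta, se)  # [2.5] [0.625]
```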

In addition, for each time series data set, 1000 parametric bootstrap samples are generated, and Steps (a)-(f) are repeated for each bootstrap sample. Histograms of the 1000 P-values computed in Steps (d) and (f) are plotted to compare the ARIMA method with the HAC method of intervention analysis.
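The bootstrap is described as parametric, but the resampling scheme is not spelled out here. One common construction, assumed in this sketch, adds freshly simulated ARMA errors (using the estimated $\phi$, $\theta$ and shock standard deviation $\sigma$) to the fitted mean of the time series regression model; each sample would then be analysed with the steps listed above. Only the sample-generation part is shown, and the function names are hypothetical.

```python
import random


def arma_errors(phi, theta, sigma, n, rng, burn_in=200):
    """Simulate n ARMA errors e_t = sum(phi_i e_{t-i}) + a_t - sum(theta_i a_{t-i}).

    Shocks a_t ~ N(0, sigma^2); the first burn_in values are discarded so the
    zero start-up values have little influence on the returned errors.
    """
    total = n + burn_in
    a = [rng.gauss(0.0, sigma) for _ in range(total)]
    e = []
    for t in range(total):
        ar = sum(phi[i] * e[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[i] * a[t - 1 - i] for i in range(len(theta)) if t - 1 - i >= 0)
        e.append(ar + a[t] - ma)
    return e[burn_in:]


def parametric_bootstrap(fitted, phi, theta, sigma, n_boot=1000, seed=0):
    """Return n_boot series, each equal to the fitted mean plus simulated ARMA errors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_boot):
        errors = arma_errors(phi, theta, sigma, len(fitted), rng)
        samples.append([f + e for f, e in zip(fitted, errors)])
    return samples


# Example with made-up inputs: a flat mean of 100 and AR(1) errors.
boot = parametric_bootstrap([100.0] * 60, phi=[0.5], theta=[], sigma=5.0, n_boot=3, seed=42)
```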

Example 1: Impact of 9/11 on Las Vegas Strip Gaming Revenue

Eisendrath et al. [

$Z_t = (t - \text{mean}(t))/\text{sd}(t), \quad t = 1, 2, \dots, 101$

$Z_{t2} = (Z_t)^2$

$Z_{t3} = (Z_t)^3$
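The standardized trend terms can be computed as in the Python sketch below, assuming that sd denotes the sample standard deviation (the $n - 1$ divisor used by R's `sd()`); the function name is illustrative.

```python
import statistics


def cubic_trend_terms(n):
    """Standardized time Z_t and its square and cube for t = 1..n."""
    t = list(range(1, n + 1))
    m = statistics.mean(t)
    s = statistics.stdev(t)  # sample SD with n - 1 divisor, matching R's sd()
    z = [(v - m) / s for v in t]
    return z, [v ** 2 for v in z], [v ** 3 for v in z]


zt, zt2, zt3 = cubic_trend_terms(101)
print(round(min(zt), 3), round(max(zt), 3))  # -1.706 1.706
```

Standardizing keeps the trend terms on a scale comparable to the dummy variables. Because $Z_t$ is symmetric about zero, $Z_t$ and $Z_{t2}$ are uncorrelated, although $Z_t$ and $Z_{t3}$ remain strongly correlated.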

The final Time Series Regression model is shown in

Term | Coeff | SE | t-stat | P-Value |
---|---|---|---|---|
(Intercept) | 178,074 | 4253.1 | 41.87 | 0.00 |
Zt | 5321 | 2823.7 | 1.88 | 0.06 |
Zt2 | 3371 | 1334.7 | 2.53 | 0.01 |
Zt3 | 12,251 | 1455 | 8.42 | 0.00 |
DFeb | −15,827 | 5626.7 | −2.81 | 0.01 |
DMar | 10,343 | 5663.9 | 1.83 | 0.07 |
DApr | −7969 | 5665.2 | −1.41 | 0.16 |
DMay | 949 | 5666.9 | 0.17 | 0.87 |
DJun | −10,794 | 5669.2 | −1.90 | 0.06 |
DJul | −521 | 5504.4 | −0.10 | 0.92 |
DAug | −3477 | 5503.7 | −0.63 | 0.53 |
DSep | −1103 | 5477.2 | −0.20 | 0.84 |
DOct | 2524 | 5479.8 | 0.46 | 0.65 |
DNov | −6164 | 5483.9 | −1.12 | 0.26 |
DDec | −24,787 | 5626.7 | −4.41 | 0.00 |
D9_11 | −11,474 | 5126 | −2.24 | 0.03 |
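Each t-stat in the table above equals Coeff/SE, and each P-Value is a two-sided tail probability. As a check, the Python sketch below recomputes two rows, assuming a Student t distribution with 101 − 16 = 85 residual degrees of freedom (101 monthly observations and the 16 coefficients listed). The regularized incomplete beta routine follows the standard continued-fraction (modified Lentz) method, and the function names are illustrative.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def t_test(coef, se, df):
    """t statistic and two-sided P-value from a Student t distribution with df degrees of freedom."""
    t = coef / se
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, p


for name, coef, se in [("DFeb", -15_827, 5626.7), ("D9_11", -11_474, 5126)]:
    t, p = t_test(coef, se, df=85)
    print(name, round(t, 2), round(p, 2))
```

Under the 85-degree-of-freedom assumption, both rows round to the tabulated values (t = −2.81, P = 0.01 for DFeb and t = −2.24, P = 0.03 for D9_11).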

Term | Coeff | SE | t-stat | P-Value |
---|---|---|---|---|
ar1 | −0.16 | 0.10 | −1.64 | 0.10 |
ar2 | 0.20 | 0.10 | 2.14 | 0.03 |
ar3 | 0.35 | 0.10 | 3.55 | 0.00 |
intercept | 178,540 | 4103.90 | 43.50 | 0.00 |
Zt | 6052 | 3663.60 | 1.65 | 0.10 |
Zt2 | 3763 | 1663.70 | 2.26 | 0.02 |
Zt3 | 11,731 | 1772.80 | 6.62 | 0.00 |
DFeb | −15,770 | 5431.90 | −2.90 | 0.00 |
DMar | 9228 | 4707.30 | 1.96 | 0.05 |
DApr | −7972 | 4362.70 | −1.83 | 0.07 |
DMay | 793 | 5191.00 | 0.15 | 0.88 |
DJun | −11,732 | 4862.10 | −2.41 | 0.02 |
DJul | −1369 | 4739.50 | −0.29 | 0.77 |
DAug | −4392 | 4763.40 | −0.92 | 0.36 |
DSep | −1814 | 5019.50 | −0.36 | 0.72 |
DOct | 1863 | 4253.80 | 0.44 | 0.66 |
DNov | −6755 | 4559.50 | −1.48 | 0.14 |
DDec | −26,473 | 5427.20 | −4.88 | 0.00 |
D9_11 | −12,800 | 4694.60 | −2.73 | 0.01 |

The ARIMA model results show that 1) the months February, June and December are statistically significant, each with lower average slot coin-in than January (the reference month), 2) the terrorist attack of September 2001 (predictor D9_11) had a significant negative impact on slot coin-in, and 3) the ARIMA terms AR2 and AR3 are statistically significant.

Term | Coeff | P-Value (HAC) | P-Value (Kern-HAC) | P-Value (Newey-West) |
---|---|---|---|---|
(Intercept) | 178,074 | 0.00 | 0.00 | 0.00 |
Zt | 5321 | 0.06 | 0.04 | 0.05 |
Zt2 | 3371 | 0.01 | 0.00 | 0.00 |
Zt3 | 12,251 | 0.00 | 0.00 | 0.00 |
DFeb | −15,827 | 0.00 | 0.00 | 0.00 |
DMar | 10,343 | 0.05 | 0.04 | 0.04 |
DApr | −7969 | 0.07 | 0.08 | 0.04 |
DMay | 949 | 0.84 | 0.84 | 0.82 |
DJun | −10,794 | 0.02 | 0.02 | 0.01 |
DJul | −521 | 0.91 | 0.91 | 0.90 |
DAug | −3477 | 0.50 | 0.49 | 0.45 |
DSep | −1103 | 0.82 | 0.83 | 0.81 |
DOct | 2524 | 0.43 | 0.40 | 0.31 |
DNov | −6164 | 0.26 | 0.29 | 0.24 |
DDec | −24,787 | 0.00 | 0.00 | 0.00 |
D9_11 | −11,474 | 0.00 | 0.00 | 0.00 |

Example 2: Impact of Tax Rate Increase on Marketing Expenditure

Ahlgren et al. [

Term | Coeff | SE | t-stat | P-Value |
---|---|---|---|---|
(Intercept) | 2,401,290 | 85,168 | 28.20 | 0.00 |
DFeb | 50,976 | 102,531 | 0.50 | 0.62 |
DMar | 186,509 | 102,626 | 1.82 | 0.07 |
DApr | 193,334 | 102,780 | 1.88 | 0.06 |
DMay | 156,254 | 102,991 | 1.52 | 0.13 |
DJun | 139,804 | 103,258 | 1.35 | 0.18 |
DJul | 270,399 | 103,991 | 2.60 | 0.01 |
DAug | 170,899 | 103,471 | 1.65 | 0.10 |
DSep | 91,241 | 103,587 | 0.88 | 0.38 |
DOct | 116,540 | 103,771 | 1.12 | 0.27 |
DNov | 162,265 | 104,030 | 1.56 | 0.12 |
DDec | 174,118 | 104,369 | 1.67 | 0.10 |
DTax | −872,334 | 99,982 | −8.73 | 0.00 |
Zt | −79,785 | 79,378 | −1.01 | 0.32 |
Zt2 | −10,461 | 30,231 | −0.35 | 0.73 |
Zt3 | 107,666 | 38,177 | 2.82 | 0.01 |
Zt × DTax | 574,225 | 140,373 | 4.09 | 0.00 |

Term | Coeff | SE | t-stat | P-Value |
---|---|---|---|---|
ma1 | 0.79 | 0.13 | 6.04 | 0.00 |
ma2 | 0.40 | 0.13 | 3.08 | 0.00 |
intercept | 2,336,700 | 96,680 | 24.17 | 0.00 |
DFeb | 53,826 | 60,783 | 0.89 | 0.38 |
DMar | 190,250 | 86,570 | 2.20 | 0.03 |
DApr | 191,670 | 98,658 | 1.94 | 0.05 |
DMay | 149,160 | 98,846 | 1.51 | 0.13 |
DJun | 127,250 | 99,162 | 1.28 | 0.20 |
DJul | 258,060 | 99,678 | 2.59 | 0.01 |
DAug | 199,570 | 99,452 | 2.01 | 0.04 |
DSep | 114,340 | 99,455 | 1.15 | 0.25 |
DOct | 133,910 | 99,579 | 1.34 | 0.18 |
DNov | 169,570 | 88,794 | 1.91 | 0.06 |
DDec | 174,830 | 64,935 | 2.69 | 0.01 |
DTax | −919,370 | 125,930 | −7.30 | 0.00 |
Zt | −211,180 | 113,700 | −1.86 | 0.06 |
Zt2 | 5014 | 43,350 | 0.12 | 0.91 |
Zt3 | 162,250 | 53,238 | 3.05 | 0.00 |
Zt × DTax | 945,590 | 190,680 | 4.96 | 0.00 |

Term | Coeff | SE | t-stat | P-Value (HAC) | P-Value (Kern-HAC) | P-Value (Newey-West) |
---|---|---|---|---|---|---|
(Intercept) | 2,401,290 | 86,223 | 27.85 | 0.00 | 0.00 | 0.00 |
DFeb | 50,976 | 52,603 | 0.97 | 0.34 | 0.35 | 0.30 |
DMar | 186,509 | 69,202 | 2.70 | 0.01 | 0.01 | 0.00 |
DApr | 193,334 | 101,366 | 1.91 | 0.06 | 0.08 | 0.04 |
DMay | 156,254 | 99,852 | 1.56 | 0.12 | 0.12 | 0.08 |
DJun | 139,804 | 109,682 | 1.27 | 0.21 | 0.14 | 0.10 |
DJul | 270,399 | 107,795 | 2.51 | 0.01 | 0.01 | 0.00 |
DAug | 170,899 | 120,226 | 1.42 | 0.16 | 0.13 | 0.08 |
DSep | 91,241 | 130,436 | 0.70 | 0.49 | 0.47 | 0.41 |
DOct | 116,540 | 113,469 | 1.03 | 0.31 | 0.27 | 0.21 |
DNov | 162,265 | 98,990 | 1.64 | 0.11 | 0.05 | 0.03 |
DDec | 174,118 | 50,850 | 3.42 | 0.00 | 0.00 | 0.00 |
DTax | −872,334 | 134,931 | −6.47 | 0.00 | 0.00 | 0.00 |
Zt | −79,785 | 157,052 | −0.51 | 0.61 | 0.74 | 0.70 |
Zt2 | −10,461 | 48,646 | −0.22 | 0.83 | 0.86 | 0.84 |
Zt3 | 107,666 | 68,712 | 1.57 | 0.12 | 0.32 | 0.25 |
Zt × DTax | 574,225 | 109,356 | 5.25 | 0.00 | 0.00 | 0.00 |

Example 3: Synthetic Time Series

The advantage of working with a simulated (synthetic) time series is that the true parameter values are known, so the estimates can be compared with them directly.

The synthetic time series was generated from the following model:

$$Y_{0t} = 5000 + 20t + 3000\,\text{DJun} + 3200\,\text{DJul} + 2500\,\text{DAug} + 1000\,\text{DSep} + 4500\,\text{DE} + e$$

where

$t = 1, 2, \dots, 84$, with $t = 1$ representing January of Year 1 and $t = 84$ representing December of Year 7.

DE = dummy variable for the intervention event E.

DE = 1 for $t = 43, 44, \dots, 67$; 0 otherwise.

$e$ = ARIMA(2,0,2) error process with $\phi_1 = 0.8897$, $\phi_2 = -0.4858$, $\theta_1 = -0.2279$, $\theta_2 = 0.2488$, and standard deviation $\sigma = 1000$.
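A series of this form can be generated with the Python sketch below. Several details are assumptions, not taken from the text: $\sigma = 1000$ is treated as the standard deviation of the normal shocks, a 200-value burn-in is discarded, and the MA terms are subtracted following the ARMA equation given earlier (if the series was generated with R's `arima.sim`, which adds MA terms, the $\theta$ signs would be reversed). The function name `synthetic_series` is hypothetical.

```python
import random


def synthetic_series(seed=0, sigma=1000.0, burn_in=200):
    """Y0_t = 5000 + 20t + month effects + 4500*DE + e_t for t = 1..84."""
    phi = (0.8897, -0.4858)
    theta = (-0.2279, 0.2488)
    rng = random.Random(seed)
    n = 84
    total = n + burn_in
    # ARIMA(2,0,2) errors driven by N(0, sigma^2) shocks; burn-in values are dropped.
    a = [rng.gauss(0.0, sigma) for _ in range(total)]
    e = []
    for t in range(total):
        ar = sum(phi[i] * e[t - 1 - i] for i in range(2) if t - 1 - i >= 0)
        ma = sum(theta[i] * a[t - 1 - i] for i in range(2) if t - 1 - i >= 0)
        e.append(ar + a[t] - ma)
    e = e[burn_in:]
    month_effect = {6: 3000, 7: 3200, 8: 2500, 9: 1000}  # DJun, DJul, DAug, DSep
    y = []
    for t in range(1, n + 1):
        month = (t - 1) % 12 + 1           # t = 1 is January of Year 1
        de = 1 if 43 <= t <= 67 else 0     # intervention dummy DE
        y.append(5000 + 20 * t + month_effect.get(month, 0) + 4500 * de + e[t - 1])
    return y


series = synthetic_series(seed=1)
```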

Tables 7-9 show the fitted MLR model, the ARIMA(2,0,2) model, and the significance test results from the HAC estimators, respectively, together with the true values used to generate the synthetic time series. The estimated coefficients in these tables are close to their corresponding true values, each lying within about two standard errors of the truth. The intervention term DE is highly significant in all three tables, and DNov is not significant.

Term | True value | Estimated Coeff | SE | t-stat | P-Value |
---|---|---|---|---|---|
(Intercept) | 5000 | 5052.46 | 474.93 | 10.64 | 0.00 |
t | 20 | 22.69 | 5.55 | 4.09 | 0.00 |
DFeb | 0 | 782.41 | 613.40 | 1.28 | 0.21 |
DMar | 0 | −210.10 | 613.47 | −0.34 | 0.73 |
DApr | 0 | −418.36 | 613.60 | −0.68 | 0.50 |
DMay | 0 | −202.43 | 613.78 | −0.33 | 0.74 |
DJun | 3000 | 3325.94 | 614.00 | 5.42 | 0.00 |
DJul | 3200 | 3690.96 | 614.93 | 6.00 | 0.00 |
DAug | 2500 | 3304.80 | 614.60 | 5.38 | 0.00 |
DSep | 1000 | 1466.00 | 614.98 | 2.38 | 0.02 |
DOct | 0 | 302.53 | 615.40 | 0.49 | 0.62 |
DNov | 0 | −300.60 | 615.88 | −0.49 | 0.63 |
DDec | 0 | −734.41 | 616.40 | −1.19 | 0.24 |
DE | 4500 | 4054.31 | 292.26 | 13.87 | 0.00 |

Term | True value | Estimated Coeff | SE | t-stat | P-Value |
---|---|---|---|---|---|
ar1 | 0.8897 | 0.86 | 0.26 | 3.29 | 0.00 |
ar2 | −0.4858 | −0.54 | 0.17 | −3.08 | 0.00 |
ma1 | −0.2279 | −0.20 | 0.29 | −0.67 | 0.50 |
ma2 | 0.2488 | 0.22 | 0.18 | 1.20 | 0.23 |
intercept | 5000 | 5196.30 | 451.88 | 11.50 | 0.00 |
t | 20 | 20.52 | 6.14 | 3.34 | 0.00 |
DFeb | 0 | 659.54 | 383.31 | 1.72 | 0.09 |
DMar | 0 | −369.36 | 543.55 | −0.68 | 0.50 |
DApr | 0 | −533.84 | 619.73 | −0.86 | 0.39 |
DMay | 0 | −255.27 | 614.44 | −0.42 | 0.68 |
DJun | 3000 | 3296.66 | 566.10 | 5.82 | 0.00 |
DJul | 3200 | 3619.11 | 534.03 | 6.78 | 0.00 |
DAug | 2500 | 3186.39 | 556.57 | 5.73 | 0.00 |
DSep | 1000 | 1327.86 | 605.77 | 2.19 | 0.03 |
DOct | 0 | 228.50 | 617.62 | 0.37 | 0.71 |
DNov | 0 | −247.24 | 552.31 | −0.45 | 0.65 |
DDec | 0 | −599.44 | 405.99 | −1.48 | 0.14 |
DE | 4500 | 4125.18 | 319.54 | 12.91 | 0.00 |

Term | True value | Estimated Coeff | SE | t-stat | P-Value (HAC) | P-Value (Kern-HAC) | P-Value (Newey-West) |
---|---|---|---|---|---|---|---|
(Intercept) | 5000 | 5052.46 | 524.70 | 9.63 | 0.00 | 0.00 | 0.00 |
t | 20 | 22.69 | 6.48 | 3.50 | 0.00 | 0.13 | 0.06 |
DFeb | 0 | 782.41 | 673.07 | 1.16 | 0.25 | 0.17 | 0.18 |
DMar | 0 | −210.10 | 799.27 | −0.26 | 0.79 | 0.77 | 0.75 |
DApr | 0 | −418.36 | 554.21 | −0.75 | 0.45 | 0.40 | 0.36 |
DMay | 0 | −202.43 | 504.41 | −0.40 | 0.69 | 0.76 | 0.74 |
DJun | 3000 | 3325.94 | 595.24 | 5.59 | 0.00 | 0.00 | 0.00 |
DJul | 3200 | 3690.96 | 542.29 | 6.81 | 0.00 | 0.00 | 0.00 |
DAug | 2500 | 3304.80 | 641.68 | 5.15 | 0.00 | 0.00 | 0.00 |
DSep | 1000 | 1466.00 | 474.09 | 3.09 | 0.00 | 0.01 | 0.01 |
DOct | 0 | 302.53 | 516.78 | 0.59 | 0.56 | 0.57 | 0.53 |
DNov | 0 | −300.60 | 404.99 | −0.74 | 0.46 | 0.19 | 0.26 |
DDec | 0 | −734.41 | 389.84 | −1.88 | 0.06 | 0.00 | 0.00 |
DE | 4500 | 4054.31 | 239.75 | 16.91 | 0.00 | 0.00 | 0.00 |

For each of the three examples presented in this paper, the ARIMA method of intervention analysis and the HAC methods yielded similar results. The four methods (ARIMA, HAC, Kern-HAC, and Newey-West) also yielded similar results for 1000 bootstrap samples from the original time series in each case. These results indicate that the simpler HAC method can be used for intervention analysis instead of the ARIMA model, especially in situations where finding the right ARIMA model turns out to be challenging.

The authors declare no conflicts of interest regarding the publication of this paper.

Singh, A.K. and Dalpatadu, R.J. (2020) Using HAC Estimators for Intervention Analysis. Open Journal of Statistics, 10, 31-51. https://doi.org/10.4236/ojs.2020.101003