Statistical Models for Forecasting Tourists’ Arrival in Kenya

In this paper, an attempt has been made to forecast tourists’ arrival using two statistical time series modeling techniques: Double Exponential Smoothing and the Auto-Regressive Integrated Moving Average (ARIMA). Forecasting is important in making future decisions such as ordering replenishment for an inventory system or increasing the capacity of the available staff in order to meet expected future service delivery. The methodology used is given in Section 2 and the results, discussion and conclusion are given in Section 3. When the forecasts from these models were validated, the Double Exponential Smoothing model performed better than the ARIMA model.


Introduction
Tourism is one of Kenya's major foreign exchange earners. This greatly depends on the arrival of various groups of tourists. The forecast of tourists' arrivals is important since it would enable tourism-related industries like airlines, hotels and other stakeholders to adequately prepare for any number of tourists at any future date. In this paper, an attempt has been made to forecast tourists' arrivals using two statistical time series modeling techniques: Double Exponential Smoothing and Auto-Regressive Integrated Moving Average (ARIMA). The same models were used in [1] to forecast milk production in India, and univariate SARIMA models were used in [2] to forecast tourists' demand in India.
The data on tourists' arrival in Kenya were obtained from the Ministry of East African Affairs, Commerce and Tourism, Department of Tourism. Tourists' arrivals for the period 1995 to 2008 were used for model fitting, and data for the remaining period, 2009 to 2012, were used for model validation. The analysis was carried out using R, Excel and Minitab version 16.1.1.

Selection of Appropriate Smoothing Techniques
Once the presence of trend is detected in the data, smoothing of the time series data follows. Various smoothing techniques are discussed by [3]. The size of the smoothing coefficient α has a great influence on the forecast. The value of α that gives the minimum mean square error (MSE) is usually used.
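The choice of α by minimum MSE can be sketched as follows. This is a minimal illustration only, assuming simple (single) exponential smoothing with the first observation as the initial level; the function names and the short series are hypothetical and are not the Kenyan arrivals data.

```python
def ses_mse(series, alpha):
    """Mean squared one-step-ahead error of simple exponential smoothing."""
    level = series[0]  # assumption: the initial level is the first observation
    squared_errors = []
    for y in series[1:]:
        # The forecast for this period is the level carried from the previous one.
        squared_errors.append((y - level) ** 2)
        level = alpha * y + (1 - alpha) * level
    return sum(squared_errors) / len(squared_errors)


def best_alpha(series, candidates):
    """Return the candidate alpha with the smallest MSE."""
    return min(candidates, key=lambda a: ses_mse(series, a))


# Synthetic illustrative values, not the data analysed in this paper.
demo = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
grid = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
alpha_star = best_alpha(demo, grid)
```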

Double Exponential Smoothing (Holt's)
The form of the model is

$$L_t = \alpha Y_t + (1 - \alpha)(L_{t-1} + b_{t-1}),$$
$$b_t = \beta (L_t - L_{t-1}) + (1 - \beta) b_{t-1},$$
$$F_{t+m} = L_t + m b_t,$$

where $Y_t$ is the observed value at time $t$, $L_t$ is the level of the series at time $t$, $b_t$ is the slope (trend) of the series at time $t$, $F_{t+m}$ is the forecast $m$ periods ahead, and $\alpha$ and $\beta$ $(= 0.1, 0.2, \ldots, 0.9)$ are the smoothing coefficients for level and trend respectively. In order to fit the model, it is necessary to calculate the initial values of the level $L_0$ and the trend $b_0$; [4] suggests how these initial values can be obtained. The pair of $\alpha$ and $\beta$ that gives the minimum Mean Square Error is preferred.
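A minimal sketch of this procedure is given below, assuming the standard Holt recursions (a level update, a trend update, and the forecast $F_{t+m} = L_t + m b_t$). The function names are hypothetical, the initial values are passed in rather than computed by the method of [4], and the short series is synthetic.

```python
def holt_fit(series, alpha, beta, level0, trend0):
    """Run Holt's recursions; return one-step forecasts, final level, final trend."""
    level, trend = level0, trend0
    forecasts = []
    for y in series:
        forecasts.append(level + trend)  # one-step forecast made before seeing y
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return forecasts, level, trend


def holt_forecast(level, trend, horizon):
    """Forecasts m = 1..horizon periods ahead: level + m * trend."""
    return [level + m * trend for m in range(1, horizon + 1)]


def grid_search(series, level0, trend0):
    """Try alpha and beta in 0.1, ..., 0.9; return (mse, alpha, beta) with minimum MSE."""
    grid = [k / 10 for k in range(1, 10)]
    best = None
    for alpha in grid:
        for beta in grid:
            fitted, _, _ = holt_fit(series, alpha, beta, level0, trend0)
            mse = sum((y - f) ** 2 for y, f in zip(series, fitted)) / len(series)
            if best is None or mse < best[0]:
                best = (mse, alpha, beta)
    return best


# Synthetic, perfectly linear series (not the Kenyan data); initial values assumed.
demo = [3.0, 5.0, 7.0, 9.0]
fitted, level, trend = holt_fit(demo, 0.5, 0.5, level0=1.0, trend0=2.0)
next_four = holt_forecast(level, trend, 4)  # [11.0, 13.0, 15.0, 17.0]
```

On a perfectly linear series every pair of coefficients reproduces the data, so the grid search only becomes informative on real, noisy observations.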

Triple Exponential Smoothing (Winters')
When time series data exhibit seasonality, the Triple Exponential Smoothing method is recommended. It incorporates three smoothing equations: the first for the level, the second for the trend and the third for seasonality.

Model Identification
According to Box and Jenkins, two graphical procedures are used to assess the correlation between the observations within a single time series. According to [5], these devices are called the estimated autocorrelation function (ACF) and the estimated partial autocorrelation function (PACF). These two procedures measure statistical relationships within the time series data. Summarizing the statistical correlation within the time series data is the other step in identification. Box and Jenkins suggest a whole family of ARIMA models from which we may choose.
In choosing the model that seems appropriate, we use the estimated ACF and PACF. This rests on the basic idea that every ARIMA model has a unique theoretical ACF and PACF associated with it. Thus we select the model whose theoretical ACF and PACF resemble the estimated ACF and PACF of the time series data [6].
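The estimated ACF and PACF can be computed along the following lines. This is a sketch under stated assumptions: the ACF uses the usual sample estimator with the overall mean, and the PACF is obtained with the Durbin-Levinson recursion, which is one common choice and not necessarily the routine used in this study. The function names are hypothetical.

```python
def sample_acf(series, max_lag):
    """Estimated autocorrelations r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [y - mean for y in series]
    denom = sum(d * d for d in deviations)
    acf = []
    for k in range(max_lag + 1):
        num = sum(deviations[t] * deviations[t + k] for t in range(n - k))
        acf.append(num / denom)
    return acf


def sample_pacf(acf):
    """Partial autocorrelations for lags 1..len(acf)-1 via Durbin-Levinson."""
    pacf = []
    prev = []  # coefficients of the order k-1 autoregression
    for k in range(1, len(acf)):
        if k == 1:
            phi_kk = acf[1]
            current = [phi_kk]
        else:
            num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
            current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# A theoretical AR(1) autocorrelation pattern with coefficient 0.5:
# the PACF cuts off after lag 1, which is the signature used in identification.
ar1_pacf = sample_pacf([1.0, 0.5, 0.25, 0.125])  # [0.5, 0.0, 0.0]
```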

Estimation
An estimate of the coefficients of the model is obtained by modified least squares method or the maximum likelihood estimation method suitable to the time series data.
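As a small illustration of least squares estimation, the sketch below fits the single coefficient of an AR(1) model without a constant, $w_t = \phi w_{t-1} + e_t$, by minimizing the sum of squared one-step errors. This special case is an assumption chosen for brevity; the function name is hypothetical and the procedure is not claimed to match the software routines used in this study.

```python
def ar1_least_squares(w):
    """Least squares estimate of phi in w_t = phi * w_{t-1} + e_t (no constant)."""
    num = sum(w[t] * w[t - 1] for t in range(1, len(w)))
    den = sum(w[t - 1] ** 2 for t in range(1, len(w)))
    return num / den


# Synthetic series that halves at every step, so the estimate is 0.5.
phi_hat = ar1_least_squares([1.0, 0.5, 0.25])
```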

Diagnostic Checking
Diagnostic checks help to determine if the tentative model is adequate. At this stage, the residuals from the fitted model are examined; if the model fails the diagnostic tests, it is rejected and the cycle is repeated until an appropriate model is obtained.
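The ARIMA model is obtained by taking $W_t$ as the first differenced time series, i.e. $W_t = Y_t - Y_{t-1}$, and writing

$$W_t = \phi_1 W_{t-1} + \cdots + \phi_p W_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}, \qquad (3)$$

where $\phi_i$ and $\theta_j$ are the autoregressive and moving average coefficients and $\varepsilon_t$ is the error term.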
Equation (3) is referred to as the ARIMA$(p, 1, q)$ model. Different combinations of AR and MA orders yield different ARIMA models [7]. The optimal model is chosen on the basis of the minimum value of the Akaike Information Criterion (AIC), given by

$$\mathrm{AIC} = -2 \log L + 2m, \qquad (4)$$

where $m = p + q$ and $L$ is the likelihood function. The Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) are used to evaluate the performance of the various models and are given below.
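$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (Y_t - F_t)^2}, \qquad \mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{Y_t - F_t}{Y_t} \right|,$$

where $Y_t$ is the tourists' arrival in different years, $F_t$ is the forecasted tourists' arrival in the corresponding years and $n$ is the number of years used as the forecasting period.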
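The selection criterion and the two accuracy measures can be sketched as below, assuming the usual definitions: AIC as minus twice the log-likelihood plus twice the number of AR and MA terms, RMSE as the square root of the mean squared forecast error, and MAPE as the mean absolute error relative to the actual value, in percent. The function names and the example values are hypothetical.

```python
import math


def aic(log_likelihood, p, q):
    """AIC = -2 log L + 2m, with m = p + q."""
    return -2.0 * log_likelihood + 2 * (p + q)


def rmse(actual, forecast):
    """Root mean square error over the forecasting period."""
    n = len(actual)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Mean absolute percentage error; actual values must be non-zero."""
    n = len(actual)
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / n


# Synthetic values for illustration only, not results from this paper.
example_rmse = rmse([2.0, 4.0], [1.0, 5.0])          # 1.0
example_mape = mape([100.0, 200.0], [90.0, 220.0])   # about 10 percent
example_aic = aic(-10.0, 1, 1)                       # 24.0
```

When two candidate models are compared on the same validation years, the one with the smaller RMSE and MAPE is preferred, which is the comparison carried out in the final section.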

Exponential Smoothing Model
Table 1 shows the yearly tourists' arrival in Kenya (in thousands) for the period 1995-2012. The time plot (Figure 1) revealed an increasing trend from the year 2002 to 2007. However, there was a sharp drop in the number of tourists in the year 2008, followed by an increasing trend from the year 2009 to 2012. For smoothing the data, Holt's Double Exponential Smoothing was used. Various combinations of $\alpha$ and $\beta$, both ranging from 0.1 to 0.9 in increments of 0.1, were tried, and the Mean Squared Error of the forecasts (54.186) and the Mean Absolute Percentage Error (3.028) were least for $\alpha = 0.1$ and $\beta = 0.7$. The chosen model is

$$L_t = 0.1 Y_t + 0.9 (L_{t-1} + b_{t-1}),$$
$$b_t = 0.7 (L_t - L_{t-1}) + 0.3 b_{t-1},$$
$$F_{t+m} = L_t + m b_t,$$

where $m = 1, 2, 3$ and $4$, and the initial values for the level $L_t$ and trend $b_t$ are 973.6 and 17.66 respectively. Table 2 shows the forecast of tourists' arrivals using the chosen double exponential smoothing model.

ARIMA Model
Figure 1 showed that the series was non-stationary since there was some trend component present. The data were made stationary by taking the first-order difference ($d = 1$). The time plot of the differenced data is shown in Figure 2.
Using R, various ARIMA models were fitted for different values of $p$ and $q$, and the best model was chosen on the basis of the minimum value of the selection criterion, that is, the Akaike Information Criterion (AIC), whose formula is given in Equation (4). In this way, ARIMA (1, 1, 1) was found to be the best model. The fitted model is characterized by the estimates 43.6319, 0.9999 and 0.4115.

Comparison and Conclusion of the Performance of the Two Models
The performance evaluation measures MAPE and RMSE were obtained for the forecasted tourists' arrivals for the years 2009 to 2012. The comparison of the two models based on MAPE and RMSE is given in Table 4. Based on the results in the table, the Double Exponential Smoothing model was the better model for forecasting tourists' arrival in Kenya, as both its MAPE and RMSE values were lower than those of ARIMA (1, 1, 1).

Figure 1. Time plot of tourists' arrival in Kenya between 1995 and 2009.

Table 1. Data on tourists' arrival in Kenya for the period 1995 to 2012. Source: Ministry of East African Affairs, Commerce and Tourism: Department of Tourism (Kenya).

Table 2. Forecast of tourists' arrival in Kenya using double exponential smoothing.