Statistical Models for Forecasting Tourists’ Arrival in Kenya


This paper forecasts tourists’ arrivals in Kenya using two statistical time series modeling techniques: Double Exponential Smoothing and the Auto-Regressive Integrated Moving Average (ARIMA) model. Forecasting is important for decisions about the future, such as ordering replenishment for an inventory system or increasing staff capacity to meet expected demand for services. The methodology is given in Section 2 and the results, discussion and conclusion in Section 3. When the forecasts from the two models were validated, the Double Exponential Smoothing model performed better than the ARIMA model.

Share and Cite:

Akuno, A. , Otieno, M. , Mwangi, C. and Bichanga, L. (2015) Statistical Models for Forecasting Tourists’ Arrival in Kenya. Open Journal of Statistics, 5, 60-65. doi: 10.4236/ojs.2015.51008.

1. Introduction

Tourism is one of Kenya’s major foreign exchange earners, and these earnings depend greatly on the arrival of various groups of tourists. Forecasting tourists’ arrivals is important because it enables tourism-related industries such as airlines and hotels, and other stakeholders, to prepare adequately for the number of tourists expected at any future date. In this paper, an attempt has been made to forecast tourists’ arrivals using two statistical time series modeling techniques: Double Exponential Smoothing and the Auto-Regressive Integrated Moving Average (ARIMA). The same models were used in [1] to forecast milk production in India, and univariate SARIMA models were used in [2] to forecast tourist demand in India.

Data on tourists’ arrival in Kenya were obtained from the Ministry of East African Affairs, Commerce and Tourism, Department of Tourism. Arrivals for the period 1995 to 2008 were used for model fitting, and data for the remaining period, 2009 to 2012, were used for model validation. The analysis was carried out using R, Excel and Minitab version 16.1.1.

2. Methodology

2.1. Selection of Appropriate Smoothing Techniques

Once the presence of a trend is detected in the data, the time series is smoothed. The smoothing techniques discussed by [3] include Simple Exponential Smoothing (SES), Double Exponential Smoothing (DES), Triple Exponential Smoothing (TES) and Adaptive Response Rate Simple Exponential Smoothing (ARRSES); the first three are briefly described below.

2.1.1. Simple Exponential Smoothing (SES)

For the series $y_1, y_2, \ldots, y_t$, the forecast for the next value $y_{t+1}$, say $F_{t+1}$, is based on the weights $\alpha$ and $1 - \alpha$ given to the most recent observation $y_t$ and the most recent forecast $F_t$ respectively, where $\alpha$ is the smoothing constant. The form of the model is

$$F_{t+1} = \alpha y_t + (1 - \alpha) F_t \qquad (1)$$

The size of $\alpha$ has a great influence on the forecast. The value of $\alpha$ that gives the minimum mean square error (MSE) is usually used.
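The choice of the smoothing constant by minimum MSE can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ses_forecasts`, `mse` and `best_alpha` are hypothetical, the first forecast is initialised to the first observation, the grid of candidate constants is an arbitrary choice, and the series `toy` is made up rather than taken from the tourism data.

```python
def ses_forecasts(series, alpha):
    """One-step-ahead forecasts; forecasts[t] is the forecast of series[t]."""
    forecasts = [series[0]]  # assumption: first forecast = first observation
    for t in range(1, len(series)):
        # weight alpha on the latest observation, 1 - alpha on the latest forecast
        forecasts.append(alpha * series[t - 1] + (1 - alpha) * forecasts[-1])
    return forecasts


def mse(actual, predicted):
    """Mean square error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def best_alpha(series, grid):
    """Return the smoothing constant in `grid` with the smallest MSE."""
    # The first point is skipped: its forecast is the initialisation itself.
    return min(grid, key=lambda a: mse(series[1:], ses_forecasts(series, a)[1:]))


toy = [10.0, 12.0, 11.0, 13.0, 14.0, 13.5, 15.0]  # made-up illustrative values
grid = [k / 10 for k in range(1, 10)]              # 0.1, 0.2, ..., 0.9
alpha = best_alpha(toy, grid)
```

A finer grid or a numerical optimiser could replace the coarse grid; the grid is used here only because it keeps the selection rule easy to follow.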

2.1.2. Double Exponential Smoothing (Holt’s)

The form of the model is

$$L_t = \alpha y_t + (1 - \alpha)(L_{t-1} + b_{t-1})$$

$$b_t = \beta (L_t - L_{t-1}) + (1 - \beta) b_{t-1} \qquad (2)$$

$$F_{t+m} = L_t + m b_t$$

where $L_t$ is the level of the series at time $t$, $b_t$ is the slope (trend) of the series at time $t$, $F_{t+m}$ is the forecast $m$ periods ahead, and $\alpha$ and $\beta$ are the smoothing coefficients for the level and the trend respectively. In order to fit the model, it is necessary to calculate initial values of the level and the trend. [4] suggests several alternative ways of obtaining these initial values; the values used in this paper are reported in Section 3.1.

The pair of $\alpha$ and $\beta$ that gives the minimum Mean Square Error is preferred.
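As an illustration of how such a pair of coefficients might be searched for, the following Python sketch implements Holt’s level and trend recursions and a grid search over both coefficients. The function names are hypothetical, the grid from 0.1 to 0.9 mirrors the one described in Section 3.1, and the initial values in the usage lines (first observation for the level, first difference for the trend) are one common choice assumed here, not necessarily the one used in this paper. The series is made up.

```python
def holt_fit(series, alpha, beta, level0, trend0):
    """Run Holt's recursions; return final level, final trend, fitted values."""
    level, trend = level0, trend0
    fitted = []
    for y in series[1:]:
        fitted.append(level + trend)  # one-step forecast made before seeing y
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level, trend, fitted


def holt_forecast(level, trend, horizon):
    """Forecasts 1..horizon periods ahead from the last level and trend."""
    return [level + m * trend for m in range(1, horizon + 1)]


def grid_search(series, level0, trend0):
    """Return (mse, alpha, beta) with the smallest one-step MSE on a 0.1 grid."""
    grid = [k / 10 for k in range(1, 10)]
    best = None
    for a in grid:
        for b in grid:
            _, _, fitted = holt_fit(series, a, b, level0, trend0)
            err = sum((y - f) ** 2 for y, f in zip(series[1:], fitted)) / len(fitted)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best


toy = [100.0, 104.0, 109.0, 113.0, 120.0, 124.0]  # made-up illustrative values
level0, trend0 = toy[0], toy[1] - toy[0]          # assumed initialisation
err, a, b = grid_search(toy, level0, trend0)
level, trend, _ = holt_fit(toy, a, b, level0, trend0)
next_four = holt_forecast(level, trend, 4)
```

On a perfectly linear series the recursions reproduce the line exactly for any pair of coefficients, which is a quick sanity check on an implementation.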

2.1.3. Triple Exponential Smoothing (Winters’)

When time series data exhibit seasonality, Triple Exponential Smoothing is the recommended method. It incorporates three smoothing equations: one for the level, one for the trend and one for the seasonality.

2.2. Auto-Regressive Integrated Moving Average (ARIMA Model)

2.2.1. Model Identification

According to Box and Jenkins [5], two graphical devices are used to assess the correlation between the observations within a single time series: the estimated autocorrelation function (ACF) and the estimated partial autocorrelation function (PACF). These two functions measure the statistical relationships within the time series data, and summarizing this correlation is a further step in identification. Box and Jenkins suggest a whole family of ARIMA models from which we may choose.

In choosing an appropriate model we use the estimated ACF and PACF. The basic idea is that every ARIMA model has a unique theoretical ACF and PACF associated with it. Thus we select the model whose theoretical ACF and PACF resemble the estimated ACF and PACF of the time series data [6].
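For readers who want to see how the estimated ACF and PACF are computed, a minimal Python sketch follows. The function names are hypothetical. The ACF uses the usual sample autocovariance divided by the lag-zero value, and the PACF is obtained from the ACF by the Durbin-Levinson recursion, which is one standard way of computing it and is assumed here rather than taken from the paper.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(acf):
    """Partial autocorrelations at lags 1..K from acf[0..K] (acf[0] == 1)."""
    pacf = []
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, len(acf)):
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # update the lower-order coefficients, then append the new one
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf
```

Applied to the theoretical ACF of a first-order autoregressive process, the recursion returns a PACF that is non-zero at lag 1 and zero afterwards, which is the cut-off pattern used to recognise an AR(1) model during identification.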

2.2.2. Estimation

Estimates of the model coefficients are obtained by a modified least squares method or by maximum likelihood estimation, whichever suits the time series data.

2.2.3. Diagnostic Checking

Diagnostic checks help to determine whether the tentatively selected model is adequate. At this stage, the residuals from the fitted model are examined; if the model fails the diagnostic tests, it is rejected and the cycle is repeated until an appropriate model is found.
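The text does not say which diagnostic tests were applied. One widely used residual check, given here purely as an assumed example, is the Ljung-Box statistic, which pools the squared residual autocorrelations up to a chosen lag. The function name `ljung_box_q` is hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic on the residual autocorrelations at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total
```

For an adequate ARIMA model with $p$ autoregressive and $q$ moving average terms, Q is compared with a chi-square distribution with `max_lag` minus $p$ minus $q$ degrees of freedom; a large Q suggests that correlation remains in the residuals and that another model should be tried.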

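The ARIMA model is obtained by taking $w_t$ as the first-differenced time series, i.e. $w_t = y_t - y_{t-1}$, and writing

$$w_t = \phi_1 w_{t-1} + \cdots + \phi_p w_{t-p} + e_t - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q} \qquad (3)$$

where $\phi_1, \ldots, \phi_p$ are the autoregressive coefficients, $\theta_1, \ldots, \theta_q$ are the moving average coefficients and $e_t$ is the error term at time $t$. Equation (3) is referred to as the ARIMA ($p$, 1, $q$) model.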

Different combinations of AR and MA orders yield different ARIMA models [7]. The optimal model is chosen on the basis of the minimum value of the Akaike Information Criterion (AIC), given by

$$\text{AIC} = -2 \log L + 2m \qquad (4)$$

where $m = p + q$ and $L$ is the likelihood function. The Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) are used to evaluate the performance of the various models and are given below.

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}$$

$$\text{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|$$

where $y_t$ is the tourists’ arrival in different years, $\hat{y}_t$ is the forecasted tourists’ arrivals in the corresponding years and $n$ is the number of years used as the forecasting period.
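The three criteria are simple to compute once forecasts and a log-likelihood are available. The Python sketch below uses hypothetical function names; MAPE is expressed as a percentage, which is an assumption made to match the magnitude of the values reported in Section 3, and the AIC takes the log-likelihood and the number of parameters as inputs rather than computing them.

```python
import math


def rmse(actual, forecast):
    """Root mean square error over the forecasting period."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


def aic(log_likelihood, num_params):
    """Akaike Information Criterion: -2 log L + 2 m."""
    return -2.0 * log_likelihood + 2.0 * num_params
```

For example, with made-up actual values 100 and 200 and forecasts 110 and 190, the absolute errors are both 10, so the RMSE is 10 and the MAPE is the average of 10% and 5%, that is 7.5%.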

3. Results and Discussions

3.1. Exponential Smoothing Model

Table 1 shows the yearly tourists’ arrival in Kenya (in thousands) for the period 1995-2012. The time plot (Figure 1) revealed an increasing trend from 2002 to 2007, a sharp drop in the number of tourists in 2008, and an increasing trend again from 2009 to 2012. For smoothing the data, Holt’s Double Exponential Smoothing was used. Various combinations of $\alpha$ and $\beta$, both ranging from 0.1 to 0.9 in increments of 0.1, were tried, and the pair for which the Mean Squared Error of the forecasts (54.186) and the Mean Absolute Percentage Error (3.028) were least was selected. The fitted model is therefore Holt’s model of Section 2.1.2 with this pair of smoothing coefficients, where the initial values for the level and trend are 973.6 and 17.66 respectively. Table 2 shows the forecast of tourists’ arrivals using the chosen double exponential smoothing model.

Table 1. Data on tourists’ arrival in Kenya for the period 1995 to 2012.

Source: Ministry of East African Affairs, Commerce and Tourism: Department of Tourism (Kenya).

3.2. ARIMA Model

Figure 1 showed that the series was non-stationary, since a trend component was present. The data were made stationary by taking the first-order difference. The time plot of the differenced data is shown in Figure 2.
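First-order differencing replaces each observation by its change from the previous one, and forecasts of the differences are turned back into forecasts of the level by cumulative summation. A minimal Python sketch with hypothetical function names and made-up numbers:

```python
def difference(series):
    """First-order difference: value at time t minus value at time t - 1."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def undifference(first_value, diffs):
    """Rebuild the level series from its first value and its differences."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


trend_series = [5.0, 8.0, 11.0, 14.0]  # made-up series with a linear trend
diffs = difference(trend_series)        # constant differences: the trend is removed
```

A series with a linear trend has constant first differences, which is why a single difference is often enough to remove a trend of this kind.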

Using R, various ARIMA ($p$, 1, $q$) models were fitted for different values of $p$ and $q$, and the best model was chosen on the basis of the minimum value of the selection criterion, the Akaike Information Criterion (AIC), whose formula is given in Equation (4). In this way, ARIMA (1, 1, 1) was found to be the best model. The fitted model has the form

$$w_t = \phi_1 w_{t-1} + e_t - \theta_1 e_{t-1}$$

where $w_t = y_t - y_{t-1}$ is the first difference of the arrivals series $y_t$ and $e_t$ is the error term.

The model parameters were estimated by maximum likelihood. The fitted model was then used to forecast tourists’ arrival from 2009 to 2012. The forecast values are shown in Table 3.
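To show how an ARIMA (1, 1, 1) model turns estimated coefficients into forecasts, the Python sketch below applies the forecasting recursion to the differenced series and then sums the predicted differences back to the original scale. Everything specific in it is an assumption made for illustration: the function name, the sign convention (moving average term subtracted), the absence of a constant term, the pre-sample error set to zero, and the coefficient and data values in the usage lines, which are made up and are not the estimates obtained in this paper.

```python
def arima111_forecast(series, phi, theta, horizon):
    """Forecast `horizon` steps ahead, assuming w_t = phi*w_{t-1} + e_t - theta*e_{t-1}."""
    w = [series[t] - series[t - 1] for t in range(1, len(series))]
    e_prev = 0.0  # assumption: the error before the sample starts is zero
    for t in range(1, len(w)):
        predicted = phi * w[t - 1] - theta * e_prev
        e_prev = w[t] - predicted  # one-step residual at time t
    forecasts = []
    last_level, last_w = series[-1], w[-1]
    for h in range(1, horizon + 1):
        # the last residual enters only the first step; future errors have mean zero
        w_hat = phi * last_w - (theta * e_prev if h == 1 else 0.0)
        last_level += w_hat
        forecasts.append(last_level)
        last_w = w_hat
    return forecasts


toy = [1.0, 2.0, 4.0]  # made-up values
example = arima111_forecast(toy, phi=0.5, theta=0.5, horizon=2)
```

With both coefficients set to zero the model reduces to a random walk, and every forecast equals the last observed value.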

Table 2. Forecast of tourists’ arrival in Kenya using double exponential smoothing.

Figure 1. Time plot of tourists’ arrival in Kenya between 1995 and 2009.

Figure 2. Plot of differenced tourists’ arrival data.

Table 3. Forecast of tourists’ arrival in Kenya using the ARIMA (1, 1, 1) model.

Table 4. Forecast of tourists’ arrival in Kenya using double exponential smoothing and ARIMA (1, 1, 1) models.

3.3. Comparison and Conclusion of the Performance of the Two Models

The performance evaluation measures MAPE and RMSE were computed for the forecasted tourists’ arrivals for the years 2009 to 2012.

The comparison of the two models based on MAPE and RMSE is given in Table 4. Based on these results, the Double Exponential Smoothing model was the better of the two for forecasting tourists’ arrival in Kenya, as both its MAPE and RMSE values were lower than those of ARIMA (1, 1, 1).

Conflicts of Interest

The authors declare no conflicts of interest.


References

[1] Satya, P., Ramasubramanian, V. and Menta, S.C. (2007) Statistical Models for Forecasting Milk Production in India. Journal of the Indian Society of Agricultural Statistics, 61, 80-83.
[2] Padhan, P.C. (2011) Forecasting International Tourists Footfalls in India: An Assortment of Competing Models. International Journal of Business and Management, 6, 190-202.
[3] Gardner, E.S. (1985) Exponential Smoothing: The State of the Art. Journal of Forecasting, 4, 1-28.
[4] Jani, P.N. (2014) Business Statistics: Theories and Applications. PHI Learning Private Limited, Delhi.
[5] Box, G.E.P. and Jenkins, G.M. (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
[6] Pankratz, A. (1983) Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. John Wiley and Sons, New York.
[7] Makridakis, S., Wheelwright, S.C. and Hyndman, R.J. (1998) Forecasting: Methods and Applications. John Wiley & Sons, New York.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.