Forecasting Hotel Prices in Selected Middle East and North Africa Region (MENA) Cities with New Forecasting Tools

The purpose of this paper is to understand the potential of traditional and non-traditional statistical techniques to predict dynamic hotel room prices. Four forecast models were employed: the simple moving average, the autoregressive integrated moving average (ARIMA), the radial basis function (RBF), and the support vector machine (SVM). This research is based on an empirical study of data obtained from the company Smith Travel Research (STR). The economic predictors were obtained from other reliable sources such as the World Bank and the World Tourism Organization. This study agreed with existing literature on the ability of machine learning to predict hotel room prices precisely. Given the complexity of the hotel industry, the effect of external economic predictors was tested in the model. The challenge lay in dealing with the mixed frequencies observed in the collected data. This research is designed to add an innovative approach to the existing literature on machine learning in the hotel industry in the Middle East and North Africa (MENA) region. Some of the machine learning techniques used in this study constitute a contribution to the research conducted in this region. This creates a bridge between many academic disciplines such as computer science, economics, and marketing. Small hotel operators should benefit from this research when setting strategies as well as in using the model to set their relative room prices.

Tourism accounts for 10% of world GDP. It contributes to world societies by promoting cultures, as well as by adding more jobs to the economy. In 2016, 1.2 billion tourists were recorded worldwide and this number is expected to grow to 1.8 billion (50% growth) by 2030 [1]. This research paper focuses on one aspect of the tourism and hospitality sector, which is accommodation and, more specifically, hotels. Practitioners as well as researchers have been motivated to study the dynamic prices of hotel rooms and understand the determinants of these changes and, accordingly, be able to forecast prices more effectively. Akm [2], Drew et al. [3], Hassani et al. [4], Padhi and Aggarwal [5], Yang et al. [6], Youn and Gu [7], Jovanovic et al. [8], Uysal [9], and Magnini et al. [10] have all forecasted hotel or tourism demand using neural networks jointly with support vector analysis, the autoregressive integrated moving average (ARIMA), logistic regression, fuzzy goal programming, and decision trees.
This study introduces a novel approach to hospitality forecasting in this region, specifically to forecasting hotel rooms' average daily rates (ADR) in eight major cities in the Middle East and North Africa (MENA) region. The methods include linear models (simple moving average and ARIMA) and nonlinear models (radial basis function (RBF) and support vector machine (SVM)). Research on hotels and advanced machine learning forecasting techniques is very limited in this region. This study adds to the literature in this field.
The rest of the paper is organized as follows. The second section reviews the literature on tourism and hospitality studies and the different models used for forecasting. The third section covers the problem definition and research objective of this study. The fourth section presents the conceptual framework, which gives a visual overview of the study objective and hypotheses. The fifth is the methodology section, which covers data collection, a detailed data analysis, a list of the key variables, and the models used in the research. The sixth section covers the statistical performance evaluation and the measures employed to select the best model. Finally, a conclusion section summarizes the study outcomes and provides recommendations for future studies.

Literature Review
In the tourism and hospitality industry, it is very important to understand the variables affecting demand and eventually the performance of the industry (or a particular hotel or restaurant). In the past, managers and decision-makers relied on simple forms of data analysis (simple linear regression or multiple regression) to investigate the influence of those variables. More recently, big data analytics has unlocked the potential for studying complex business situations to understand the correlation between variables or causes and eventually forecast performance. Machine learning techniques have made it possible to analyze tens or hundreds (even thousands) of data variables that are either stored or live-streamed to help shape a time-bound decision-making process. [11], Pattie and Snyder [12], Govers et al. [13], and Law [14] have all used neural networks (NN) or artificial neural networks (ANN) to forecast tourism or hotel demand using historical tourist arrival data or room occupancy data. Some researchers have combined neural networks with other machine learning (ML) techniques, either to pick the best model or to improve the performance of the model.
Others have explored different machine learning techniques as well. Vu et al. [15], Tkaczynski et al. [16], Toral et al. [17], Dolnicar and Leisch [18], Brochado et al. [19], and Geetha et al. [20] have used clustering when dealing with market or customer segmentation and consumer behavior. Hadavandi et al. [21], Yu and Schwartz [22] and Sohrabi et al. [23] have used fuzzy systems to predict hotel demand using arrival data. Fuzzy systems are also widely used in planning and decision-making in retail and banking. Li and Sun [24] used support vectors to predict firm failure using financial and non-financial data, while Chen and Wang [25] used the support vector technique to forecast demand. Pantano et al. [26] used tourist attraction characteristics and the random forest method to predict tourist response, while Shapoval et al. [27] used inbound visitor numbers and the decision tree technique to develop effective destination marketing.

Problem Definition and Research Objective
Prior studies have focused mostly on tourism demand at the country, or macro, level. Tourist arrivals have been used extensively as a forecasting component, and most of the studies are country-specific. Macroeconomic factors take longer to respond to changes, which is known as the lag effect. On the other hand, firm-specific microeconomic factors, which represent the fundamental factor model, respond to changes much faster and are under the control of hotel managers and owners. They often lead to significant characterization of the dependent variable under investigation [45]. The first objective of this study is to combine macro and micro elements in studying dynamic hotel prices. Several cities from the MENA region, which are assumed to be subject to the same economic effects, are included in the study. In addition, this study explores the benefits of using machine learning techniques in forecasting.

Research Framework and Proposed Hypotheses
Hotel practitioners have used inventory planning and pricing as major inputs in their revenue management systems, which has led to successful hotel performance [39]. Lee [46] found, while measuring hotel room rates, that prices were affected by both internal (hotel-specific) and external (economic) factors.
DOI: 10.4236/tel.2018.89104
The literature indicates that hotel attributes have a direct effect on hotel performance. This leads to the first hypothesis: H1: Big data on hotels leads to better price prediction.
Moreover, various economic factors were investigated in hotel performance studies. This has led to the inclusion of economic factors as a moderating effect to be tested in this study: H2: Economic factors moderate the relation between hotels' attributes and hotel room pricing.
The following diagram (Figure 1) provides a visual representation of the study direction.

Data Collection and Analysis
The daily hotel data used in this research paper came from STR and covered  (Table 1).
The data are split to test the models' performance in predicting unseen observations (the test data are held out and play no part in fitting) after all network/model parameters have been determined using the training data.
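A minimal sketch of such a split for time-ordered data is given below. The 80/20 ratio, the function name `chronological_split`, and the ADR values are assumptions for illustration only; the split actually used in the study is the one reported in Table 1.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into training and test parts without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    # Earlier observations are used for fitting, later ones for evaluation.
    return series[:cut], series[cut:]


# Hypothetical daily ADR values, oldest first (not data from the study).
adr = [200.0, 205.0, 198.0, 210.0, 215.0, 220.0, 218.0, 225.0, 230.0, 228.0]
train, test = chronological_split(adr, train_fraction=0.8)
```

Keeping the observations in time order means the test set always lies after the training set, which mirrors how a forecast would be used in practice.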
Economic variables were obtained from other credible sources such as the World Bank, the World Tourism Organization, the World Economic Forum and the US Energy Information Administration. The only challenge was that most of these data had different frequencies (monthly, yearly, and once every two years).
To deal with different frequencies, international tourist arrival data were converted to daily rates by dividing the annual rate by 365, while country-level annual GDP growth percentage rate, inflation rate (average consumer prices), oil price (WTI and Brent), and index data on each country's business environment, safety and security, health and hygiene, human resources, and labor market were all converted to daily rates by maintaining the same rate throughout the year.
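The two conversions described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the sample figures are hypothetical, and the fixed divisor of 365 follows the text rather than the calendar length of each year.

```python
from datetime import date, timedelta


def days_in_year(year):
    """Number of calendar days in the given year."""
    return (date(year + 1, 1, 1) - date(year, 1, 1)).days


def annual_total_to_daily(year, annual_total, divisor=365):
    """Spread an annual total (e.g. tourist arrivals) evenly: total / 365 per day."""
    per_day = annual_total / divisor
    start = date(year, 1, 1)
    return {start + timedelta(days=i): per_day for i in range(days_in_year(year))}


def annual_rate_to_daily(year, annual_rate):
    """Carry an annual rate or index (e.g. GDP growth) unchanged onto every day."""
    start = date(year, 1, 1)
    return {start + timedelta(days=i): annual_rate for i in range(days_in_year(year))}


# Hypothetical inputs, not values from the study.
arrivals = annual_total_to_daily(2015, 3_650_000)
gdp_growth = annual_rate_to_daily(2015, 3.2)
```

Both helpers return one value per calendar day, so the converted series can be aligned with the daily hotel data by date.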
The aim was to use these variables and measures to gain insight into the determinants of hotel room prices that would help us, and eventually decision-makers, to predict these prices with more accuracy. Innovative ways of handling mixed data frequencies could be an opportunity for future research.
This study would also be a good piece of research to validate the hotel performance determinants (HPD) model suggested by Assaf et al. [47]. The table below (Table 2) represents a list of the variables found in the literature that we utilized in our study based on data availability/accessibility.

Models in the Research
This research predicts dynamic hotel room prices by selecting the best forecasting model. The candidate models are linear (simple moving average and ARIMA) and non-linear (RBF and SVM) in form. The goal is to compare these models using the model performance measures to determine the best model or a combination of them.

Time Series Forecast
Using ADR values from previous years, a time series forecast helps predict the future ADR of hotels based on historical data. The main goal is to find a model with a better fit for the data, hence reducing the noise or error. The models that we used are the simple moving average and the Box-Jenkins ARIMA model. These models are widely used in tourism and hospitality research.

Simple Moving Average
The simple moving average method uses the average of previous n-periods as a forecast value [59].
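As a minimal sketch of this rule, the function below forecasts the next value as the mean of the last n observations. The function name, the window length, and the ADR values are assumptions chosen for illustration.

```python
def simple_moving_average_forecast(history, n):
    """Forecast the next value as the mean of the last n observations."""
    if n <= 0 or n > len(history):
        raise ValueError("n must be between 1 and len(history)")
    window = history[-n:]
    return sum(window) / n


# Hypothetical ADR history, oldest first.
adr_history = [200.0, 210.0, 220.0, 230.0]
forecast = simple_moving_average_forecast(adr_history, n=3)  # (210 + 220 + 230) / 3
```

A larger n smooths out more noise but makes the forecast slower to react to recent price changes.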

RBF
RBF is one of the most widely used neural network techniques. Used for classification as well as regression, the RBF model is a feed-forward neural network that is based on three layers: input, hidden and output [61]. RBF models gained interest due to their advantage in achieving faster convergence with fewer errors while also being reliable (Moradkhani et al., 2004) [62].
where C represents the center and σ represents the width of the neuron, or the radius (Wei, 2012) [61].
SVM can also work in higher dimensions if a kernel function is applied, which allows SVM to solve non-linear problems [25].
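The sketch below illustrates one hidden neuron and the output layer of an RBF network. It assumes the commonly used Gaussian form exp(-||x - C||^2 / (2 * width^2)), which may differ from the exact expression used in the paper; the function and parameter names are hypothetical.

```python
import math


def gaussian_rbf(x, center, width):
    """Gaussian activation exp(-||x - center||^2 / (2 * width^2)) of one hidden neuron."""
    if width <= 0:
        raise ValueError("width must be positive")
    squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def rbf_network_output(x, centers, widths, weights, bias=0.0):
    """Output layer: weighted sum of the hidden-layer activations plus a bias."""
    activations = [gaussian_rbf(x, c, w) for c, w in zip(centers, widths)]
    return bias + sum(wt * a for wt, a in zip(weights, activations))
```

The activation equals 1 when the input sits exactly on the center and decays toward 0 as the input moves away, at a rate set by the width. A function of the same Gaussian shape is also a common choice of kernel for SVM.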

Data Analysis
As a first step, we ran a descriptive statistics analysis of the data, which highlights the mean, maximum, minimum and standard deviation of the variables used in the study. The data represent daily observations obtained from STR, which provided insight into the internal, or microeconomic, factors used by the industry to study the hospitality sector. Other macroeconomic factors, which also appear in several studies in the tourism and hospitality field, were obtained from different sources such as the World Bank, the World Tourism Organization, the World Economic Forum and the US Energy Information Administration. However, those factors were country-based, since capturing them at city level and on a daily basis was impossible and outside the scope of this study. The table below (Table 3) provides a summary of the descriptive statistics for significant variables generated from the Dubai Luxury upper data, produced using IBM SPSS. With skewness close to zero and less than 1 for most variables, this indicates that, although we are dealing with a very dynamic environment, the data are normally distributed with means around zero.
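The descriptive measures mentioned above can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and the skewness here is the simple moment coefficient m3 / m2^1.5, whereas statistical packages often report a bias-adjusted variant, so values may differ slightly from those in Table 3.

```python
import statistics


def describe(values):
    """Return mean, min, max, sample standard deviation and moment skewness."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = statistics.mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "std": statistics.stdev(values),
        "skewness": skewness,
    }


# Hypothetical, perfectly symmetric sample (not data from the study).
summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])
```

A skewness near zero indicates a roughly symmetric distribution, which is the property the paragraph above relies on.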
Following the steps documented in the literature for each proposed model, the data were then used in each model to produce forecasts and to carry out further analysis. The aim is to compare the models and choose the best one for ADR prediction (or perhaps a combination of models) based on the model performance measures. The data were also examined for any seasonal trends throughout the year (refer to Figure 2 for ADR). The aim was to regulate this seasonality, that is, to make the data stationary, so that they could be explained using the autoregressive model, ARIMA. Many of these cities showed acceptable stationarity, while some (e.g., Jeddah) showed increasing trends over a number of years, which necessitated treating the data to make the mean constant. As a result, and to maintain uniformity, first-order differencing was applied to all cities' data to make them stationary, following the 1970 Box-Jenkins method (see Figure 3) [64].
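First-order differencing replaces each observation with its change from the previous day. A minimal sketch follows; the function names and the sample values are assumptions for illustration.

```python
def first_difference(series):
    """Return the day-to-day changes y[t] - y[t-1] of a series."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def undo_first_difference(first_value, diffs):
    """Rebuild the original levels from the first value and the differences."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


# Hypothetical ADR series with an upward drift.
adr = [100.0, 103.0, 101.0, 106.0]
diffs = first_difference(adr)
restored = undo_first_difference(adr[0], diffs)
```

The differenced series is one observation shorter than the original, and forecasts made on it have to be cumulated back to price levels, which is what the second helper does.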
After dealing with stationarity, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were employed. Different combinations of ARIMA models were tested for each city and, based on these criteria, the best model was selected for data analysis.
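The selection step can be sketched with the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The candidate orders, log-likelihoods, and sample size below are hypothetical placeholders, not results from the study.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fitted candidates: (p, d, q) -> (log-likelihood, parameter count).
candidates = {
    (1, 1, 0): (-520.0, 2),
    (1, 1, 1): (-515.0, 3),
    (2, 1, 2): (-514.5, 5),
}
n_obs = 365  # assumed number of daily observations

best_by_aic = min(candidates, key=lambda order: aic(*candidates[order]))
best_by_bic = min(candidates, key=lambda order: bic(*candidates[order], n_obs))
```

BIC penalizes extra parameters more heavily than AIC once n exceeds about 7, so the two criteria can disagree; when they do, BIC tends to favor the smaller model.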

Statistical Performance
The following table (Table 4) represents the result of the models' performance measures for each city as a measure of forecasting accuracy.
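The specific measures are those listed in Table 4 and are not restated in this section. As an illustration only, the sketch below computes three measures that are commonly used for forecast accuracy (MAE, RMSE, and MAPE); treating these as the study's measures is an assumption, and the sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and forecast ADR values.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

Lower values indicate better accuracy for all three measures; MAPE is scale-free, which makes it convenient when comparing cities with very different price levels.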

Conventional Techniques
By employing the ARIMA and simple moving average techniques to forecast future room rates, the study found that the simple moving average performed poorly.

Innovative Techniques
One of the contributions of this paper is the use of innovative machine learning tools to forecast room prices. Both RBF and SVM were utilized for prediction, which resulted in significant improvements in performance. The inclusion of external economic factors could also be one reason why these models outperformed the conventional models.
When the forecasting accuracy of the different models was compared across the eight cities, SVM and RBF were found to perform better than ARIMA or the simple moving average. The results show the superiority of machine learning techniques in prediction compared to conventional forecasting models.

Conclusions
The use of innovative tools in hotel performance forecasting would help researchers as well as practitioners in planning effectively. Hotel internal attributes positively affect hotel performance and more specifically prices. External economic factors moderate the relationship between the hotel attributes and hotel performance.
The main objective of the study was to predict hotel room prices using new tools. The study shows that SVM is the leading model in "luxury and upscale" hotel room price forecasting, followed by RBF and then ARIMA, while the simple moving average is found in this study to be the inferior model.