A Hybrid Methodology for Short Term Temperature Forecasting

Developing a reliable weather forecasting model is a complicated task, as it requires heavy IT resources and investments beyond the financial capabilities of most countries. In Lebanon, the prediction model used by the civil aviation weather service at Beirut Rafic Hariri International Airport (BRHIA) is the ARPEGE model (0.5), developed by the weather service in France. Unfortunately, the forecasts provided by ARPEGE have been erroneous and biased by several factors, such as the chaotic character of the physical modeling equations of some atmospheric phenomena (advection, convection, etc.) and the nature of the Lebanese topography. In this paper, we propose the time series method ARIMA (Auto Regressive Integrated Moving Average) to forecast the minimum daily temperature and compare its results with those of ARPEGE. For the prediction of the first five days of January 2017, ARIMA shows a better mean accuracy (91%) than the numerical model ARPEGE (68%). To validate the accuracy of the proposed model on a second period, five months earlier, the same simulation was applied to the first five days of August 2016, where ARIMA again offered a better mean accuracy (98%) than ARPEGE (89%). This paper also discusses a multiprocessing approach applied to ARIMA in order to improve its efficiency in terms of complexity and resources.


Introduction
The World Meteorological Organization (WMO) has called for integrating the global efforts needed to enhance the accuracy of weather forecasts [1]. Indeed, the development of efficient weather prediction models has always been particularly difficult and challenging for meteorological services [2] [3]. Temperature prediction is important in many sectors such as agriculture, hydrology and meteorology. In the case of Lebanon, temperature is considered an important forecasting parameter because it carries substantial risks: it can lead to the formation of hazardous fog, which is a potentially fatal risk for people travelling within the country. In addition, temperature affects the tourism sector as well as public and private institutions (hospitals, audiovisual media, hotels and resorts, ski clubs, etc.). The prediction of temperature can be achieved using many approaches and techniques such as numerical models, Learning Techniques (LT) and Time Series Analysis (TSA) [2] [4]. The proposed ARIMA model fits a Moving Average (MA) model of order 35, which is considered a very high order. This order implies high complexity, heavy computing resources and a long execution time; as a solution to this problem, we propose migrating to a parallel ARIMA.
In 2007, Mohsen Hayati et al. worked on the application of neural networks to the design of short-term temperature forecasting (STTF) [5]. The Multi-Layer Perceptron (MLP) network model used to forecast the temperature one day ahead in Kermanshah, Iran, shows good performance and reasonable prediction accuracy compared to the real temperature values.
In 2010, S. Santhosh Baboo et al. also worked on a neural network-based algorithm for temperature prediction. The results were compared with real data issued by the meteorological department and confirm that the model has good potential for predicting temperature in a forecasting service [5].
Then, in 2012, Kumar Abhisheka et al. used an Artificial Neural Network to forecast the temperature. The results show that by increasing the number of neurons from 20 to 50 and 80, and the number of hidden layers from 5 to 10, the Mean Square Error decreased from 3.65 to 2.71 and the performance of the proposed model increased [6].
Moreover, in 2018, Thi-Thu-Hong et al. [7] presented a comparative study of univariate forecasting methods for meteorological time series. They considered humidity an important meteorological factor to study because of its direct effect on temperature, and compared Single Exponential Smoothing (SES), Seasonal naive (SNAIVE), SARIMA, Feed-Forward Neural Network (FFNN), Dynamic Time Warping-Based Imputation (DTWBI) and Bayesian Structural Time Series (BSTS). They concluded that, for predicting a humidity whose real value is 93%, the result of seasonal ARIMA (SARIMA) (92%) is closer than those of the other studied techniques: FFNN (91%), DTWBI (91%), BSTS (88%), SES (87%) and SNAIVE (86%).
This paper is organized as follows: the first section presents the numerical prediction model ARPEGE; the second section discusses Auto Regressive Integrated Moving Average (ARIMA); the third section presents the comparative study between ARPEGE and two different approaches of ARIMA (sequential and multiprocessing), together with the justification for using these approaches; the fourth section discusses the results obtained in this comparative study; finally, the conclusion summarizes our work and presents the perspectives of our study. This paper presents a short-term prediction (5 days ahead) of the temperature in two different periods (summer and winter): 1) from 01/08/2016 to 05/08/2016; 2) from 01/01/2017 to 05/01/2017.

ARPEGE
ARPEGE is a numerical model developed by Meteo France and widely used in several countries. Like other numerical prediction models, ARPEGE is based on a set of Navier-Stokes equations that describe the movement of fluids. The ARPEGE model covers the whole globe with an average mesh of 16 km and Europe with a mesh of 7.5 km. Moreover, its horizontal resolution is about 7.5 km over France and 37 km at the antipodes. ARPEGE has 105 levels, where the first level is 10 meters above the surface and the levels can reach 11 km of altitude [8] [9] [10].
The ARPEGE model is capable of guiding weather forecasters beyond the first few hours for many upcoming meteorological phenomena such as thunderstorms, snow, cold fronts, warm fronts, etc. [9] [11] [12] [13] [14] [15]. However, ARPEGE's fundamental problems are its inability to solve the chaotic equations that model some atmospheric phenomena (tornadoes, turbulence, etc.) and its weak forecasting accuracy over a geographically defined area [9] [13] [16].

ARIMA
Recently, Time Series Analysis (TSA) has become very useful in the prediction field, as it is applied in several domains such as weather forecasting, economics, engineering, environment, medicine, etc. [8]. In weather forecasting, TSA helps predict future weather situations from historical data collected at regular time intervals [14] [15]. Known as a series, this data is collected at specific intervals (hourly, daily, weekly, monthly, yearly, etc.). TSA offers certain advantages in terms of trusted results in weather forecasting and efficiency in detecting both trends and seasonality within the series [8]. The ARIMA model follows a linear expression given by the following equation:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}$$
where: $y_t$ is the observed value at time t; $\varepsilon_t$ is the random error at time t; $\phi_j$ (j = 1, ..., p) are the coefficients of the AR (Auto Regressive) model; $\theta_j$ (j = 1, ..., q) are the coefficients of the MA (Moving Average) model; p is the autoregressive order; q is the moving average order [1]. On the other hand, when building an ARIMA model, the series must be stationary; otherwise, a transformation to stationarity is required [17].
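To make the linear expression concrete, the following minimal sketch evaluates one value of an ARMA recursion from given coefficients and past values. It assumes the sign convention in which the MA terms are subtracted; the function name `arma_value` and the numbers in the example are illustrative and are not taken from the paper.

```python
def arma_value(phi, theta, past_y, past_eps, eps_t=0.0):
    """Evaluate y_t = sum_j phi_j*y_{t-j} + eps_t - sum_j theta_j*eps_{t-j}.

    past_y[0] is y_{t-1}, past_y[1] is y_{t-2}, and so on; past_eps is
    ordered the same way. The minus sign on the MA part is an assumed
    convention (some libraries use a plus sign instead).
    """
    if len(past_y) < len(phi) or len(past_eps) < len(theta):
        raise ValueError("not enough history for the requested orders")
    ar_part = sum(c * y for c, y in zip(phi, past_y))
    ma_part = sum(c * e for c, e in zip(theta, past_eps))
    return ar_part + eps_t - ma_part


# Pure MA(2) example (p = 0) with made-up numbers, for illustration only.
y_t = arma_value(phi=[], theta=[0.5, -0.2], past_y=[],
                 past_eps=[1.0, 2.0], eps_t=0.3)
```

With p = 0 the AR sum is empty, which is the situation of the pure moving-average model discussed later in the paper.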

Comparative Study between ARPEGE, ARIMA Sequential and ARIMA Parallel
In the proposed comparative methodology (Figure 1), ARPEGE is compared to two different approaches of ARIMA (ARIMA SEQ. and ARIMA PARA.) for predicting the temperature five days ahead. The results of these three methods are compared in order to choose the best model in terms of accuracy and resource consumption (RAM, CPU and execution time).

Numerical Weather Model ARPEGE
In Lebanon, the physical forecasting model used by the civil aviation meteorological service at BRHIA is ARPEGE.
The prognostic variables of the model for the atmospheric part are: the horizontal components of the wind, the temperature, the specific humidities of water vapor and of four categories of hydrometeors (liquid droplets, ice crystals, rain, snow), the turbulent kinetic energy, etc. In addition, the outputs of the ARPEGE model are graphs, vertical sections, time diagrams, weather maps, etc.
Post-processing turns the recovered data into usable products such as graphs and weather maps.
Unfortunately, in some cases, the forecasts provided by ARPEGE have been erroneous and biased due to several factors:
• the chaotic character of the equations;
• errors resulting from the measurements of the initial state of the atmosphere;
• errors induced by the discretization of the atmosphere (horizontal and vertical dimensions of the mesh);
• errors related to the spatio-temporal iterative process;
• the difficulty of modeling physical phenomena of the atmosphere such as convection.
These various sources of error lead us to look for an efficient and reliable alternative that reduces, as far as possible, the errors and biases of the ARPEGE outputs.
In the absence of satellite data, we propose to build a forecasting model based on a statistical approach, namely the time series analysis approach that provides the ARIMA methodology, in order to improve on the quality of existing models.
In order to validate the proposed methodology, we focus on the case of Lebanon, in collaboration with the meteorological service, which gives access to a vast database related to the Beirut meteorological station. These data, recorded over several years, are used to test the reliability of the proposed model.

ARIMA Sequential
The daily temperature data of Beirut city for a period of 11 years, from 01/01/2006 to 31/12/2016, were taken from the meteorological service at Beirut Rafic Hariri International Airport (BRHIA). The ARIMA model takes the temperature time series as input. The three mandatory parameters p, d and q that have to be selected represent the order of the ARIMA model: p is the number of autoregressive terms, d is the number of non-seasonal differences and q is the number of moving average coefficients. The ARIMA model is considered a promising approach in weather forecasting. It is a well-established statistical technique for modeling time series of temperature and other meteorological variables.
1) Data collecting
The meteorological service (Met-Service) of Beirut Rafic Hariri International Airport (BRHIA) has 20 climatological stations spread all over Lebanon. These stations are equipped with a multitude of sensors to measure several meteorological variables: temperature, humidity, precipitation, wind speed, wind direction and vapor pressure. Among these stations, 17 are automatic and autonomous weather observation stations, while 3 are manual observation stations that require the intervention of technicians to send data. The BRHIA climate department acts as the data warehouse that collects all data from the other meteorological stations via a Local Area Network (LAN); these data are archived in text format. At the end of each day, it connects to the other meteorological stations in order to fetch the daily temperature reports and save them in text files.
2) Data Base Oracle
An Oracle database collects and organizes the data from the text files into tables; this task is carried out by PL/SQL packages and libraries. At the end of each day, a scheduled job connects to the data warehouse of the climate department, opens each temperature text file, reads the temperature values and inserts them into the database; these libraries can also generate daily CSV files.
3) ARIMA methodology
The proposed ARIMA model was built by executing the sequence of processes illustrated in Figure 2.
a) Data processing
During this phase, the series is checked for null data. Any null value found in the series is replaced by the average of the series over the month and year in which it occurs. The data used in this article is the minimum daily temperature.
b) Seasonality verification
The result in Figure 3 shows that there is seasonality in the series, clearly visible with a periodicity of 360 days. The differencing (X_t − X_{t−360}) must therefore be applied to remove the seasonality. Figure 4 shows that there is no more seasonality in the series, because the periodicity is no longer visible.
d) Augmented Dickey Fuller (ADF) statistical test
To check whether the given series is stationary in trend, the ADF (Augmented Dickey Fuller) test is used [18].
Based on Table 1, we can consider that the series is stationary in trend (p-values less than 0.05).
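The data-processing and seasonality-removal steps described above can be sketched as follows. This is only an illustration under stated assumptions: missing values are represented by `None`, the function names are hypothetical, and the toy series uses a period of 2 instead of 360 so that the result is easy to follow by hand.

```python
from collections import defaultdict
from datetime import date, timedelta


def fill_with_monthly_mean(dates, values):
    """Replace None by the mean of the observed values of the same month and year."""
    buckets = defaultdict(list)
    for d, v in zip(dates, values):
        if v is not None:
            buckets[(d.year, d.month)].append(v)
    filled = []
    for d, v in zip(dates, values):
        if v is None:
            month_values = buckets[(d.year, d.month)]
            if not month_values:
                raise ValueError(f"no observed value in {d.year}-{d.month:02d}")
            v = sum(month_values) / len(month_values)
        filled.append(v)
    return filled


def seasonal_difference(series, period=360):
    """Return X_t - X_{t-period} for every t where both terms exist."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Toy data (not from the paper): five days with two missing readings.
start = date(2016, 1, 1)
dates = [start + timedelta(days=i) for i in range(5)]
temps = [10.0, None, 12.0, 11.0, None]

filled = fill_with_monthly_mean(dates, temps)
diffed = seasonal_difference(filled, period=2)
```

The differenced series is shorter than the input by `period` values, which matters when only a few years of daily data are available.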
e) Parameters estimation
The parameters p, d and q that define the order of the ARIMA model are chosen as follows (Figure 5):
• p: the lag at which the PACF curve meets the zero axis; here the PACF keeps jumping out of the Quenouille interval.
• q: based on Figure 5, the last lag at which the ACF very significantly crosses the interval is 35; that is, from q = 35 onwards the series remains within the Bartlett interval.
The MA(q) model is taken as our proposed model since it has fewer parameters to estimate than the AR(p).
f) Residuals check and test
After selecting the ARIMA orders p = 0, d = 0 and q = 35, we cannot conclude that they are the best orders until the two following criteria are satisfied:
• the ACF and PACF of the residuals of the ARIMA (p, d, q) are confined within the confidence interval;
• the residuals are white noise.
As shown in Figure 6, the ACF and PACF of the residuals of the ARIMA (0, 0, 35) are confined within the confidence interval, which means that the first criterion is satisfied. We then proceed to check whether the residuals are white noise.
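The order selection and the residual check above both rely on the sample autocorrelation. The sketch below is a simplified illustration, not the authors' code: it uses the white-noise band ±1.96/√n as a stand-in for the Bartlett interval (the full Bartlett formula widens the band with the lag), and it computes only the Ljung-Box Q statistic, which would then be compared with a chi-square quantile. All function names are hypothetical.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def last_significant_lag(acf, n, z=1.96):
    """Largest lag whose autocorrelation leaves the band +/- z/sqrt(n), or 0."""
    bound = z / math.sqrt(n)
    significant = [k for k, r in enumerate(acf, start=1) if abs(r) > bound]
    return significant[-1] if significant else 0


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k)."""
    n = len(residuals)
    acf = sample_acf(residuals, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


# Strongly autocorrelated toy series (alternating signs), for illustration.
series = [1.0, -1.0] * 50
acf = sample_acf(series, max_lag=2)
q_candidate = last_significant_lag(acf, n=len(series))
q_stat = ljung_box_q(series, max_lag=2)
```

For residuals that behave like white noise, Q stays small relative to the chi-square quantile at the chosen number of lags; a large Q, as in this toy series, signals remaining autocorrelation.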
Ljung-Box statistical test should be applied to check if the residuals are considered as white noise [10]. Based on Table 2 that contains 35 values issued

ARIMA Parallel
The proposed sequential ARIMA model has the order (0, 0, 35). The model is therefore defined by a Moving Average of high order, q = 35, which implies high complexity, heavy computing resources to execute this type of matrix computation, and a long execution time.
To face this problem, the proposed solution consists of three parts: PART I: Architecture proposed for system parallelism, Figure 7. PART II: Sub-models ARIMA coefficients calculation, Figure 7. PART III: The RAM of the computer was increased to 20 GB.

PART I: Architecture Proposed for System Parallelism
The proposed model is an ARIMA (0, 0, 35). Its high order, 35, made the computation of the associated model coefficients complex, and the flow diagram was suggested as a solution to this complexity. The first step divides the order 35 into 5 lists of 7 orders each, that is, it divides the model ARIMA (0, 0, 35) into 5 sub-models: "ARIMA (0, 0, 1..7), ARIMA (0, 0, 8..14), ARIMA (0, 0, 15..21), ARIMA (0, 0, 22..28), ARIMA (0, 0, 29..35)". This step is followed by creating 5 threads in order to map each sub-model to the thread assigned to execute it. The created threads must reside in the Random Access Memory (RAM); for this reason, a memory zone called the thread pool, which contains the threads, is allocated in the RAM. After each sub-model is associated with its thread, the job of each thread is dispatched to a free (not busy) core of the machine. The algorithm below represents the architecture proposed for system parallelism.
Table 2. The Ljung-Box statistical test is applied to confirm the result of Figure 6 that the residuals are considered as white noise.
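The partitioning step can be illustrated with a few lines of Python. This is a sketch only; `split_orders` is a hypothetical helper, and it reproduces the five ranges listed in the text (1..7, 8..14, 15..21, 22..28, 29..35) for q = 35.

```python
def split_orders(q, n_lists):
    """Split the MA orders 1..q into n_lists consecutive ranges of equal size."""
    if q <= 0 or n_lists <= 0 or q % n_lists != 0:
        raise ValueError("q must be a positive multiple of n_lists")
    size = q // n_lists
    return [range(start, start + size) for start in range(1, q + 1, size)]


# The case described in the text: order 35 split into 5 sub-models.
sub_models = split_orders(35, 5)
labels = [f"ARIMA(0, 0, {r[0]}..{r[-1]})" for r in sub_models]
```

Each range can then be handed to one worker of a thread pool, so that the number of workers equals the number of sub-models.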

PART II: Sub-Models ARIMA Coefficient's Calculation
The second part of the diagram is dedicated to calculating the coefficients of each ARIMA sub-model and joining the calculated coefficients to obtain the final coefficients of the proposed model ARIMA (0, 0, 35). This process is based on the Compute(ARIMA, parameter (p, d, q)) method. For example, in order to calculate the coefficients matrix of the sub-model ARIMA (0, 0, 1..7), consider the following instructions:
• arima_order = (0, 0, 1..7)
• model = ARIMA(series, order = arima_order)
• coefficientsCalculate = calculateCoeff(start_params = loop i in range (1 to 7))
• matrixCoefficients = coefficientsCalculate.fittedvalues
• repeat for the other ARIMA sub-models.
First, the algorithm finds a free core "C", then locks it after mapping a thread "t" to it. This process is repeated for the other threads. When a thread finishes its job, it concatenates its generated coefficient vector with the coefficient vectors of the other threads and releases (unlocks) the core "C". Once every thread has finished its job and the coefficient vectors have been joined, we obtain the coefficients of the proposed ARIMA (0, 0, 35) model. This scenario is represented by the following algorithm section:
    /* execute and calculate the result of the function ARIMA(p, d, q) */
    array_coefficient ← compute(ARIMA, parameter_ARIMA_p_d_q);
    /* concatenate all results issuing from each thread after its individual job is done */
    matrix_coefficient ← add_to_list(matrix_coefficient, array_coefficient);
    /* release the core resource after terminating its executed job */
    release_thread(var_thread);
    /* take another thread from the pool memory */
    j ← j + 1;
Finally, this algorithm is applied to our case as follows:
    Begin
        p_order ← 0;
        d_order ← 0;
        q_order ← 35;
        n_lists_of_q_orders ← 5;
        coefficient_vector ← ArimaParallel(p, d, q, n_lists_of_q_orders);
    End
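The fan-out and concatenation pattern of this algorithm can be sketched with Python's standard `concurrent.futures` module. The paper does not specify what `compute` does internally, so the sketch plugs in a hypothetical stand-in, `lag_autocorrelations`, that returns one value per lag; it is not the paper's coefficient estimator. The name `arima_parallel` and the toy series are likewise assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def lag_autocorrelations(series, lags):
    """Hypothetical stand-in for compute(ARIMA, ...): one value per lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0 for k in lags]


def arima_parallel(series, q, n_lists, compute=lag_autocorrelations):
    """Run `compute` on each block of lags in a thread pool and join the results."""
    if q <= 0 or n_lists <= 0 or q % n_lists != 0:
        raise ValueError("q must be a positive multiple of n_lists")
    size = q // n_lists
    chunks = [range(s, s + size) for s in range(1, q + 1, size)]
    with ThreadPoolExecutor(max_workers=n_lists) as pool:
        # map() yields results in submission order, so the joined vector
        # follows lags 1..q even if the threads finish out of order.
        parts = pool.map(lambda lags: compute(series, lags), chunks)
        return [value for part in parts for value in part]


# Toy series (not from the paper) and the case q = 35 split into 5 lists.
series = [float((7 * i) % 11) for i in range(400)]
vector = arima_parallel(series, q=35, n_lists=5)
```

In CPython, threads running pure-Python numeric code are serialized by the global interpreter lock, so a real speed-up on several cores would normally require `ProcessPoolExecutor` (with a module-level function instead of the lambda) or a numerical library that releases the lock.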

Result
The values given by the numerical prediction model ARPEGE presented in Table 3 and Table 4 were taken from the weather bulletin prepared by the forecasters of the meteorological service at BRHIA. ARPEGE archives the maps and graphs related to the meteorological parameters for only 4 days before the current date, which is considered a critical time; after this critical time, the maps and graphs are immediately removed from storage. While preparing the daily weather bulletin, the forecasters of the meteorological service check the graphs and maps of several meteorological parameters, one of which is the temperature, and write down the values forecasted by ARPEGE in the bulletin. This must be done before the maps and graphs are deleted at the critical time.
Table 3 shows the accuracy of the ARIMA and ARPEGE prediction models for the first five days of January 2017. The real value of the temperature for the first day of January 2017 is 10.5, whereas the value forecasted by ARIMA is 10.47 and that of ARPEGE is 7.56. By comparison, we deduced that the accuracy given by ARIMA is 99.9%, which is better than the accuracy given by ARPEGE (72%). For the second day of January 2017, the accuracy given by ARIMA is 97.5%, which is better than that of ARPEGE (65.1%). For the third day, the accuracy given by ARIMA is 97.2%, which is better than that of ARPEGE (61.4%). For the fourth day, the accuracy given by ARIMA is 74.3%, which is better than that of ARPEGE (59.7%). Finally, for the fifth day, the accuracy given by ARIMA is 86.1%, which is better than that of ARPEGE (80.6%).
International Journal of Intelligence Science
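The paper does not state how the accuracy percentages are computed. The sketch below assumes accuracy = 100 · (1 − |observed − forecast| / |observed|); under that assumption the ARPEGE value for 1 January 2017 comes out at 72%, as in the text, while the ARIMA value comes out near 99.7%. The function names are illustrative; the per-day lists are the percentages reported for January 2017, and their averages correspond to the mean accuracies of 91% and 68% quoted in the abstract.

```python
def accuracy_percent(observed, forecast):
    """Assumed accuracy measure: 100 * (1 - relative absolute error)."""
    if observed == 0:
        raise ValueError("relative accuracy is undefined for an observed value of 0")
    return 100.0 * (1.0 - abs(observed - forecast) / abs(observed))


def mean(values):
    return sum(values) / len(values)


# Values quoted in the text for 1 January 2017.
arima_day1 = accuracy_percent(10.5, 10.47)
arpege_day1 = accuracy_percent(10.5, 7.56)

# Per-day accuracies (percent) as reported for 1 to 5 January 2017.
arima_jan = [99.9, 97.5, 97.2, 74.3, 86.1]
arpege_jan = [72.0, 65.1, 61.4, 59.7, 80.6]

arima_mean = mean(arima_jan)
arpege_mean = mean(arpege_jan)
```

A relative measure of this kind depends on the temperature scale and breaks down near 0, so an absolute error in degrees would be a useful complement when comparing winter and summer periods.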
Table 4 shows the accuracy of the ARIMA and ARPEGE prediction models for the first five days of August 2016. The real value of the temperature for the first day of August 2016 is 26.7, whereas the value forecasted by ARIMA is 26.49 and that of ARPEGE is 24.5. By comparison, we deduced that the accuracy given by ARIMA is 99.2%, which is better than the accuracy given by ARPEGE (91.7%). For the second day of August 2016, the accuracy given by ARIMA is 98.9%, which is better than that of ARPEGE (90.5%). For the third day, the accuracy given by ARIMA is 98.7%, which is better than that of ARPEGE (91.7%). For the fourth day, the accuracy given by ARIMA is 97.8%, which is better than that of ARPEGE (88.3%). Finally, for the fifth day of August 2016, the accuracy given by ARIMA is 96.81%, which is superior to that of ARPEGE (85.4%). Table 5 shows a comparison between the proposed prediction models, sequential ARIMA and parallel ARIMA, and ARPEGE. This comparison covers hardware resource consumption as well as the operating system that supports each model. We started by the execution time parameters, this

Conclusion
Temperature is considered an important parameter to forecast for the meteorological service in Lebanon, since it influences many sectors such as the economy, tourism, agriculture, etc. According to the results, the temperature predictions made by the ARIMA model are more accurate than those of ARPEGE in both seasons studied: summer, represented by August 2016, and winter, represented by January 2017. The Ljung-Box statistical test supports this accuracy and confirms that the parameters p, d and q fit the proposed ARIMA model well; this was established by testing that the residuals can be considered white noise. Furthermore, regarding hardware resource consumption, the results show that the ARIMA model takes 10 hours of execution time, which is better than ARPEGE, which takes 102 hours. Moreover, ARIMA requires 20 GB of Random Access Memory (RAM), far less than the 8.32 TB (terabytes) reserved by ARPEGE during execution. Finally, it is essential to mention that, although this article shows some advantages of ARIMA over ARPEGE, ARPEGE remains one of the important numerical models widely adopted by many Arab and European meteorological departments. ARPEGE may also be more accurate in the predictions of those departments, but the topography of Lebanon makes accurate prediction very difficult for ARPEGE and for many other numerical weather prediction models (GFS, ECMWF, etc.).