Application of Time Series Analysis to Annual Rainfall Values in Debre Markos Town, Ethiopia

For many years planning and management of water resources involved modeling and simulation of temporally sequenced and stochastic hydrologic events. Rainfall process is one of such hydrologic events which calls for time series analysis to better understand interesting features contained in it. Many statistics-based methods are available to simulate and predict such a kind of time series. Autoregressive (AR), moving average (MA), autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models are among those methods. In this study a search was conducted to identify and examine a capable stochastic model for annual rainfall series (over the period 1954-2015) of Debre Markos town, Ethiopia. For the historical series, normality and stationarity tests were conducted to check if the time series was from a normally distributed and stationary process. Shapiro-Wilk (SW), Anderson-Darling (AD) and Kolmogorov-Smirnov (KS) tests were among the normality tests conducted whereas, Augmented Dickey-Fuller (ADF), Phillips-Perron (PP) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests were among the stationarity tests. Based on the test results, logarithmic transformation and first order differencing were performed to bring the original series to a normal and stationary series. Results of model fitting showed that three models namely, AR (2), MA (1) and ARMA (2,1) were capable in describing the annual rainfall series. A diagnostic check was performed on model residuals and ARMA (2,1) was found to be the best model among the candidates. Furthermore, three information criteria: Akaike Information Criterion (AIC), the corrected Akaike Information Criterion (AICc) and Bayesian Information Criterion (BIC) were used to select the best model. In this regard, too, the least information discrepancy between the underlying process and the fitted model was obtained from ARMA (2,1) model. Hence, this model was considered as a better representative of the annual rainfall values and was used to predict five years ahead values. The mean absolute percentage error (MAPE) of the prediction was found to be less than 10%. Thus, ARMA (2,1) model could be used for forecasting and simulation of annual rainfall for How to cite this paper: Abebe, S.A. (2018) Application of Time Series Analysis to Annual Rainfall Values in Debre Markos Town, Ethiopia. Computational Water, Energy, and Environmental Engineering, 7, 81-94. https://doi.org/10.4236/cweee.2018.73005 Received: June 30, 2017 Accepted: June 3, 2018 Published: June 6, 2018 Copyright © 2018 by author and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/ Open Access


Introduction
Understanding historical and current behavior of a climatic variable such as rainfall is a pre-requisite to future development and wise use of water resources.
The importance of such a kind of understanding is apparent especially within the context of climate change and increase in water demand [1]. Historical rainfall record in its daily, monthly, annual or other forms has been a valuable input to different hydrologic analyses based on which important development decisions were made. In recent years, due to the effect of human interventions, rainfall records in general are exhibiting significant nonstationarity to the level that they cannot be analyzed using traditional hydrologic analysis techniques such as frequency analysis [2]. Hence a suitable statistical method which can take care of nonstationarities is required for a proper understanding and modeling of recent rainfall series.
Available records of rainfall are time sequenced realizations of the huge and unknown rainfall population. Future values of these sequences cannot be predicted certainly. Usually rainfall observations fluctuate from time to time in a random fashion. In order to provide a statistical setting for describing the characteristics of such phenomenon, a stochastic time series analysis is employed. In this kind of analysis, temporal characteristics of the rainfall can be examined and a prediction model can be synthesized.
According to McCuen [2] five components may be present in a given time series data. Not all but some of them may be reflected in any given time series. Three of the five components are classified as systematic variations and the rest as non-systematic variations. Secular, periodic and cyclical trends are components under systematic variations. Secular trends are tendencies to rise or fall constantly for prolonged time in a linear or non-linear form. On the other hand, periodic and cyclic trends are tendencies which have a repetitive nature after completion of one cycle. Episodic and random variations are those components under non-systematic variations. Abrupt changes in the time series are defined as episodic variations. Some of the driving forces for such kinds of changes include extreme meteorological events and changes in location of a recording gage. On the other hand, random variations arise from factors which are uncontrolled or unmeasured.
To systematically deal with stochastic rainfall time series data, the widely used statistical approaches formulated by Box and Jenkins can be implemented [3]. The central idea in this approach is that adjacent observations are dependent.
Unlike the classical regression, where observations or variables are assumed to be independent, in stochastic time series analysis observations are correlated. Hence the Box and Jenkins approach, in general, applies autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) models to deal with these dependences.
The major objective set for this study is to identify, fit and assess a capable model which can simulate the stochastic characteristics of annual rainfall in Debre Markos town, Ethiopia.
Numerous studies have been conducted on rainfall time series at different temporal scales. Some of them are devoted to the detection while others on the analysis and modeling of nonstationarites present in the rainfall records. Oguntunde et al. [4] investigated hydrological variability and trends in the long-term Wang et al. [5] used a seasonal autoregressive moving average (SARIMA) model to simulate and forecast a seasonal precipitation series of Shouguang city, China. In their study they've identified and fitted the data to four candidate models namely SARIMA (2, 0, 2) (1, 1, 1) 12 , SARIMA (2, 1) (1, 1, 1) 12 , SARIMA (1, 1) (1, 1, 1) 12 and MA (12). After comparing the models based on available information criteria, they've argued that SARIMA (2, 0, 2) (1, 1, 1) 12 is the better one and used it for forecasting.
Saada [6]  (1), ARMA (1, 1) and ARMA (2, 1) were capable of replicating the long term statistical values at Surat Obedia. In contrast, the models were unable to capture the correlation structure as well as the long-term statistics at Malaki.

Study Area
Debre Markos Town is one of historical medium towns of Ethiopia. It is located 300 km North West of the capital Addis Ababa and 265 km South East of the Amhara National Regional state's capital Bahirdar (Figure 1. 10˚20'N and 37˚43'E). The town covers an area of around 60 km 2 . The climatic condition is generally humid, with mean annual temperature of 16˚C and rainfall of 1308 mm. In this highly expanding town, the demand for water is expected to increase in the near future. Apart from the exploitation of the rich groundwater, it's necessary to manage and use the surface water sources too. To facilitate the surface as well as the groundwater management, the historical and future behavior of the rainfall process must be understood. From this standpoint this study is Figure 1. Location of Debre Markos town [7].
conceived to aid the understanding of Debre Markos rainfall process at annual scale.

ARIMA Models
Often interesting features of a time series cannot be explained by classical regression [8]. The common thing in these three models is that they all acknowledge the presence of correlation and are applicable on a stationary time series. If nonstationarity is an issue, as in most practically encountered time series, another sto-

Autoregressive Models
The central idea of autoregressive models is that the current value of a time series is affected by one or more past values of the same series. The extent to which past values affect the current value can be examined using sample autocorrelation function (ACF). The general equation of an autoregressive model of order p, AR(p), is given as: where t x is current value of the series, 1 2 , are past values of the series and t w is white noise term which is a collection of uncorrelated random variables with mean of 0 and is defined as: Particular equations of the various order p can be derived from the general Equation (1). For example; AR (1) model can be given as:

. Moving Average Models
An alternative to the autoregressive model, the moving average model assumes the current value as a linear combination of white noises, t w . The general equation of moving average model of order q, MA(q), is given as:

Autoregressive Moving Average Models
For a stationary process, a mixed type of model i.e. autoregressive moving average (ARMA) can be used. This model is a linear combination of AR(p) and MA(q) models and its general equation for orders p and q, ARMA (p,q), is given as: with 0 p ∅ ≠ and 0 q θ ≠ . The concise form of ARMA (p,q) model is given as: 3) above one can notice that when p = 0, the model becomes a moving average model of order q, MA(q), and when q = 0, the model becomes autoregressive model of order p, AR(p).

Autoregressive Integrated Moving Average Models
Yet another model to be used especially when nonstationarity is apparent in the time series is autoregressive integrated moving average (ARIMA) model. In this model the time series is made stationary by applying a certain order of diffe-rencing, d. The general equation of this model for orders p, d and q, ARIMA (p, d, q) is given as: The model can also be written as: (4) we can notice that if d = 0, the resulting model is pure ARMA (p,q). Hence ARIMA (p,d,q) can be considered as a general case of ARMA (p,q), MA (q) and AR (p) models.
Often the performance of stochastic models is assessed using residual analyses [9]. Residuals must be in accordance with model assumptions. From the presented models above one or more of them could potentially describe the time series at hand. If statistically significant and competing two or more models are available, information criteria can be used to select the better one. Information criterion uses information discrepancy between a model and the underlying process as a measure to evaluate goodness of the model [11]. For this purpose, three information criteria can be used for model comparison.
The Akaike information criterion (AIC) proposed by Akaike in 1973 evaluates a model whose parameters are estimated by maximum likelihood method.

Another popular evaluation criterion which is based on Bayesian probability is the Bayesian information criterion (BIC) which was proposed by Schwarz in
1978. BIC can be applied to models whose parameters are estimated by the maximum likelihood method. The variant of AIC for which the bias of the loglikelihood is corrected (AICc) can also be used for model comparison. Details on these three criteria are provided in [11].

Materials Used
In this study the historical annual rainfall values of Debre Markos town were used. 62 years annual values were collected having time range from 1954 to 2015.
Among these values 57 of them were used for model building and the rest five were used for prediction evaluation. For data processing and modeling activities a free software environment known as R (version 3.4.3) was used. Along with base R packages, additional packages called astsa and forecast are also used.
Apart form the extensive help () function, the focused R notes [12] and the easy way instruction in Kabacoff [13] were valuable resources to get familiar with the R program. The Detailed treatments on application of R in time series analysis S. A. Abebe Computational Water, Energy, and Environmental Engineering presented in Cowpert wait and Metcalfe [14], Cryer and Chan [15] were helpful too.

Procedure Followed
The following steps were implemented to build a suitable stochastic model for Debre Markos town annual rainfall series. 1) Input data examination-in this step the raw data was imported into R and analyzed to examine if there were outliers and whether the series was stationary or not.
2) Model identification-once the series was confirmed to be stationary, a tentative model of certain order was identified based on sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF). At this stage more than one equally competing models were identified.
3) Parameter estimation-at this stage sarima() function from astsa package of R was used to fit the stationary data to the tentatively identified models in step 2 above. Parameter values were computed using maximum likelihood estimation method. If a model fits the data well, t-values should be greater than 2 in magnitude and p-values should be less than the significance level [16]. In this study a significance level of 5% was maintained for all hypothesis tests.  To stabilize the variance in the data, a logarithmic transformation was used.

Input Data Examination
Normality tests along with stationarity tests have been conducted to check if the transformed data meets the requirement of normality and stationarity for the upcoming stochastic analyses (Table 1). Normality tests known as Shapiro-Wilk (SW), Anderson-Darling (AD) and Kolmogorov-Smirnov (KS) have been applied. All of these normality tests assess the null hypothesis that the sample series was extracted from a normally distributed population (i.e. Ho: sample is normally distributed). Among the stationarity tests, the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests assess the null hypothesis that the sample series has a unit root (i.e. Ho: sample is nonstationary) while Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test evaluates the null hypothesis that the sample series is stationary (i.e. Ho: sample is stationary). A detailed comparison and applicability of the above tests is presented in Ghasemi and Zahediasl [17] and Imam [18]. According to the tests performed, first the series was normal but nonstationary. A first order differencing was then applied to bring the series to a

Model Identification
Based on sample ACF/PACF plots for the stationary series, tentative model types and model orders were identified. Figure 3 shows sample ACF and PACF of the stationary series. From the plots a significant correlation at lag 2 can be observed. For the lags beyond lag 2 the correlation quickly dies out. This is a signal of confirmation that the differenced series is stationarity. Inspecting the plots closely, it feels that the ACF is cutting off at lag 1 and PACF is tailing off. This would suggest that the stationary series follows MA (1) or ARIMA (0,1,1) process. Due to the use of a limited sample size, it is impossible to tell whether the ACF/PACF is exactly cutting off or tailing off at certain lag. From the plots, it can also be suggested as PACF is cutting off at lag 2 and ACF is tailing off. Hence the annual rainfall process can be argued to follow AR (2) or ARIMA (2,1,0) process. Another possible candidate could be the mixed type ARMA (2,1) or ARIMA (2,1,1). As a preliminary analysis all the three models were fitted and diagnosed.

Model Parameter Estimation
The candidate models were fitted to the stationary data using sarima() function of astsa package. The function fits an ARIMA (p,d,q) model to the data and S. A. Abebe returns parameter estimates, standard errors, AIC, AICc, BIC and diagnostics using maximum likelihood estimation. Results of the fit are displayed in Table 2.
As it can be seen from the table t-values were greater than 2 in magnitude and p-values were less than 0.05. Hence at 5% significance level it can be argued that all the models fitted the series well.

Model Diagnostics and Selection
Model diagnostics mainly focuses on analysis of model residuals as a means of model verification and choice. If a model fits the data well, in the conventional regression analysis sense, the residuals behave in such a way that they are iid sequence with mean zero and finite variance, 2 w σ . In other words, the residuals become white noises. But in time series analysis we rarely encounter white noise residuals. Therefore, we should relax the iid requirement [8].   Figure 4. Diagnostics of the residuals from AR (2) fit. each model. As it can be seen from Table 3, the least information discrepancy (AIC and AICc) was obtained from ARMA (2,1) model. The BIC prefers the simpler MA (1) model. As noted in Shumway and Stoffer [8], the BIC often prefers smaller order models and it's not unreasonable to retain the MA (1) as a qualifying model. In this study, however, ARMA (2,1) was selected as the preferable model to represent the annual rainfall process at Debre Markos town.

Prediction
Once the better model was selected, five-year ahead prediction was conducted.
For this purpose, sarima.for() function of astsa package was used. This function produced predicted values based on the chosen ARMA model. Figure 7 shows the resulting prediction plot along with one and two standard error prediction bounds. The mean absolute percentage error (MAPE) of the forecast was 9.0% (Table 4). Hence the model can be considered as a better predictor.

Conclusions
In this study annual rainfall time series of Debre Markos town, Ethiopia, was investigated. A logical procedure was followed in search for a better stochastic model that could better explain interesting features contained in the annual series. Among three statistically competent ARIMA models, ARMA (2,1) model was selected as the better model. Hence this model can be used to aid the understanding of Debre Markos rainfall process at annual scale. Furthermore, the model can be used as a potential alternative for prediction of annual rainfall values. Finally, as a recommendation, other stochastic models should be investigated to see if there are other models that can also preserve long term statistical beha-