Combination of WRF Model and LSTM Network for Solar Radiation Forecasting—Timor Leste Case Study

This paper presents a study that combines the Weather Research and Forecasting (WRF) model with a Long Short-Term Memory (LSTM) network for a location in Dili, Timor Leste. One calendar year of simulated solar radiation, from January to December 2014, is used as input data for the LSTM network to forecast solar radiation over a three-month period. WRF version 3.9.1 is used to simulate one year of solar radiation at high horizontal resolution, with a nested domain of 1 × 1 km. The simulation is driven by 6-hourly, 1° × 1° NCEP FNL analysis data, used as Global Forecast System (GFS) input. LSTM networks are applied to numerous learning problems, including solar radiation forecasting. The network uses a two-layer LSTM architecture with 512 hidden neurons coupled with a dense output layer with linear activation; the number of time steps is set to 50 and the number of features is 1. The maximum number of epochs is set to 325, with a batch size of 300 and a validation split of 0.09. The results demonstrate that the combination of these two methods can successfully predict solar radiation: four error metrics, mean bias error (MBE), root mean square error (RMSE), normalized MBE (nMBE), and normalized RMSE (nRMSE), show small errors over the three-month prediction, with nMBE and nRMSE at about 20% or less. Meanwhile, the RMSE is about 200 W/m² or less and the maximum MBE is 0.07. Finally, the values of MBE, RMSE, nMBE, and nRMSE indicate that the combination of the two methods performs well and can be applied to simulate other weather variables for local needs.


Introduction
Weather forecasting has become essential to several important human activities, such as the operation of renewable energy systems [1]. However, some traditional forecasting techniques are becoming ineffective due to the impact of climate change [2]. In some countries prone to flash floods, such weather conditions cannot be predicted with conventional forecasting systems because those systems are designed for predictions over large regions [3]. Solar radiation forecasting plays an important part in meteorology. Many methods have been developed to estimate solar radiation, and they involve correlations between solar radiation and other measured meteorological variables. However, in many case studies, information about solar radiation is not available because the number of meteorological stations at the location of interest is very limited [4]. Combining numerical weather prediction (NWP) with a time series method is a promising approach for modeling the variation of solar radiation.
In recent years, solar radiation has been modeled in many countries with different climates by applying machine learning based on Artificial Neural Networks (ANN). Many advanced countries, such as Japan, the UK, the USA, China, India, and Spain, apply ANN to model solar radiation based on their location and climate [5]. One challenge for machine learning methods is the need for a large amount of input data connected to the target variable; because of site, cost, and maintenance constraints, the important data may not be available. Successful construction planning for a renewable energy project depends on the accuracy of solar radiation prediction. In addition, many architects and agriculturists require accurate solar radiation data, for example for farming purposes [6].
Some case studies show that accurate solar radiation prediction for several weather stations requires a combination of input parameters such as the coordinates of the location (latitude and longitude), hours of sunshine, maximum and minimum ambient temperature, albedo, aerosol optical depth, cloudiness, evaporation, precipitation, and relative humidity. As mentioned above, however, long-term records of solar radiation are costly, so they are limited to specific locations [7]. Another important issue is the duration of the study. The test period must be longer than one day, particularly for cloudy and rainy days, to stabilize the uncertainty errors from weather forecasting. Mean absolute error (MAE), mean bias error (MBE), root mean square error (RMSE), and mean absolute percentage error (MAPE), or the corresponding normalized errors nMAE, nMBE, nRMSE, and nMAPE, are typically used to assess forecast accuracy [8].
ANN has been successfully applied in a variety of areas, as presented in several studies. Vakili et al. [9] used an ANN model for daily global solar radiation prediction. Their study used several input parameters, such as relative humidity, wind speed, and daily temperature, for one year in Tehran, Iran. They used three types of ANN models to predict daily global solar radiation: Multilayer Perceptron (MLP), Generalized Regression NN (GRNN), and Radial Basis NN (RBNN). Error metrics such as root mean square error (RMSE), mean bias error (MBE), and the absolute fraction of variance (R²) were used to evaluate the accuracy and efficiency of the models. Their results showed that the MLP and RBNN models were more accurate than the GRNN. Yadav and Changel [10] reported that the accuracy of ANN models was mostly dependent on the input parameters. They focused on the estimation of solar radiation for the Eastern Mediterranean Region of Turkey, using an ANN model with different learning algorithms and numbers of hidden neurons to optimize prediction performance. Their results showed that ANN predicts solar radiation more accurately than conventional methods and that the ANN model depends on the architecture configuration, the training algorithm, and the combination of input parameters.
The objective of this study is to combine the Weather Research and Forecasting (hereinafter WRF) model and a deep learning method, Long Short-Term Memory (hereinafter LSTM), for future prediction. One calendar year of WRF simulation results, from January to December 2014, was used as input data to the LSTM to predict future solar radiation for a location in Dili, Timor Leste. This dataset was divided into a training dataset (81%) and a testing dataset (19%). Three months of observation data obtained from a weather station in Dili were used for comparison.
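The 81%/19% division can be illustrated with a short Python sketch. It is only a hedged illustration: the paper does not say how the split was drawn, so a chronological split (earliest samples for training) is assumed here, and the function name `chronological_split` is hypothetical.

```python
import numpy as np

def chronological_split(series, train_fraction=0.81):
    """Split a time series into an early training part and a late testing part.

    Assumption: the split is chronological; the paper only gives the 81/19 ratio.
    """
    series = np.asarray(series)
    n_train = int(round(len(series) * train_fraction))
    return series[:n_train], series[n_train:]

# One non-leap year of hourly values (8760 samples) stands in for the WRF output.
hourly = np.zeros(8760)
train, test = chronological_split(hourly)
print(len(train), len(test))
```

A chronological split keeps future values out of the training data, which is the usual precaution for time series.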
The rest of this paper is organized as follows. Section 2 describes the study domain, the observation data, and the data sources. Section 3 presents the methodology, including the WRF model, the Long Short-Term Memory network, and the four error metrics. Section 4 presents the simulation results, in particular three months of solar radiation forecasts and the analysis of the four error metrics. Section 5 concludes the paper.

Study Domain, Evaluation Observation, and Sources
Weather data covering 2160 hours from January to March 2015 were collected at one station in Hera [11] (lat: 8°33'03.9''S, long: 125°39'33.7''E), located about 12.4 km east of Dili, Timor Leste, as shown in Figure 1. The weather station, a Vaisala WXT530 located at the center of the Faculty of Engineering, Science, and Technology on the Hera campus, provides hourly solar radiation, which is used for comparison with the results of the combined WRF model and LSTM network for further analysis. The station also provides wind speed, wind direction, and temperature. However, because the data generated by the weather station are very limited, only three months of solar radiation data, from 1st January to 31st March 2015, were used in this study for local forecasting needs. The objective of getting external information from weather forecasting services is to obtain solar radiation for energy management applications. The Global Forecast System (GFS) is the most used data source for weather forecasts. It provides weather data and has proven to be a useful tool for various weather variables, including solar radiation, and for solar farm operations. Six-hourly, 1° × 1° NCEP FNL analysis data, obtained via a web server (https://rda.ucar.edu/datasets/ds083.2/), were used as Global Forecast System (GFS) input for the initial data of the simulation for one calendar year, from 1st January to 31st December 2014.
Figure 1. Plotting of land cover study area (See Ref. [11]).

WRF Model
The Weather Research and Forecasting (WRF) model [12] with the Advanced Research WRF (ARW) core, version 3.9.1, was used to simulate solar radiation. WRF-ARW is an open-source mesoscale numerical weather prediction model developed with contributions from a large user community, including the National Oceanic and Atmospheric Administration (NOAA), the National Centers for Environmental Prediction (NCEP), and the National Center for Atmospheric Research (NCAR). WRF solves the dynamic and thermodynamic equations of the atmosphere. In addition, WRF runs physical schemes that simulate phenomena that cannot be resolved by the dynamical solver. One major advantage of WRF is the large choice of implementations for each physical scheme, which allows users to configure the model according to their needs.
In this study, the performance of WRF was evaluated in three different configurations, using three two-way nested domains with horizontal resolutions of 9, 3, and 1 km, as illustrated in Figure 2. Domain 1 is composed of 86 × 68 cells, domain 2 of 88 × 88 cells, and domain 3 of 100 × 100 cells. The WRF Single-Moment 5-Class scheme was used for the microphysics. RRTMG, a newer version of the Rapid Radiative Transfer Model, was applied for longwave (LW) and shortwave (SW) radiation [13]. The MM5 Monin-Obukhov similarity scheme was used for the surface layer [14], the Noah LSM for the land surface [15], and the Yonsei University scheme for the planetary boundary layer [16]. Mercator was used as the map projection. Only data obtained from domain 3 were used as input data to the LSTM network and compared with the observed data for further forecasting analysis. NCL (NCAR Command Language) version 6.5.0 was used to plot the grid points and variables [17].
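The nesting geometry can be checked with a few lines of Python. This is a minimal sketch rather than part of the original study: the labels d01 to d03 and the reading of "86 × 68" as west-east by south-north are assumptions, and each extent is approximated as cells × grid spacing.

```python
# (label, grid spacing in km, cells west-east, cells south-north), taken from the text.
# The labels and the west-east/south-north ordering are assumptions.
DOMAINS = [("d01", 9, 86, 68), ("d02", 3, 88, 88), ("d03", 1, 100, 100)]

def extent_km(dx_km, nx, ny):
    """Approximate domain size in km as number of cells times grid spacing."""
    return nx * dx_km, ny * dx_km

def nest_ratios(domains):
    """Grid-spacing ratio between each parent domain and its child."""
    return [parent[1] // child[1] for parent, child in zip(domains, domains[1:])]

for label, dx, nx, ny in DOMAINS:
    print(label, extent_km(dx, nx, ny))
print("parent-to-child ratios:", nest_ratios(DOMAINS))
```

Under these assumptions both nests use a 3:1 ratio, and the innermost 1 km domain covers roughly 100 km × 100 km.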

LSTM Network
The LSTM network, a branch of the recurrent neural network (RNN) family, is suitable for various forecasting problems, particularly solar radiation prediction [18]. The ability and flexibility of the LSTM architecture to control and manipulate several parameters of the time series are a great benefit in time series forecasting, where these inputs can be applied to multivariate data for future prediction. The structure of the LSTM, as part of the RNN, consists of three layers: an input layer, a hidden layer, and an output layer, as shown in Figure 3. The LSTM network is commonly implemented with the Keras package for training and testing datasets [19] [20] [21] [22]. This work uses a moving-forward window technique to predict the next time step [23]. The number of hidden layers, number of neurons, number of epochs, and batch size play an important role in the implementation of the LSTM. In this study, these parameters were selected by trial and error: ranges of 1-512 neurons, 1-300 for the batch size, and 1-325 epochs were evaluated until the results converged close to the observed data. The input data are normalized to the range (−1, 1) with the min-max scaler technique before running the algorithm. Table 1 gives the configuration of the LSTM network: a two-layer LSTM architecture with 512 hidden neurons coupled with a dense output layer with linear activation, with 50 time steps and 1 feature. The maximum number of epochs was set to 325, with a batch size of 300 and a validation split of 0.09. Figure 3 shows the RNN model that is usually used for time series forecasting. The input layer is the first layer and has weights; each subsequent layer receives weights from the previous layer, with an activation function used for the hidden layer and a linear function for the output layer.
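The preprocessing and network configuration described above can be sketched in Python as follows. This is a hedged illustration, not the authors' LSTM.py: all function names are hypothetical, the optimizer and loss are assumptions (the paper does not state them), 512 units are assumed for each of the two LSTM layers, and the moving-forward window is interpreted as feeding each prediction back in as the newest input. Keras is imported only inside `build_model`, so the NumPy helpers run without it.

```python
import numpy as np

TIME_STEPS = 50   # window length stated in the paper
N_FEATURES = 1    # single input variable (solar radiation)

def scale_minus1_1(x):
    """Min-max scale a 1-D array to [-1, 1]; also return the (min, max) used."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return 2.0 * (x - lo) / (hi - lo) - 1.0, (lo, hi)

def make_windows(series, time_steps=TIME_STEPS):
    """Build (samples, time_steps, 1) inputs and the matching next-step targets."""
    series = np.asarray(series, dtype=float)
    n = len(series) - time_steps
    X = np.stack([series[i:i + time_steps] for i in range(n)])
    y = series[time_steps:]
    return X[..., np.newaxis], y

def build_model(units=512):
    """Two stacked LSTM layers followed by a linear dense output layer."""
    from tensorflow import keras  # lazy import: only needed for training
    model = keras.Sequential([
        keras.layers.Input(shape=(TIME_STEPS, N_FEATURES)),
        keras.layers.LSTM(units, return_sequences=True),
        keras.layers.LSTM(units),
        keras.layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")  # assumed; not given in the paper
    return model

def roll_forward(predict_one, history, n_steps, time_steps=TIME_STEPS):
    """Moving-forward window: append each prediction and drop the oldest value."""
    window = list(np.asarray(history, dtype=float)[-time_steps:])
    forecast = []
    for _ in range(n_steps):
        nxt = float(predict_one(np.array(window)))
        forecast.append(nxt)
        window = window[1:] + [nxt]
    return np.array(forecast)
```

With Keras installed, training with the stated settings would look like `model.fit(X, y, epochs=325, batch_size=300, validation_split=0.09)`, and `predict_one` could wrap `model.predict(w.reshape(1, TIME_STEPS, N_FEATURES), verbose=0)[0, 0]`.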
A delay is applied between the input layer and the hidden layer so that information from the previous time step (t − 1) can be used at the current time step (t). The parameters x(t) and y(t) are the input and output of the time series. The RNN can be described by equations in which W_0H, W_1H, and W_HH are the three connection weights and h(t) is the set of values that summarizes all past information necessary to describe the future.
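For orientation, a standard simple-RNN formulation that is consistent with these symbols can be written as below. This is a generic textbook form given under assumptions, not necessarily the authors' exact equations: W_1H is assumed to be the input-to-hidden weight, W_HH the hidden-to-hidden weight, W_0H the hidden-to-output weight, and f_H and f_O are assumed names for the hidden activation and the linear output function.

```latex
\begin{align}
  h(t) &= f_H\!\left( W_{1H}\, x(t) + W_{HH}\, h(t-1) \right) \\
  y(t) &= f_O\!\left( W_{0H}\, h(t) \right)
\end{align}
```

The term W_HH h(t − 1) is what carries the delayed information from the previous time step into the current one.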

Evaluation of Solar Radiation
Two error metrics, root mean square error (RMSE) and mean bias error (MBE), taken from David et al. [24] and expressed in W/m², were applied in this paper to evaluate the accuracy of the simulated data against the observed data. In addition, normalized MBE (nMBE) and normalized RMSE (nRMSE), expressed in % [25] [26], were used to normalize the errors in solar radiation over the considered period. These four error metrics are defined as follows:
$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(pred_i - obs_i\right)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(pred_i - obs_i\right)^2}$$
$$\mathrm{nMBE} = \frac{\mathrm{MBE}}{R_{max} - R_{min}} \times 100\%$$
$$\mathrm{nRMSE} = \frac{\mathrm{RMSE}}{R_{max} - R_{min}} \times 100\%$$
where n represents the number of time steps, pred represents the data from the combination of the WRF and LSTM algorithms, and obs is the Dili weather station observed data. Rmax and Rmin represent the maximum and minimum values of solar radiation from the simulated and observation data. All error metrics were validated using hourly data for the considered period, where MBE indicates whether the model underestimates (MBE < 0) or overestimates (MBE > 0).
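The four metrics can be computed with a short NumPy sketch. This is an illustration under stated assumptions rather than the authors' code: the function name `error_metrics` is hypothetical, the bias is taken as pred minus obs (so that a positive MBE means overestimation, as in the text), and the normalized errors are assumed to divide by the range Rmax − Rmin taken over both series.

```python
import numpy as np

def error_metrics(pred, obs):
    """Return MBE and RMSE (input units) and nMBE and nRMSE (percent)."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    diff = pred - obs                      # positive values mean overestimation
    mbe = diff.mean()
    rmse = np.sqrt((diff ** 2).mean())
    # Assumed normalization: range of solar radiation over both series.
    r_max = max(pred.max(), obs.max())
    r_min = min(pred.min(), obs.min())
    span = r_max - r_min
    return {
        "MBE": mbe,
        "RMSE": rmse,
        "nMBE": 100.0 * mbe / span,
        "nRMSE": 100.0 * rmse / span,
    }

# Toy hourly values in W/m², purely illustrative (not data from the study).
metrics = error_metrics(pred=[0, 100, 200, 500], obs=[0, 100, 100, 400])
print(metrics)
```

In this toy case the predictions are never below the observations, so the MBE is positive and the model would be described as overestimating.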

Results
This section presents the three-month comparison between the predicted and observed data, including the analysis of the four error metrics for solar radiation forecasting. Since the local weather station provides no information on cloud cover, aerosol optical depth (AOD), water vapor, cloud water path, or cloud effective radius, only 2160 hours of solar radiation were used to ensure the comparability and accuracy of the experiment. When sunlight passes through the atmosphere, solar irradiance is reduced by damping processes such as absorption by water vapor, cloudy conditions, and aerosols. In addition, humidity, which varies with time and location, may also decrease solar irradiance. In this study, the maximum solar radiation predicted by the LSTM method was around 1002 W/m², 991 W/m², and 992 W/m² in January, February, and March, respectively. Figures 4(a)-(c) show that hourly solar radiation from the LSTM mostly reaches above 600 W/m². Meanwhile, some hours of solar radiation were below 600 W/m², presumably on rainy days. Table 2 shows the values of root mean square error (RMSE), mean bias error (MBE), normalized MBE (nMBE), and normalized RMSE (nRMSE). The RMSE reached 203 W/m² in January, 177 W/m² in February, and 161 W/m² in March. The MBE values for all three months were above 0, indicating that the combination of both methods overestimates solar radiation. The nMBE showed a small percentage error, decreasing from 7.38% to 0.65%. The nRMSE also showed a small percentage error, decreasing from 20.09% to 16.18%. The percentage and distribution of the errors are essential for assessing forecasting skill. Hence, the MBE, RMSE, nMBE, and nRMSE are used to evaluate the performance of the model.
For all four error metrics, the values decrease from January through February to March, indicating good performance of the LSTM model.
Based on the decision surface, the month of the year consistently has the largest effect on solar radiation forecasting, which may be due to top-of-atmosphere solar insolation, ambient maximum and minimum temperature, and ambient pressure. Moreover, the latitude and longitude of the location may also influence surface solar radiation. The values of the four error metrics demonstrate that the LSTM algorithm can successfully improve the performance of solar radiation forecasting. Good forecasting accuracy with the LSTM can be obtained by adjusting the number of epochs, the batch size, the number of neurons, and the validation split. In addition, the performance of the LSTM can also be influenced by the frequency of the input variables, such as hourly or daily data. All these parameters were set through a large number of trial-and-error runs to obtain the results closest to the observation data. Overall, the LSTM forecasts of solar radiation showed good accuracy and agreement with the observations. The main problem in the present study is the lack of weather station data in the year 2015 that can be used for comparison with the LSTM method.

Conclusions
In this study, the reliability of three months of solar radiation forecasts provided by a combination of the WRF model and the LSTM method was evaluated against observation data in Dili, Timor Leste. WRF output at 1 km horizontal resolution and hourly time resolution was used as input data to the LSTM method to predict three months of solar radiation at the beginning of the year 2015. The 1° × 1° FNL analysis data, obtained free of charge from the NCEP website, were used to run the WRF model for the solar radiation simulation. Since the three months of observed data in Dili lack information on other variables, using the solar radiation variable is one option for analyzing the performance of the combination of both methods for future prediction. The three months of observed data at the beginning of 2015 are valuable for understanding solar radiation forecasting over a long-term period. However, some weather station values were zero at the beginning of the year, which caused the four error metrics to become higher. Meanwhile, an understanding of numerical weather prediction, input data, and deep learning could help to analyze forecasting performance.
Three important variables (solar direct, solar diffuse, and cos zenith) were used to evaluate solar radiation on the ground surface. The first analysis showed that the LSTM method overestimated solar radiation, with MBE values of about 0.07, 0.04, and 0.006 in January, February, and March, respectively. The RMSE, nMBE, and nRMSE also decreased over these three months of prediction. A lower solar radiation forecasting error in two metrics does not always indicate a lower forecasting error for the solar PV system; however, a lower forecasting error mostly corresponds to higher accuracy of the solar radiation forecast itself.
The main contribution of this paper is an assessment of the performance of combining two well-developed, powerful models, the WRF model and the LSTM network, for local solar radiation forecasting. Even though data from only a single location and a limited number of forecasts are presented, the study gives a useful understanding for PV setup as an initial measurement in solar energy modeling. This study proposes that combining a physical method with a learning model provides a best-of-breed approach that is favorable and valuable for better appraising the accuracy of the corresponding forecasts. The conclusion of this study is that applying a combination of these two powerful models, WRF and LSTM, for solar radiation prediction in Dili is one way to handle other variables for future prediction.
"LSTM.py" is a Python file that contains code to run the future prediction of solar radiation with the LSTM method.
Value of the data
1) The data in "Appendix.xlsx" might be used by other researchers for comparison with their forecasting data.
2) The "LSTM.py" file might be used by other researchers to forecast any variable.