Thermal Energy Collection Forecasting Based on Soft Computing Techniques for Solar Heat Energy Utilization System

In recent years, the introduction of alternative energy sources such as solar energy has been widely anticipated. Solar heat energy utilization systems are rapidly gaining acceptance as one of the most promising alternative energy sources. However, thermal energy collection is influenced by solar radiation and weather conditions. Controlling a solar heat energy utilization system as accurately as possible therefore requires a method of solar radiation estimation. This paper proposes a technique for forecasting the one-day-ahead, 24-hour thermal energy collection of a solar heat energy utilization system, based on solar radiation forecasting with three different neural network (NN) models. The NN is trained on weather data selected by a tree-based model and is tested according to the forecast day. Because the tree-based model classifies the meteorological data accurately, the NN can learn the solar radiation smoothly. The validity of the proposed technique is confirmed by computer simulations using actual meteorological data.


Introduction
Solar heat energy utilization systems and photovoltaic (PV) systems are rapidly gaining acceptance as some of the most promising alternative energy sources, especially in Japan. Because solar radiation is not constant, the output of these systems depends on solar radiation and weather conditions. Using a storage battery is one feasible measure for stabilizing the power output of PV systems. However, it incurs additional cost, and power conversion causes a loss of electric power. In hybrid power systems that combine storage batteries, PV systems, solar heat energy utilization systems, solar cells, etc., solar radiation forecasting is an important operational tool. Predicting the state of the storage battery is one solution, but there are other approaches. For example, since the state of charge of the battery depends on the other electric power resources, the amount of energy to be stored in the battery can easily be decided from forecast data such as wind power and PV power output. These decisions benefit the effective operation of hybrid power systems and, consequently, their profitability. From the viewpoint of reducing the cost of meeting thermal demand, the thermal energy collection of a solar heat energy utilization system should be estimated as accurately as possible. Therefore, it is necessary to develop a more accurate method of solar radiation prediction.
Although forecasting thermal energy collection from weather prediction data is regarded as an effective method, its implementation tends to be fragile, because the prediction data provided by meteorological agencies or weather services are mostly gathered over a wide area. It is therefore rather difficult to determine hourly data at the installation site of a solar heat energy utilization system. To overcome these problems, a forecasting technique that is inexpensive and easy to use is needed. The neural network (NN) is known as a convenient technique for forecasting [1][2][3], and it makes it possible to forecast solar radiation from meteorological data alone. In recent years, many researchers have reported work on solar energy, e.g., analytical solar radiation models [4][5], estimation methods [6][7][8][9][10], and daily forecast methods [11][12][13].
Apart from these studies, this paper proposes a technique for one-day-ahead hourly forecasting of the thermal energy collection of a solar heat energy utilization system, based on solar radiation forecasting with three different NN models. The selected models are the feedforward neural network (FFNN), the radial basis function neural network (RBFNN), and the recurrent neural network (RNN). The RBFNN is chosen for its structural simplicity and universal approximation property [14,15]. The RNN is chosen because it is known as a good tool for time-series forecasting [16,17]. Considerable effort has been devoted to NN-based solar radiation forecasting in other papers [3,14]. Nevertheless, how the training data are selected according to the forecast day is important: because solar radiation fluctuates with weather conditions, the training process of the NN tends to be unstable. In the proposed technique, the NN is trained on weather data selected by a tree-based model and is tested according to the forecast day. Random forest (RF) is used as the pattern classifier in this paper. RF is a combination of tree-based models in which each tree depends on the values of a random vector sampled independently (see e.g. [18]). Because the RF classifies the meteorological data accurately, the NN can learn the solar radiation smoothly. The thermal energy collection of the solar heat energy utilization system is calculated from the forecasted solar radiation data. The validity of the proposed method is confirmed by comparing the prediction abilities of the above-mentioned techniques in computer simulations of one-day-ahead, 24-hour thermal energy collection forecasting for a solar heat utilization system.
This paper is organized as follows. Section 2 describes the application models. The proposed methodology and input data for one-day-ahead thermal energy collection forecasting are presented in Section 3. The simulation results are presented in Section 4. Section 5 concludes this paper.

Application Model
The concept of the solar radiation forecast technique is shown in Figure 1. In this paper, 2005 is taken as the forecast period, and the RF and NN are constructed using meteorological data from 2003 to 2004. FFNN, RBFNN and RNN models are each constructed as the NN. The final NN output is the mean of the outputs of these three NN models. As shown in Figure 1, the RF is constructed using the meteorological data of 2003. This RF classifies the level of solar radiation from each meteorological datum, and the same RF is used for 2004 and 2005. The RF and each NN are explained in the following sections.

Tree-Based Model and Random Forest
The tree-based model is a method of nonlinear regression analysis and discriminant analysis. It is called a "regression tree" for regression problems and a "classification tree" or "decision tree" for classification problems. As shown in Figure 2, to divide the dataset of a node into two child nodes, the condition that gives the largest decrease in error $\Delta R$ is chosen as the branch rule when building a tree-based model. $\Delta R$ is represented by
$$\Delta R(s, t) = R(t) - R(t_L) - R(t_R)$$
where $s$ is the dividing condition, $t$ is the node number, $t_{L(R)}$ is the left (right) node, and $R(\cdot)$ is the error at node $t_i$. The error $R(t_i)$ at node $t_i$ is defined in terms of $N_{t_i}$, the number of data at node $t_i$; $y_k$, the output of datum $k$; and $\bar{y}_i$, the mean output at node $t_i$. The child nodes are characterized in the same way by $\bar{y}_{L(R)}$, the mean output of the left (right) node. At each node, the split with the largest decrease in error $\Delta R$ is taken as the best branch rule. The result of a tree-based analysis is a set of brief rules (similar to IF-THEN rules) derived from the data. These rules can be illustrated as a tree structure and are easy to understand. RF is a combination of tree-based models in which each tree depends on the values of a random vector sampled independently. Because the RF is known to classify meteorological data accurately, it is used as the classification tree in this paper. More detailed techniques for applying the tree-based model and RF are given in reference [18].
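The branch-rule search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the node error is the sum of squared deviations from the node mean, and the function names and the cloud-amount data are hypothetical.

```python
def node_error(ys):
    """Node error R: sum of squared deviations from the node mean.

    The exact normalisation used in the paper is not known; this form
    is an assumption made for illustration.
    """
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (threshold, delta_r) maximising R(t) - R(t_L) - R(t_R)."""
    parent = node_error(ys)
    best = (None, 0.0)
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        delta_r = parent - node_error(left) - node_error(right)
        if delta_r > best[1]:
            best = (threshold, delta_r)
    return best


# Hypothetical data: daily mean cloud amount vs. daily solar radiation.
cloud = [1, 2, 3, 8, 9, 10]
radiation = [20.0, 22.0, 21.0, 6.0, 5.0, 7.0]
threshold, delta_r = best_split(cloud, radiation)
```

On this toy data the search separates the clear days from the cloudy days, which corresponds to an IF-THEN rule on cloud amount.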

Feed-Forward Neural Network Adapted Back Propagation Method
Figure 3 shows an FFNN with $l$ units in the input layer, $m$ units in the hidden layer, and $n$ units in the output layer. The units are connected by linear coupling, and $x_1$ to $x_l$ are the input data to the NN. Each connection between units has a connection weight. The outputs of the hidden layer units are converted to nonlinear values by the hyperbolic tangent sigmoid function
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
where $x$ is the input data. The back propagation (BP) method is adopted for training the NN. In general, BP works as follows. First, the outputs of the hidden units $H_m$ are transmitted to the output layer units $O_n$. Then, the outputs of the output units are compared with the target signal $T_n$, as shown in Figure 3. Finally, to minimize the mean square error, the connection weights and the output value of each unit are updated in order from the output layer back to the input layer. In this paper, the Levenberg-Marquardt algorithm is adopted for updating the connection weights [19]. The momentum coefficient and the learning coefficient are the learning parameters of the NN. The momentum coefficient accelerates learning through the way it changes the connection weights. A large learning coefficient is preferable; however, if it is too large, the network becomes unstable. We require that the mean square error of the NN model not become unstable, and the authors determine these parameters by trial and error.
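As a rough illustration of the forward pass described above, the following sketch computes the hidden outputs with the hyperbolic tangent function and compares the network output with a target through the mean square error. The linear output layer, the bias terms, and all weight values are assumptions for illustration; the weight update itself (BP with Levenberg-Marquardt) is not shown.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, linear output layer (assumed)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    output = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    return hidden, output


def mean_square_error(output, target):
    """Mean of squared differences between outputs O_n and targets T_n."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


# Hypothetical network: 2 inputs, 2 hidden units, 1 output.
x = [0.5, -0.5]
w_hidden = [[1.0, 1.0], [1.0, -1.0]]
b_hidden = [0.0, 0.0]
w_out = [[2.0, 1.0]]
b_out = [0.1]

hidden, output = forward(x, w_hidden, b_hidden, w_out, b_out)
error = mean_square_error(output, [1.0])
```

Training would repeat this pass and adjust the weights to reduce `error`.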

Radial Basis Function Neural Network
Figure 4 shows the RBFNN; the explanation below is summarized from [14]. The outputs of the hidden units $H_m$ are converted by radial basis functions. Consider a mapping from a $d$-dimensional input space $\mathbf{x}$ to a one-dimensional target space $t$. The data consist of $N$ input vectors $\mathbf{x}^{p}$, together with corresponding targets $t^{p}$. The goal is to find a function $h(\mathbf{x})$ such that
$$h(\mathbf{x}^{p}) = t^{p}, \quad p = 1, \ldots, N$$
The RBFNN approach introduces a set of $N$ basis functions, one for each data point, which take the form $\phi(\lVert \mathbf{x} - \mathbf{x}^{p} \rVert)$. Thus, the $p$-th such function depends on the Euclidean distance between $\mathbf{x}$ and $\mathbf{x}^{p}$. The output mapping is then taken to be a linear combination of the basis functions
$$h(\mathbf{x}) = \sum_{p=1}^{N} w_p \, \phi(\lVert \mathbf{x} - \mathbf{x}^{p} \rVert)$$
The interpolation condition given by Equation (7) can then be written in matrix form as
$$\boldsymbol{\Phi} \mathbf{w} = \mathbf{t}$$
If the weights $w_p$ in Equation (7) are set to the values given by Equation (8), the function $h(\mathbf{x})$ represents a continuous differentiable surface that passes exactly through each data point. Several forms of basis function have been considered, such as the Gaussian
$$\phi(x) = \exp\left(-\frac{x^{2}}{2\sigma^{2}}\right)$$
where $\sigma$ is a parameter whose value controls the smoothness properties of the interpolating function.
Training the RBFNN aims to minimize the sum-of-squares error function defined by Equation (10); its minimum can be found in terms of the solution of the linear Equation (11).
The formal solution for the weights is given by
$$\mathbf{W} = \boldsymbol{\Phi}^{\dagger} \mathbf{T} \quad (12)$$
where $p$ is the pattern index, $\mathbf{T}$ ($= d_{pk}$) is the target signal, $o_{pk}$ is the output, $\mathbf{W}$ is the weight matrix, and $\boldsymbol{\Phi}^{\dagger}$ is the pseudo-inverse of $\boldsymbol{\Phi}$. Thus, the weights can be found by fast, linear matrix inversion techniques [14].
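The pseudo-inverse solution for the RBFNN weights can be illustrated with a short NumPy sketch. It assumes Gaussian basis functions centred on every training point (the exact-interpolation case) and uses hypothetical one-dimensional data; the function names are illustrative and are not taken from the paper.

```python
import numpy as np


def gaussian(r, sigma):
    """Gaussian basis function of the distance r with width sigma."""
    return np.exp(-(r ** 2) / (2.0 * sigma ** 2))


def design_matrix(x, centers, sigma):
    """Matrix Phi with entries phi(||x_i - c_j||) for all input/centre pairs."""
    dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return gaussian(dist, sigma)


def fit_weights(x_train, t_train, sigma):
    """Weights from the pseudo-inverse of Phi applied to the targets."""
    phi = design_matrix(x_train, x_train, sigma)
    return np.linalg.pinv(phi) @ t_train


def predict(x, x_train, weights, sigma):
    """Linear combination of the basis functions evaluated at x."""
    return design_matrix(x, x_train, sigma) @ weights


# Hypothetical training data: four input points and their targets.
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
t_train = np.array([0.0, 0.8, 0.9, 0.1])

weights = fit_weights(x_train, t_train, sigma=1.0)
fitted = predict(x_train, x_train, weights, sigma=1.0)
```

With one basis function per training point, `fitted` reproduces `t_train` up to numerical precision, which is the exact-interpolation property described in the text. A smaller set of centres would turn the same code into a least-squares fit.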

Recurrent Neural Network
Figure 5 shows the RNN model, an Elman-type NN. The unit characteristics of the RNN are the same as those of the FFNN, and it is trained by BP. However, the RNN has a context layer. This layer contains a copy of the hidden layer with time-delay lines and is added as a feedback structure. Through this feedback via the hidden layer, the context layer reflects information from both the input and output layers in the structure of the RNN. Consequently, past information is retained in the RNN as learning progresses. In Figure 5, $Y_t$ is the output of the hidden layer, and $Y_{tn}$ is the output of the context layer. $Y_{tn}$ is given by the following equation:

Figure 5. Recurrent neural network (Elman type model).
where $r$ is called the residual ratio, whose value varies between 0 and 1. As a result of training the RNN, past information is reflected in the RNN. In time-series forecasting, it is difficult to retain past information using the FFNN alone, whereas the feedback structure of the RNN is said to be effective [16].
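To make the role of the context layer and the residual ratio concrete, the sketch below steps a small Elman-type network through two time steps. The context update used here, in which the new context equals r times the old context plus the current hidden output, is an assumed form chosen for illustration and may differ from the authors' equation; the weights and inputs are hypothetical.

```python
import math


def elman_step(x, context, w_in, w_ctx, r):
    """One time step of an Elman-type network with a decaying context layer.

    ASSUMPTION: the context update new_context = r * context + hidden is an
    illustrative form, not necessarily the equation used in the paper.
    """
    hidden = [
        math.tanh(
            sum(w * xi for w, xi in zip(row_in, x))
            + sum(w * c for w, c in zip(row_ctx, context))
        )
        for row_in, row_ctx in zip(w_in, w_ctx)
    ]
    new_context = [r * c + h for c, h in zip(context, hidden)]
    return hidden, new_context


# Hypothetical network with one input and one hidden unit.
w_in = [[1.0]]
w_ctx = [[0.5]]
r = 0.5

context = [0.0]
hidden_1, context = elman_step([1.0], context, w_in, w_ctx, r)
hidden_2, context = elman_step([0.0], context, w_in, w_ctx, r)
```

Although the second input is zero, `hidden_2` is non-zero because the context layer carries information from the first step, which is the memory effect that a plain FFNN lacks.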

Methodology and Input Data
The meteorological data of 2004 are used for training the NN. The input data are shown in Table 1. According to the forecasting method in Figure 1, because the RF classifies the meteorological data accurately, the NN can learn the solar radiation smoothly. In addition, because the classification results of the RF are compensated by the NN, the forecast becomes more accurate. Table 2 shows the learning parameters of the NN. The number of hidden layer units is chosen to minimize the output error of the NNs in simulations with the training data. There are several methods for obtaining the number of hidden layer units; however, there is no general solution to this problem [17]. In this paper, a trial-and-error method is used to determine the appropriate number of hidden layer units. Based on our previous research, the authors consider the trial-and-error approach a convenient way to build the forecast model. Specifically, the training results were checked as the number of hidden layer units was varied from 10 to 60 for the training data, and the number of units that gave the best training result was taken as the optimal number. As shown in Figure 6, the optimal number of hidden layer units in this study is 59. The following subsections explain the meteorological data used for deciding similar days, the variable importance, and the Euclidean norm with weighted factors.

Meteorological Data
Naha City, Okinawa Prefecture, Japan, is chosen as the forecast area.

Variable Importance
The concept of the RF is to boost accuracy by integrating simulations. First, data sets are branched by a classification tree, as with IF-THEN rules. The remaining data sets are then branched repeatedly by subsequent classification trees [18]. The last remaining data sets are called terminal nodes. Previous nodes also belong to the data sets. The importance of these characteristics is called the "variable importance". Table 3 shows the variable importance determined using the meteorological data of 2003. Solar radiation is taken as the explained variable, and the maximum importance value is 100.

Euclidean Norm with Weighted Factors
A Euclidean norm with weighted factors is used to evaluate the similarity between the forecast day $x_i$ and a searched previous day $x_i^{p}$. In general, the following equations are used as the Euclidean norm with weighted factors for the selection of similar days: where the variable importance obtained from the RF is used for the weighted factors $a_i$. In (14), $n$ represents the number of variables, and $i$ = 1-24. The smaller the Euclidean norm with weighted factors, the better the evaluation of similar days.
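The selection of a similar day by a weighted Euclidean norm can be sketched as follows. The sketch assumes that each squared difference is multiplied by its weight before summing; whether the paper applies the weight in exactly this way is not known. The weights, variables, and dates are hypothetical.

```python
import math


def weighted_norm(forecast_day, past_day, weights):
    """Weighted Euclidean norm between two days' variables.

    ASSUMPTION: each squared difference is scaled by its weight a_i.
    """
    return math.sqrt(
        sum(a * (x - xp) ** 2
            for a, x, xp in zip(weights, forecast_day, past_day))
    )


def most_similar_day(forecast_day, past_days, weights):
    """Return the key of the past day with the smallest weighted norm."""
    return min(
        past_days,
        key=lambda day: weighted_norm(forecast_day, past_days[day], weights),
    )


# Hypothetical variables: [daily mean temperature, daily mean cloud amount].
weights = [100.0, 40.0]  # stand-ins for RF variable importance values
forecast_day = [25.0, 3.0]
past_days = {
    "2003-06-01": [24.0, 8.0],
    "2003-06-02": [27.0, 3.0],
    "2003-06-03": [25.5, 4.0],
}

similar = most_similar_day(forecast_day, past_days, weights)
```

Variables with a higher importance contribute more to the distance, so a day that differs in an important variable is less likely to be selected.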

Simulation Results
This section shows the simulation results of one-day-ahead, 24-hour solar radiation forecasting. In addition, the method for determining the solar heat energy collection of the solar heat utilization system from the solar radiation data is described. The forecast error was calculated after the forecast period.

Solar Radiation Classification Results
Tables 4-7 show the simulation results of one-day-ahead, 24-hour solar radiation classification for 2003-2005. These results indicate that all similar-day data are selected by the RF. Although the RF classified the solar radiation level for 2003 (Table 5) on the basis of Table 4, it made some classification errors in 2004 (Table 6) and 2005 (Table 7). Therefore, these errors in the classification of similar-day data should be compensated by the NN described above to obtain higher-accuracy forecast results.

Forecasting Result of Thermal Energy Collection of Solar Heat Utilization System
This section describes the method of calculating the thermal energy collection of the solar heat utilization system from the solar radiation values forecast by the proposed NN models. For the solar heat utilization system, the thermal energy collection per unit area $Q_a$ is represented by: where $\alpha$ is 0.24 cal/J, $\eta$ is the conversion efficiency of the thermal solar collector (%), $I_a$ is the solar radiation (MJ·m$^{-2}$), $n$ is the number of thermal solar collectors, and $A_c$ is the collection area (m$^2$). With this equation, the thermal energy collection can be forecast from weather data alone. In this paper, it is assumed that the total solar radiation falls on the thermal solar collector; the incidence angle between the solar radiation and the collector is not considered. Moreover, the conversion efficiency $\eta$ is assumed to be 60% and the collection area $A_c$ to be 1 m$^2$. Figures 7-9 show the forecasting results of 24-hour-ahead thermal energy collection for the solar energy utilization system in June 2005. It should be noted that the classification error of the similar-day data is compensated by the NN. In the lower part of Figure 7, the dashed line is the forecast result of the proposed method. Around hours 3900 to 4000, the figure shows that the similar-day data are compensated by the NN. Figures 8 and 9 show the mean absolute error (MAE) $Q_{er}$ between the actual data and the forecast data. The MAE is represented by:
$$Q_{er} = \frac{1}{N} \sum_{i=1}^{N} \left| P_f^{i} - P_a^{i} \right|$$
where $N$ is the number of data, $P_f$ is the forecast value, $P_a$ is the actual value, and $i$ is the index of the forecasting time. Because the simulation results in Figures 8 and 9 show that the MAE is decreased by the use of RF and NN, both hourly and in each month, the validity of the proposed method is confirmed.
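The conversion from forecast solar radiation to thermal energy collection, and the MAE evaluation, can be sketched as below. The product form used for the thermal energy collection, with the efficiency applied as a fraction, is an assumption based only on the variables listed above and is not confirmed by the text. The radiation values are hypothetical.

```python
def thermal_energy(i_a, eta_percent=60.0, n=1, a_c=1.0, alpha=0.24):
    """Thermal energy collection from solar radiation i_a (MJ/m^2).

    ASSUMPTION: Q_a = alpha * eta * I_a * n * A_c, with eta given in
    percent and converted to a fraction. With alpha in cal/J and I_a in
    MJ/m^2, the result is in Mcal.
    """
    return alpha * (eta_percent / 100.0) * i_a * n * a_c


def mean_absolute_error(forecast, actual):
    """MAE: mean of absolute differences between forecast and actual values."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(forecast)


# Hypothetical hourly solar radiation values (MJ/m^2).
radiation_forecast = [1.0, 2.0, 3.0]
radiation_actual = [1.5, 2.0, 2.0]

q_forecast = [thermal_energy(i) for i in radiation_forecast]
q_actual = [thermal_energy(i) for i in radiation_actual]
mae = mean_absolute_error(q_forecast, q_actual)
```

The default arguments mirror the values stated in the text (60% efficiency, 1 m$^2$ collection area); the number of collectors is set to 1 here only for illustration.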

Conclusion
This paper proposes a technique for one-day-ahead, 24-hour thermal energy collection forecasting of a solar heat utilization system, based on solar radiation forecasting with three different NN models. In the proposed technique, the NN is trained on weather data selected by a tree-based model and is tested according to the forecast day. The merit of the proposed method is that it requires only meteorological data. In fact, it is possible to forecast pre-results in a short time using only meteorological data. The validity of the proposed method is confirmed by computer simulations of one-day-ahead, 24-hour thermal energy collection forecasting. In future work, the proposed forecast models will be compared with other methods in various cases.

Figure 1. The concept of the solar radiation forecast technique.

Figure 2. Example of tree-based model.

Figure 6. Learning error versus number of hidden layer units.

Figure 9. Hourly mean absolute error (thermal energy collection).

Table 1 indicates the input pattern of the NN. In the training process of the NN, the actual solar radiation data of 2004 are used as the output data T 1 -T 24, and the solar radiation data of 2003 are used as the input data x 1 -x 24. It is preferable that x 1 -x 24 and T 1 -T 24 be highly correlated. If similar data (called similar-day data in this paper) are used as the input and output of the NN, the training process will be smooth. In this paper, the Euclidean norm with weighted factors is used to evaluate the similarity between the forecast day and a searched previous day. Therefore, the x 1 -x 24 data are selected from the 2003 data as the similar day corresponding to each day in 2004, using the Euclidean norm with the RF. Note that in this process the RF determines the pre-forecast result (i.e., the similar-day data). At forecast time in 2005, the inputs x 1 -x 24 to the NN are selected from the solar radiation data of 2004, so the output data of the NN are the solar radiation forecast results for 2005.

Table 2. Learning parameters of the NN.
The training data of the NN are ground-observation data and Grid Point Value (GPV) data issued by the Japan Meteorological Business Support Center [20]. The Numerical Prediction Division of the Japan Meteorological Agency (NPD/JMA) produces many kinds of aviation weather forecast products derived from numerical weather prediction (NWP) output data.
In this paper, meso-scale NWP model (MSM) data are used for the 24-hour-ahead forecasting simulations. The following explanatory variables are used to classify the solar radiation level in this paper: dy: date; t: daily mean temperature [°C]; we: weather; c: daily mean total cloud amount; ap: daily mean atmospheric pressure [hPa]; r: daily mean relative humidity [%]; ws: daily mean wind speed [m/s]; p: daily mean precipitation [mm] (8 types of data in total).