Bootstrapped Multi-Model Neural-Network Super-Ensembles for Wind Speed and Power Forecasting

The bootstrap resampling method is applied to an ensemble artificial neural network (ANN) approach (which combines machine learning with physical data obtained from a numerical weather prediction model) to provide a multi-ANN model super-ensemble for application to multi-step-ahead forecasting of wind speed and of the associated power generated from a wind turbine. A statistical combination of the individual forecasts from the various ANNs of the super-ensemble is used to construct the best deterministic forecast, as well as the prediction uncertainty interval associated with this forecast. The bootstrapped neural-network methodology is validated using measured wind speed and power data acquired from a wind turbine in an operational wind farm located in northern China.


Introduction
There has been an increasing emphasis on a greater use of renewable energy (e.g., solar, wind, geothermal) as a strategy to reduce greenhouse gas emissions and to mitigate climate change. In this context, one of the fastest growing sources of renewable energy for the generation of "green electricity" is the power obtained from wind turbines. The ever increasing use of wind power poses new challenges. One important challenge is how to accommodate the unpredictable fluctuations in wind speed and direction, which lead to variability and uncertainty in the wind power generation. The latter has significant implications for unit commitment and for the scheduling and dispatch decisions (economic dispatch) needed for the optimal utilization of wind energy within a mixed power system. In this regard, wind power forecasting has become a critical component in the efficient management of a green electrical power system (required by generation companies and utilities) and in electrical market operations (required by energy market analysts and traders).
The development of wind power forecasting models for improving the efficiency and reliability of mixed electrical power systems and for supporting electrical market operations has been reviewed by Costa et al. [1], Ma et al. [2] and Foley et al. [3]. Methodologies for wind power forecasting can be categorized into three broad classes: statistical, physical, and machine learning. Because time series of wind speed and power are frequently measured in the vicinity of wind farms, statistical approaches based on time series analysis and forecasting have been applied for the prediction of wind speed and power. Towards this purpose, time series forecasting based on the popular Box-Jenkins methodology [4], as applied to autoregressive (AR) models, moving average (MA) models, and autoregressive moving average (ARMA) models, has been utilized for wind speed and power forecasting using historical time series of wind speed and power. As an example, Erdem and Shi [5] demonstrated the application of an ARMA model for wind speed and direction forecasting. More sophisticated nonlinear time series models that accommodate the temporal evolution of the variance (heteroskedasticity), such as the autoregressive conditional heteroskedasticity (ARCH) model of Engle [6] and the generalized autoregressive conditional heteroskedasticity (GARCH) model of Bollerslev [7], have been applied to describe the intrinsic variability in the wind speed and the associated generated power. For instance, the ARMA-GARCH and GARCH-in-mean (GARCH-M) models proposed by Liu et al. [8] have been applied to model the mean and volatility of the wind speed.
The second general class of models comprises physically-based models for wind speed prediction based on numerical weather prediction (NWP) or computational fluid dynamics (CFD). These models solve equations of physics, such as the conservation principles of mass, momentum, and energy, in conjunction with various parameterizations for subgrid-scale physical processes that cannot be resolved explicitly by the necessarily finite number of grid points used to represent the atmospheric flow. In this way, NWP and/or CFD models provide hydrodynamic and thermodynamic models of the atmosphere that can be used to furnish a prediction of the flow field in a prescribed region. The prediction of the wind velocity field can be used in conjunction with the power curve for a wind turbine to provide a generated power forecast. Numerical weather prediction models have a number of limitations, including limited spatial resolution resulting in a coarse representation of the local terrain [9]. To overcome the latter problem, Liu et al. [10] considered the possibility of coupling a synoptic-scale flow model to a large-eddy simulation model for wind energy applications, and Li et al. [11] recently introduced a short-term wind forecasting methodology based on the use of CFD pre-calculated flow fields.
The third general class of models for wind power forecasting is based on machine learning approaches such as artificial neural networks, fuzzy systems, and support vector machines [3] [12] [13]. Unlike the parametric models used for time series forecasting, machine learning uses either a "gray" or "black" box (essentially nonparametric) representation for the underlying physical processes (defining a nonlinear mapping from an input to an output), and then utilizes various learning algorithms and historical time series of wind speed or power to "train" the gray (black) box. The black box trained in this manner can be applied subsequently to make predictions of the future wind speed or generated power (from a wind turbine).
In this paper, we propose to use the bootstrap resampling method in conjunction with an ensemble artificial neural network (ANN) approach for the multi-step-ahead forecasting of wind speed and generated power. The artificial neural network combines machine learning with physical modeling by using NWP wind speed data from a physical model as the exogenous input to the network. The purpose of the bootstrap resampling method is to reduce the bias in the prediction of the wind speed and power and to obtain more accurate estimates for the standard deviation (uncertainty) of these predictions. More importantly, the confidence bands in these predictions can be determined, which can be used to provide a more rigorous uncertainty assessment in wind speed and power forecasting.

Bootstrapping Ensembles of Artificial Neural Networks
As discussed in the previous section, a major concern of wind energy management is the uncertainty quantification of multi-step-ahead predictions of the wind speed (at the turbine hub height) and the corresponding power generated by the turbine. Instead of choosing a single best ANN for forecasting, we propose to use the bootstrap resampling method in the context of an ensemble of ANNs for predictive uncertainty analysis.

Let $\mathbf{y} \equiv (y_1, y_2, \ldots, y_T)'$ be a sample of realizations of a (scalar) random variable $y$, where the positive integer $T$ represents the sample size and the prime denotes the transpose of a vector (or matrix). In this paper, $y$ represents either the measured wind speed (at turbine hub height) or the power generated by a wind turbine. Similarly, let $\mathbf{x} \equiv (x_1, x_2, \ldots, x_T)'$ be a sample of the corresponding predicted wind speed (at turbine hub height) obtained from a numerical weather prediction model. Finally, let $\zeta \equiv \{z_t \equiv (x_t, y_t):\ t = 1, 2, \ldots, T\}$ be the training set (where $t$ is the time step). Note that $\zeta$ consists of the sample of pairs of the measured wind speed (or power) $y$ and the corresponding modeled wind speed $x$. We want to first represent (model) the functional relationship between $y$ and $x$ based on the training set $\zeta$ by using an ensemble of artificial neural networks, and then employ all of the trained ANNs in the ensemble to forecast $y$ when new values of $x$ become available. In this sense, the forecasting of wind speed and generated power is obtained by conditioning not on a single best ANN (model), but on an entire ensemble of plausible ANNs (models).
The nonlinear parameterized mapping $f$ from an input $x$ to an output $y$ can be described generally by an ANN given by $y = f(x; \theta)$. The output of the ANN is a continuous function of the input, and $\theta$ is a collection of weights and biases (parameters) that determine the architecture of the neural network. By virtue of the universal approximation theorem demonstrated by Hornik et al. [14], it is known that an ANN with three or more layers can approximate any continuous function provided the activation function is a locally bounded, piecewise continuous function. In view of this, in our application we use a three-layer neural network structure consisting of an input layer, a hidden layer, and an output layer. More specifically, the mapping for the ANN for our application has the following explicit form:

$$h_j = g_h\Big(\sum_l w_{jl} x_l + b_j\Big) \quad (2)$$

for the hidden layer and

$$y_i = g_o\Big(\sum_j w_{ij} h_j + b_i\Big) \quad (3)$$

for the output layer. In Equations (2) and (3), the index $l$ varies over the inputs, the index $j$ varies over the hidden units, and the index $i$ varies over the outputs. Furthermore, $g_h(\cdot)$ and $g_o(\cdot)$ are activation functions for the hidden and output layers of the network, respectively. For our application, the activation functions used to define the neural network architecture are $g_h(u) = \tanh(u)$ (hyperbolic tangent sigmoid function) and $g_o(u) = u$ (simple linear function). Note that each neuron in the network is a unit that combines and processes all the data coming into the layer and then passes the transformed data (output of the activation function) to all the neurons of the successive layer. Specifically, the input of a neuron is a weighted sum of the outputs of all the neurons in the previous layer plus a bias. The weights $w$ and the biases $b$ collectively make up the parameter vector $\theta$. The ANN is trained using the data set $\zeta$ by selecting the parameter vector $\theta$ so as to minimize some error function which measures how close the modeled output $f(x; \theta)$ is to the measured output $y$ of the training set. Two specific forms of the error function that will be used for this purpose are the root-mean-square error (RMSE),

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\big(y_t - f(x_t; \theta)\big)^2},$$

and the mean absolute error (MAE),

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\big|y_t - f(x_t; \theta)\big|.$$

The minimization of these error functions is achieved using the particle swarm optimization algorithm [15]. Once an ANN has been trained, it can be used for the out-of-sample forecasting of the dependent (output) variable $y$.
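A minimal NumPy sketch of this three-layer mapping and the two error functions is given below. The tanh hidden activation and linear output follow the architecture described above, but the array shapes, the random (untrained) weights, and the omission of the particle-swarm training loop are illustrative simplifications:

```python
import numpy as np

def forward(x, params):
    """Three-layer ANN: input -> hidden (tanh, Eq. (2)) -> output (linear, Eq. (3))."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)   # hidden-layer activations h_j
    return h @ W2 + b2         # linear output layer

def rmse(y, y_hat):
    """Root-mean-square error between measured and modeled outputs."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error between measured and modeled outputs."""
    return np.mean(np.abs(y - y_hat))

# Illustrative forward pass with random (untrained) weights:
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 1))                      # 8 samples of one exogenous input
params = (rng.normal(size=(1, 5)), np.zeros(5),  # 5 hidden nodes
          rng.normal(size=(5, 1)), np.zeros(1))
y_hat = forward(x, params)
```

In the paper, the parameter vector $\theta$ (the weights and biases above) is fitted with particle swarm optimization; in a reimplementation, any gradient-free or gradient-based optimizer of the RMSE or MAE could be substituted.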
To apply the bootstrap resampling procedure [16] to the ANN, we have to impose a statistical distribution on the sample data $\zeta$. To this purpose, we follow the standard bootstrap method and assume that the empirical distribution function $F$ of the $z_t$ ($t = 1, 2, \ldots, T$) is a uniform distribution (viz., one assigns equal probability to each sample value $z_t$). More specifically, we assign to each $z_t$ a probability of $T^{-1}$. This distribution implies that any statistic of the observed data (viz., any functional of the data) is invariant under all permutations of the components in the sample $\zeta$. The nonparametric simulation of bootstrap data sets based on the empirical distribution function $F$ is simple: we sample with replacement from the components of $\zeta$ (viz., because $F$ places equal probability on each of the original data values in $\zeta$, each sample is obtained by independently sampling at random from these data values). By so doing, we can obtain a bootstrap sample with a sample size of $T$. We can repeat this process, say, $N$ times. Then, we will have $N$ bootstrap samples drawn from the data values in $\zeta$, which we will denote by $\zeta^{(1)}, \zeta^{(2)}, \ldots, \zeta^{(N)}$. For each of these bootstrap samples, we can train an ensemble of ANNs with the same network architecture, but with each member of the ensemble having a different number of neurons in the hidden layer (recall that the numbers of neurons in the input and output layers are determined a priori by the dimensions of the input and output vectors, respectively). To be more specific, assume that the number of neurons in the hidden layer of the network architecture varies from $p$ to $q$ ($p \le q$) inclusive. Furthermore, in order to explicitly treat the initialization uncertainty (viz., the uncertainty arising from the initialization of the weights used in training an ANN), we will train each ANN model structure (with a fixed number of hidden nodes) starting from $m$ different sets of initial weights $\theta$. In consequence, we have a super-ensemble of ANNs consisting of $(q - p + 1)$ different ANN model structures, each of which is trained starting from $m$ different random initializations of the weights on a particular bootstrap sample $\zeta^{(n)}$. Each member of this super-ensemble of ANNs can be used to conduct a multi-step-ahead prediction of $y$.
The procedure for bootstrapping an ensemble of neural networks is summarized as follows:
1. Assign the nonparametric distribution $F$ to the observed data $z_1, z_2, \ldots, z_T$.
2. Draw a (nonparametric) bootstrap sample (with replacement) from the empirical distribution function $F$, and train $m$ ANNs starting from $m$ different (random) initializations for the weight vector $\theta$ for a fixed ANN model structure (viz., an ANN with a fixed number of nodes in the hidden layer). Repeat the process for different ANN model structures, with the number of nodes in the hidden layer varying from $p$ to $q$ ($p \le q$) inclusive. The multi-step-ahead forecasting of the wind speed and generated power $\tilde{y}$ is carried out for each of the trained ANNs in the ensemble.
3. Repeat Step 2 $N$ times to obtain bootstrap replications for the forecasted wind speed and power; namely, $\tilde{\mathbf{y}}^{(1)}, \tilde{\mathbf{y}}^{(2)}, \ldots, \tilde{\mathbf{y}}^{(N)}$. Calculate the mean and standard deviations of the bootstrap predictions of $\tilde{y}$ using the procedure described below. These quantities can be used in conjunction with a bootstrap-t method to obtain confidence intervals for the forecasts of the wind speed and generated power.
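The resampling loop of Steps 1-3 can be sketched as follows. This is a sketch only: the ANN training itself is elided, and the small values of $N$, $m$, $p$, and $q$ are illustrative rather than the values used later in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 360                            # training-set size (illustrative)
z = np.arange(T)                   # indices standing in for the pairs z_t = (x_t, y_t)

def bootstrap_sample(z, rng):
    """Draw T values with replacement: F places probability 1/T on each z_t."""
    return rng.choice(z, size=len(z), replace=True)

N, m, p, q = 4, 2, 5, 7            # bootstrap replications, inits, hidden-node range
ensemble = []
for n in range(N):                 # Step 3: repeat Step 2 N times
    sample = bootstrap_sample(z, rng)
    for hidden in range(p, q + 1): # Step 2: vary the hidden-layer size
        for init in range(m):      # ... and the random weight initialization
            # here one would train an ANN on `sample` and store its forecasts
            ensemble.append((n, hidden, init, sample))

# per-sample ensemble size is m(q - p + 1); the super-ensemble has N times that
```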
We use a two-stage weighted averaging method to provide the predictive uncertainty assessment for the wind speed and power. For each bootstrap sample, we calculate the predictions (forecasts) of the multi-step-ahead wind speed and power using the $L \equiv m(q - p + 1)$ members of the ensemble of ANNs trained using the bootstrap sample. A statistical combination of these forecasts is used to obtain the best forecast based on the current ensemble whose members have been trained using the given bootstrap sample. This procedure is repeated for each bootstrap sample, and the optimum forecast is calculated as a weighted average (statistical combination) of these $N$ best forecasts obtained from the $N$ bootstrap samples in the super-ensemble. This information can be used also to determine the confidence intervals in the forecasted quantities. The weights $\mathbf{w}^{(n)} = (w_1^{(n)}, \ldots, w_L^{(n)})'$ for each model are determined from the training data set by a constrained minimization of the following quadratic (objective) function:

$$J\big(\mathbf{w}^{(n)}\big) = \big\|\mathbf{E}\,\mathbf{w}^{(n)}\big\|^2, \quad (8)$$

where $\mathbf{E} \equiv (\mathbf{e}_1, \ldots, \mathbf{e}_L)$ is the error matrix, with $\mathbf{e}_l \equiv \mathbf{y} - \hat{\mathbf{y}}_l$ ($l = 1, \ldots, L$). Here, $\|\cdot\|$ denotes the Euclidean norm, and the weights are constrained to be non-negative and to sum to unity: $w_l^{(n)} \ge 0$ and $\sum_{l=1}^{L} w_l^{(n)} = 1$. Once the weights $\mathbf{w}^{(n)}$ have been obtained for each bootstrap sample (fixed $n$), we can use them to perform the best in-sample predictions $\hat{\mathbf{y}}$ and best multi-step-ahead predictions $\tilde{\mathbf{y}}$ of the output variable $y$. More specifically, for a given ensemble of ANNs that has been trained for a fixed bootstrap sample, the in-sample predictions of $y$ for the $l$-th member of the ensemble are denoted explicitly by $\hat{\mathbf{y}}_l = (\hat{y}_1^l, \ldots, \hat{y}_T^l)'$, and the best in-sample prediction is given by the weighted average

$$\hat{\mathbf{y}}^{(n)} = \sum_{l=1}^{L} w_l^{(n)} \hat{\mathbf{y}}_l, \quad (9)$$

where $\mathbf{w}^{(n)}$ denotes the solution to the minimization of the quadratic error function given in Equation (8). The best multi-step-ahead prediction is obtained analogously from the member forecasts $\tilde{\mathbf{y}}_l$:

$$\tilde{\mathbf{y}}^{(n)} = \sum_{l=1}^{L} w_l^{(n)} \tilde{\mathbf{y}}_l. \quad (10)$$
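The paper does not state which solver is used for the constrained minimization of Equation (8); one simple possibility is a projected-gradient iteration over the probability simplex, sketched below. The step size and iteration count are illustrative assumptions:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w_l >= 0, sum_l w_l = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def combination_weights(E, iters=2000, lr=0.01):
    """Minimize ||E w||^2 (Equation (8)) subject to w >= 0 and sum(w) = 1,
    where E is the T x L matrix of in-sample residuals e_l = y - y_hat_l."""
    L = E.shape[1]
    w = np.full(L, 1.0 / L)              # start from equal weights
    G = E.T @ E
    for _ in range(iters):
        w = project_simplex(w - lr * 2.0 * G @ w)
    return w

# Illustrative case: member 1 has much smaller in-sample residuals than
# member 0, so nearly all of the weight should be assigned to it.
E = np.array([[1.0, 0.1],
              [1.0, -0.1],
              [1.0, 0.1]])
w = combination_weights(E)
```

Any quadratic-programming routine with the same simplex constraint would serve equally well.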
For a fixed bootstrap sample, the standard deviation for the multi-step-ahead prediction of $y$ is estimated as follows (using the $L$ forecasts $\tilde{y}_t^l$ for $y_t$ obtained from the ANN ensemble at the fixed time index $t$):

$$\big(\sigma_t^{(n)}\big)^2 = \sum_{l=1}^{L} w_l^{(n)} \big(\tilde{y}_t^l - \tilde{y}_t^{(n)}\big)^2. \quad (11)$$

Because this is a biased estimator of the variance, we use instead the formula

$$\big(\tilde{\sigma}_t^{(n)}\big)^2 = \frac{\sum_{l=1}^{L} w_l^{(n)} \big(\tilde{y}_t^l - \tilde{y}_t^{(n)}\big)^2}{1 - \sum_{l=1}^{L} \big(w_l^{(n)}\big)^2} \quad (12)$$

to obtain an unbiased estimator of the standard deviation (recall that the weights $w_l^{(n)}$ are non-negative and sum to unity). At this point, we have $N$ different forecasts for the output variable $y$ (along with the standard deviations in these forecasts) obtained by applying the weighted-averaging schema described above to the ensemble of ANNs trained on each of the $N$ bootstrap samples. The second stage of the process is to apply the weighted-averaging schema again to calculate the bootstrap predictions of $\tilde{y}$ based on these $N$ different forecasts. To this purpose, we define another weight vector $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_N)'$, which can be evaluated similarly by minimizing an objective function of the form of Equation (8), except now the error matrix is constructed from the residuals between $\mathbf{y}$ and $\hat{\mathbf{y}}^{(n)}$ ($n = 1, \ldots, N$), where $\hat{\mathbf{y}}^{(n)}$ is calculated by using Equation (9). Once the weight vector $\boldsymbol{\eta}$ has been estimated, the weighted bootstrap predictions of $y$ are obtained from

$$\tilde{\mathbf{y}}_B = \sum_{n=1}^{N} \eta_n \tilde{\mathbf{y}}^{(n)}, \quad (13)$$

and the corresponding bootstrapped standard deviations from

$$\big(\tilde{\sigma}_{B,t}\big)^2 = \sum_{n=1}^{N} \eta_n \Big[\big(\tilde{\sigma}_t^{(n)}\big)^2 + \big(\tilde{y}_t^{(n)} - \tilde{y}_{B,t}\big)^2\Big], \quad (14)$$

where $\tilde{\sigma}_t^{(n)}$ (the standard deviation obtained from the ANN ensemble trained on the $n$-th bootstrap sample) is computed in accordance with Equation (12).
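The weighted forecast combination and the bias-corrected weighted standard deviation of Equations (10) and (12) can be sketched as follows; the member forecasts and weights below are illustrative numbers, not values from the paper:

```python
import numpy as np

def weighted_std(preds, w):
    """Bias-corrected weighted standard deviation of the L member forecasts
    at one time step (Equation (12)); w is non-negative and sums to one."""
    mean = preds @ w
    var = (((preds - mean) ** 2) @ w) / (1.0 - np.sum(w ** 2))
    return np.sqrt(var)

# Stage 1 for a single bootstrap sample and a single forecast time:
preds = np.array([5.1, 4.8, 5.4])   # L = 3 member forecasts (illustrative)
w = np.array([0.5, 0.3, 0.2])       # stage-1 weights w^(n) (sum to one)
y_n = preds @ w                     # combined forecast, Equation (10)
sigma_n = weighted_std(preds, w)    # its spread, Equation (12)
```

Stage 2 repeats the same combination over the $N$ per-sample forecasts $\tilde{y}_t^{(n)}$ with the weight vector $\boldsymbol{\eta}$.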
Confidence intervals for the forecast of the output variable $y$ will be obtained using the bootstrap-t method. Towards this purpose, the confidence intervals of the multi-step-ahead prediction $\tilde{\mathbf{y}}_B$ of $y$ can be determined in accordance with the following formula:

$$\tilde{y}_{B,t} \pm t_{\alpha, v}\, \tilde{\sigma}_{B,t}, \quad (15)$$

with $t_{\alpha, v}$ being the $\alpha$-level critical value of a Student's-t distribution with $v$ degrees of freedom. Alternatively, $t_{\alpha, v}$ can be replaced by the bootstrap percentiles of the sample of studentized forecasts $\big(\tilde{y}_t^{(n)} - \tilde{y}_{B,t}\big)/\tilde{\sigma}_t^{(n)}$ ($n = 1, \ldots, N$). So, for example, extracting the 2.5% and 97.5% bootstrap percentiles of this sample would allow one to construct a bootstrap-t based confidence interval for the prediction at a coverage level of 95%.
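The percentile variant is straightforward to sketch with NumPy; the forecast values below are illustrative, and for the Student's-t variant one would substitute a t quantile (e.g., from scipy.stats) for the percentiles:

```python
import numpy as np

def percentile_interval(forecasts, level=95.0):
    """Lower and upper bootstrap percentiles of the N forecasts at one time step."""
    half_tail = (100.0 - level) / 2.0
    return np.percentile(forecasts, [half_tail, 100.0 - half_tail])

# Illustrative: 200 bootstrap forecasts of the wind speed at one lead time
rng = np.random.default_rng(1)
forecasts = rng.normal(loc=6.0, scale=0.5, size=200)
lo, hi = percentile_interval(forecasts, level=95.0)
```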

Data Preparation
The two data sets that we analyse were collected from a specific wind turbine, referred to as WT24 hereafter, located in a wind farm in northern China. One of these data sets corresponds to the hourly-averaged wind speeds measured at the turbine hub height, and the other corresponds to the associated hourly-averaged power generated by the turbine. The wind speed and generated power time series, consisting of 432 observations each, were measured over a period of 18 days. The measurements collected over the first 15 days (corresponding to 83% of the entire length of the time series) were used as the training set, and the remaining 3 days were reserved for forecast assessment and validation. In addition to these measured time series, wind speed data at turbine hub height obtained from a numerical weather prediction model were available, and this information was used as the exogenous input for artificial neural network training. In particular, the modeled wind speed and direction data over the region occupied by the wind farm were obtained from a NWP model executed with a temporal resolution of 1 h on a computational grid with a 3-km spatial resolution centered on the location of the wind farm. We applied a simple bilinear interpolation (BI) to this coarse-resolution NWP wind speed data to obtain the wind speed at the location of the WT24 wind turbine, which was then used as an exogenous input for ANN training.
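A bilinear interpolation of a gridded NWP field to a point lying between grid nodes can be sketched as follows. The grid values and the fractional coordinates of the turbine are illustrative; a real application would first convert the turbine's latitude/longitude into grid coordinates:

```python
import numpy as np

def bilinear(grid, x, y):
    """Bilinear interpolation of a 2-D field at fractional grid coordinates (x, y),
    where grid[i, j] holds the NWP wind speed at grid node (i, j)."""
    i0, j0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - i0, y - j0
    return ((1 - dx) * (1 - dy) * grid[i0, j0]
            + dx * (1 - dy) * grid[i0 + 1, j0]
            + (1 - dx) * dy * grid[i0, j0 + 1]
            + dx * dy * grid[i0 + 1, j0 + 1])

# Illustrative 2 x 2 patch of NWP wind speeds (m/s) around the turbine:
patch = np.array([[5.0, 5.4],
                  [6.0, 6.2]])
speed_at_turbine = bilinear(patch, 0.5, 0.5)  # turbine midway between the 4 nodes
```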
As described in the previous section, we bootstrapped (resampled with replacement) the training data set to generate $N = 200$ "phantom" (bootstrap) data sets. We used each of these bootstrap data sets to train three-layer ANNs with a variable number of nodes in the hidden layer, ranging from 5 to 30 nodes inclusive (so $p = 5$ and $q = 30$). In order to reduce the uncertainty arising from the initialization of the network parameter (weight) vector, each network was trained 5 times, each time starting from a different randomly chosen initialization of the weight vector (viz., $m = 5$). For each bootstrap sample (training data set), we have an ensemble consisting of $L \equiv m(q - p + 1) = 130$ member ANNs trained on that sample. In consequence, the super-ensemble of ANNs trained on the entire set of bootstrapped samples consists of $K \equiv Nm(q - p + 1) = 26{,}000$ member ANNs. The information embodied in this ensemble can be used for the multi-step-ahead forecasting of the wind speed and generated power, along with a quantitative assessment of the prediction confidence.
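The ensemble sizes quoted above follow directly from the chosen values of $N$, $m$, $p$, and $q$:

```python
N, m, p, q = 200, 5, 5, 30   # bootstrap samples, initializations, hidden-node range
L = m * (q - p + 1)          # ANNs trained per bootstrap sample
K = N * L                    # total member ANNs in the super-ensemble
print(L, K)                  # 130 26000
```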

Results
Figure 1 compares the measured wind speed at the hub height for WT24 with the out-of-sample forecasted wind speed obtained from bootstrapping an ensemble of ANNs. The best deterministic forecast for the wind speed, based on a statistical combination of the individual forecasts in the super-ensemble [determined in accordance with Equations (10) and (13)], is shown in this figure (dot-dashed line labeled "Best forecast"). A comparison of the best deterministic forecast with the measured wind speed shows that the forecast captures adequately the longer temporal trends in the measured wind speed. In addition, the 95% prediction confidence intervals obtained using the two-stage weighted-averaging method are exhibited in Figure 1 as the dotted lines demarcated using an open circle. Note that the 95% prediction uncertainty range appears to cover most of the observations, providing an observation coverage that is consistent (approximately or better) with the quoted level of confidence in the uncertainty interval. A quantitative assessment of the forecast performance in this case is given in Table 1, which summarizes the RMSE and MAE in the wind speed prediction using the bootstrapped neural-network methodology. The performance of this forecast methodology can be compared to that obtained from a simple persistence model forecast, which uses the current wind speed to predict the value of the future wind speed.
Next, we consider the forecast performance for the wind power using the bootstrapped neural-network methodology. The forecast for the generated power is more complicated than that for the wind speed because the wind power is censored from above. More specifically, wind turbines are designed so that when the wind speed exceeds a certain value (referred to as the rated output wind speed), a limit to the power generation is imposed, implying that the power generated is censored from above. To account for this maximum limit in the wind power generation by a wind turbine, the bilinear interpolations of the NWP wind speeds to the location of WT24 were censored (at the rated output wind speed of the turbine) before they were used as the exogenous input for the neural network training. Furthermore, the measured wind power used in the training of the ANNs was already censored from above by the maximum limit for power generation by the turbine. Indeed, for the current example, if the modeled wind speeds exceeded 11.5 m·s⁻¹ (rated output wind speed for WT24), then the generated wind power associated with this range of wind speeds was limited from above to 1550 W (rated output power for WT24).
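The censoring described above amounts to clipping both the exogenous input and the training target at the turbine's rated values. A minimal sketch follows; the sample speeds and powers are illustrative, while the rated values are those quoted in the text for WT24:

```python
import numpy as np

RATED_SPEED = 11.5    # rated output wind speed for WT24 (m/s)
RATED_POWER = 1550.0  # rated output power for WT24

# Censor the bilinearly interpolated NWP wind speeds used as exogenous input:
nwp_speed = np.array([6.0, 10.2, 12.8, 14.1])
speed_input = np.minimum(nwp_speed, RATED_SPEED)

# The measured power used for training is likewise censored from above:
power = np.array([400.0, 1100.0, 1700.0, 1800.0])
power_train = np.minimum(power, RATED_POWER)
```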
Figure 2 compares the measured power generated by the turbine WT24 with the forecasted wind power obtained using the bootstrapped neural-network methodology. The best deterministic forecast ("Best forecast") is shown, based on a particular statistical combination of the individual forecasts obtained from the various members of the super-ensemble using the two-stage weighted-averaging process. In addition, the 95% confidence intervals for the predicted wind power are superimposed on the plot. Table 2 summarizes the values of the RMSE and MAE for power forecasts obtained using the persistence methodology and the bootstrapped neural-network methodology. A comparison of the best deterministic forecast of the power (dash-dotted line) with the measured power shows that the broad features in the variation of the power are captured fairly well. From Table 2, it is seen that this best deterministic forecast gives misfits (either RMSE or MAE) that are roughly a factor of three less than those obtained using the simple naïve persistence forecast. The observation coverage for the 95% prediction uncertainty intervals captures the power measurements fairly well, although qualitatively it is judged that this coverage is not as good as for the case of wind speed prediction. This would indicate that there may be sources of uncertainty in the wind power forecasting that have not been accounted for in the bootstrapping process.

Summary and Conclusions
In this paper, we proposed a novel bootstrapped artificial neural-network approach for wind speed and generated power forecasting. The approach provides a multi-ANN model super-ensemble that can be used to provide a best deterministic forecast for these quantities, as well as a quantitative assessment of the related prediction uncertainty. In this approach, the individual ANNs that comprise the super-ensemble are first trained using a data set of available wind speed and power measurements and wind speed predictions (obtained from a numerical weather prediction model). The training consists of fitting various artificial neural network architectures (model structures) against the observations and exogenous input to determine the optimal statistical weights for each model. The advantage of this methodology is that the biases in the forecast can be reduced and good predictions (forecasts) can be obtained through a statistical combination of the individual forecasts from the super-ensemble to give the best deterministic forecast. Applications to a wind turbine in northern China show that our proposed method works quite well. Because the method also provides prediction uncertainty bounds in the forecasts, it is anticipated that this approach would be very useful for green electrical system power management. Indeed, with the rapid pace of increases in computational power, it will become easier in the near future for power system managers and energy system traders/analysts to take advantage of the super-ensemble approach for providing optimal forecasts and quantitative assessments of the associated uncertainty, allowing this information to be used in a more accurate and reliable manner for various applications.

For each bootstrap sample (fixed $n$), the weights $\mathbf{w}^{(n)}$ can be chosen to define the statistical combination of the forecasts from the $L$ members of the ensemble that provides the best forecast. To this purpose, the multi-step-ahead predictions of $y$ for the $l$-th member of the ensemble are denoted by $\tilde{\mathbf{y}}_l = (\tilde{y}_1^l, \ldots, \tilde{y}_H^l)'$ [where $H$ is the prediction horizon]. These best predictions (in-sample and multi-step-ahead) are given by the weighted averages in Equations (9) and (10).

Figure 1 .
Figure 1. Out-of-sample wind speed forecasting obtained using the bootstrapped neural-network methodology. The 95% confidence interval for the forecast was calculated by using the two-stage weighted-averaging method applied to the ANN members of the super-ensemble.

Figure 2 .
Figure 2. Out-of-sample wind power forecasting using the bootstrapped neural-network methodology. The 95% confidence intervals for the forecast were calculated by using the two-stage weighted-averaging method applied to the ANN members of the super-ensemble.

Table 1 .
Wind speed forecast assessment of the bootstrapped neural-network methodology.

Table 2 .
Wind power forecast assessment of the bootstrapped neural-network methodology.