Forecasting the Demand of Short-Term Electric Power Load with Large-Scale LP-SVR

This research studies short-term electricity load prediction with a large-scale linear programming support vector regression (LP-SVR) model. The LP-SVR is compared with three other non-linear regression models: Collobert's SVR, Feed-Forward Neural Networks (FFNN), and Bagged Regression Trees (BRT). The four models are trained to predict hourly day-ahead loads given temperature predictions, holiday information, and historical loads. The models are trained on hourly data from the New England Power Pool (NEPOOL) region from 2004 to 2007 and tested on out-of-sample data from 2008. Experimental results indicate that the proposed LP-SVR method gives the smallest error when compared against the other approaches. The LP-SVR shows a mean absolute percent error of 1.58%, while the FFNN approach has 1.61%. Similarly, the FFNN method shows a mean absolute error of 330 MWh (megawatt-hours), whereas the LP-SVR approach gives 238 MWh. This is a significant difference in terms of the extra power that would need to be produced if FFNN were used. The proposed LP-SVR model can be used to predict power loads with very low error; it is comparable to FFNN and outperforms other state-of-the-art methods such as Bagged Regression Trees and Large-Scale SVRs.


Introduction
Accurate load predictions are critical for short-term operations and long-term utilities planning. The load prediction impacts a number of decisions (e.g., which generators to commit for a given period of time) and broadly affects wholesale electricity market prices [1]. Load prediction algorithms also feature prominently in reduced-form hybrid models for electricity price, which are some of the most accurate models for simulating markets and modeling energy derivatives [2].
Traditionally, utilities and marketers have used commercial software packages for performing load predictions. The main disadvantage of these packages is that they offer no transparency into how the predicted load is calculated. They also ignore important information, e.g., regional loads and weather patterns. Therefore, they do not produce an accurate prediction.
The general problem of electricity load forecasting has been approached with a combination of support vector machines and simulated annealing, with satisfactory results when compared against neural network approaches for regression [3]; however, the problem was not addressed for the particular case of short-term electricity load forecasting. The work by Mohandes [4] represented a significant advance in this field by showing that support vector machines outperform typical statistical approaches and typical neural network algorithms. In particular, the author demonstrated that the performance of support vector-based classifiers improves as the amount of training data increases; in spite of this finding, no large-scale approaches were tested. More recently, Jain et al. [5] clustered the training data with respect to its average pattern and then used support vector machines to forecast power load. The paper reports sampling two years of data to train the support vector machines with outstanding results; nevertheless, no large-scale methodologies were used.
To the best of our knowledge, no efforts have been reported to address the problem of short-term electricity load forecasting using a large-scale approach to support vector machines. Our motivation for a large-scale approach is that it permits the support vector machine to use a much larger set from which to define the support vectors, which should provide superior regression performance. Furthermore, we take advantage of the computational efficiency of a linear programming formulation of the support vector regression problem. This research considers several variables to build a prediction model and compares results among a Linear Programming Support Vector Regression (LP-SVR) approach, a Feed-Forward Neural Network (FFNN), and Bagged Regression Trees (BRT). This paper shows that the proposed LP-SVR model provides better forecasts than the FFNN and BRT approaches.

Dataset
The dataset used for this electricity load prediction problem includes historical hourly temperatures and system loads from the New England Pool region. The original dataset was obtained from the New England ISO. At the time of writing this paper, the direct link to the zonal load data was the one shown in reference [6]. Table 1 shows the variables included for predicting the electricity load.
These variables are called features. The features considered are the dry-bulb and dew-point temperatures, the hour of the given day, the day of the week, and whether the day is a holiday or weekend. The features also include the average load of the previous 24 hours, the lagged load of the previous 24 hours, and the lagged load of the previous week.
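The feature list above can be made concrete with a short sketch. The function name `build_features`, its arguments, and the exact lag definitions (same hour one day earlier, same hour one week earlier) are assumptions for illustration; the paper does not give its feature-extraction code.

```python
from datetime import datetime, timedelta


def build_features(timestamps, loads, dry_bulb, dew_point, holidays=frozenset()):
    """Return (rows, targets) for every hour that has a full week of history."""
    rows, targets = [], []
    for t in range(168, len(loads)):          # 168 h = one week of history
        ts = timestamps[t]
        prev_day = loads[t - 24:t]
        is_working_day = ts.weekday() < 5 and ts.date() not in holidays
        rows.append([
            dry_bulb[t],                      # dry-bulb temperature
            dew_point[t],                     # dew-point temperature
            ts.hour,                          # hour of day (0-23 here)
            ts.weekday(),                     # day of week, Monday = 0
            1 if is_working_day else 0,       # 0 on holidays and weekends
            sum(prev_day) / 24.0,             # average load, previous 24 h
            loads[t - 24],                    # assumed lag: same hour, previous day
            loads[t - 168],                   # assumed lag: same hour, previous week
        ])
        targets.append(loads[t])
    return rows, targets


# Synthetic usage: 200 hours of made-up data starting on 2004-01-01.
start = datetime(2004, 1, 1)
stamps = [start + timedelta(hours=h) for h in range(200)]
demo_rows, demo_targets = build_features(
    stamps, [float(h) for h in range(200)], [10.0] * 200, [5.0] * 200
)
print(len(demo_rows), demo_rows[0])
```

Each row holds eight values, one per variable listed above; the first 168 hours are skipped only because the weekly lag is undefined for them.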
The training set covers the years 2004 to 2007. Consequently, the training set consists of 1461 days, or four years of data. The testing set consists of one leap year of data (2008), or 366 days.
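The day counts can be checked with a few lines of date arithmetic, assuming (as stated in the abstract) that training spans 2004 to 2007 and testing covers 2008. The hourly sample counts printed here are derived from those day counts and are not quoted from the paper.

```python
from datetime import date

# Training period: 1 January 2004 up to (not including) 1 January 2008.
train_days = (date(2008, 1, 1) - date(2004, 1, 1)).days
# Testing period: the leap year 2008.
test_days = (date(2009, 1, 1) - date(2008, 1, 1)).days

print(train_days, test_days)            # days in each set
print(train_days * 24, test_days * 24)  # hourly samples in each set
```

The training period contains one leap year (2004), which is why four years give 1461 rather than 1460 days.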

Training the Regression Models
The regression models are constructed using the training set. The training procedure partitions the training set into a new training set and a validation set, which is used to auto-adjust the model parameters. The regression models used in this study are briefly introduced in the following sections.

Feed-Forward Neural Network
The first regression model used was based on neural networks. In particular, this study uses the Feed-Forward Neural Network (FFNN) architecture because it can approximate any square-integrable function to any desired degree of accuracy, provided a suitable training set [7,8]. A simple FFNN contains an input layer and an output layer, separated by $l$ layers of neuron units (the set of $l$ layers is known as the hidden layer). Given an input sample clamped to the input layer, the neuron units of the network compute their values according to the activity of the previous layers. This research considers the particular neural topology in which the input layer is fully connected to the first hidden layer, which is fully connected to the next layer, and so on until the output layer.
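Given an input feature vector $\mathbf{x}$, the value of the $j$-th unit in the $i$-th layer is denoted $h^{i}_{j}(\mathbf{x})$, with $i = 0$ referring to the input layer and $i = l + 1$ referring to the output layer. We refer to the size of a layer as $|h^{i}(\mathbf{x})|$. The default activation level is determined by the internal bias $b^{i}_{j}$ of that unit. The set of weights $W^{i}_{jk}$ between unit $h^{i-1}_{k}(\mathbf{x})$ in layer $i - 1$ and unit $h^{i}_{j}(\mathbf{x})$ in layer $i$ determines the activation of unit $h^{i}_{j}(\mathbf{x})$ as follows:

$$h^{i}_{j}(\mathbf{x}) = \phi\left(a^{i}_{j}(\mathbf{x})\right), \qquad a^{i}_{j}(\mathbf{x}) = b^{i}_{j} + \sum_{k} W^{i}_{jk}\, h^{i-1}_{k}(\mathbf{x}), \qquad i \in \{1, \dots, l\}, \qquad (1)$$

where $\phi(\cdot)$ denotes the hidden-layer activation function. Given the last hidden layer, the output layer is computed similarly by

$$h^{l+1}(\mathbf{x}) = \operatorname{lin}\left(a^{l+1}(\mathbf{x})\right), \qquad a^{l+1}(\mathbf{x}) = b^{l+1} + W^{l+1} h^{l}(\mathbf{x}), \qquad (2)$$

where $h^{0}(\mathbf{x}) = \mathbf{x}$ and the activation function $\operatorname{lin}(\cdot)$ is of the linear kind, which is required in regression problems (see textbooks [9] and [10] for a detailed development). Thus, when an input sample $\mathbf{x}$ is presented to the network, the application of (1) at each layer generates a pattern of activity in the different layers of the neural network and produces an output with (2). This output is the regression output of the neural network.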
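The layer-by-layer computation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the sigmoid used for the hidden layers is an assumption (the hidden activation is not recoverable from the text), and the names `forward`, `weights`, and `biases` are hypothetical.

```python
import math


def sigmoid(a):
    """Assumed hidden-layer activation; the paper's choice may differ."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, weights, biases):
    """Propagate x through fully connected layers.

    weights[i][j][k] connects unit k of one layer to unit j of the next;
    biases[i][j] is the bias of unit j. The last layer is linear.
    """
    h = list(x)
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        # Pre-activation: bias plus weighted sum of the previous layer.
        a = [b_j + sum(w * h_k for w, h_k in zip(row, h)) for row, b_j in zip(W, b)]
        # Hidden layers are squashed; the output layer stays linear.
        h = a if i == last else [sigmoid(a_j) for a_j in a]
    return h[0]


# Two inputs, two hidden units with zero weights, one linear output unit.
hidden_W, hidden_b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
out_W, out_b = [[2.0, 4.0]], [1.0]
print(forward([1.0, 2.0], [hidden_W, out_W], [hidden_b, out_b]))
```

With zero hidden weights each hidden unit outputs 0.5, so the linear output is the bias plus half the sum of the output weights.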

Bagged Regression Trees
Bagging stands for "bootstrap aggregation", which is a type of ensemble learning [11]. In general, the algorithm works as follows. To bag a regression tree on a training set, the algorithm generates several bootstrap clones of the training set and grows regression trees on these clones. The clones are obtained by randomly selecting $N_{tr}$ samples out of $N_{tr}$ with replacement. Then, the predicted response of a trained ensemble corresponds to the average of the predictions of the individual trees [12].
The process of drawing $N_{tr}$ out of $N_{tr}$ samples with replacement omits, on average, about 37% of the samples for each regression tree. These are called "out-of-bag" observations. The out-of-bag observations are used as a validation set to estimate the predictive power. The average out-of-bag error is computed by comparing the out-of-bag predicted responses against the true responses and averaging over all samples used for training. This average out-of-bag error is an unbiased estimator of the true ensemble error and can be used to auto-adapt the learning process [11].
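The 37% figure follows from the probability that a given sample is never drawn in $N_{tr}$ draws with replacement, $(1 - 1/N_{tr})^{N_{tr}}$, which approaches $e^{-1} \approx 0.368$. The sketch below checks this empirically and shows the averaging step; the function names are hypothetical and no actual regression trees are grown.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n sample indices out of n, with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag_fraction(n, n_trees, seed=0):
    """Average fraction of samples left out of each bootstrap clone."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        in_bag = set(bootstrap_indices(n, rng))
        total += (n - len(in_bag)) / n
    return total / n_trees


def bagged_predict(tree_predictions):
    """Ensemble response: the average of the individual tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)


print(out_of_bag_fraction(2000, 20))
print(bagged_predict([1.0, 2.0, 3.0]))
```

The samples counted as left out for a given clone are exactly the out-of-bag observations that can serve as that tree's validation set.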

Large-Scale Support Vector Regression
Included in this study is the large-scale support vector regression (LS-SVR) training strategy by Collobert et al. [13], considered the most popular LS-SVR training strategy. The algorithm of Collobert et al. is an adaptation of Joachims' SVM method to SVR problems [13]. The authors reformulate the typical dual SVR problem as a Quadratic Programming (QP) minimization problem in which $K$ is a kernel matrix; the $\alpha$'s are the Lagrange multipliers associated with the solution of the problem; $d$ is the vector of desired outputs, i.e., the targets; $\epsilon$ defines the parameter of the loss function, which represents the amount of deviation from the exact solution that is permitted; and $C$ is the regularization parameter.
Next, the authors perform the same decomposition proposed by Osuna et al. [14], defining a working set and a fixed set, where the size of the working set is smaller than the number of training samples $N_{tr}$. The authors extend Joachims' idea of the steepest descent direction to select the working set at each iteration of the SVR dual problem [15]. As is known [13], this method also uses a chunking approach and a shrinking strategy.
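For reference, the standard $\epsilon$-insensitive SVR dual can be written with the symbols defined above. This is the textbook form, given here on the assumption that the QP problem referred to in this section is equivalent to it up to notation; $\alpha^{*}$ denotes the second set of Lagrange multipliers and $\mathbf{1}$ a vector of ones, both introduced only for this illustration.

```latex
\begin{aligned}
\min_{\alpha,\,\alpha^{*}} \quad
  & \tfrac{1}{2}\,(\alpha - \alpha^{*})^{\top} K\, (\alpha - \alpha^{*})
    \;-\; d^{\top}(\alpha - \alpha^{*})
    \;+\; \epsilon\, \mathbf{1}^{\top}(\alpha + \alpha^{*}) \\
\text{s.t.} \quad
  & \mathbf{1}^{\top}(\alpha - \alpha^{*}) = 0, \\
  & 0 \le \alpha_{i},\ \alpha_{i}^{*} \le C, \qquad i = 1, \dots, N_{tr}.
\end{aligned}
```

In a decomposition method, only the multipliers in the working set are optimized at each iteration while those in the fixed set are held constant, so each sub-problem has the same form but far fewer variables.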

Linear Programming Support Vector Regression
Finally, this study also includes the linear programming (LP) SVR formulation of Rivas et al. [16,17]. The authors start from an SVR optimization problem, Problem (4), in which the slack variables $\xi$ account for the deviations from the exact solution to the optimization problem, i.e., they are relaxation parameters. Since a linear programming problem has a fixed canonical form, Problem (4) was posed as a linear programming problem by defining the corresponding equalities, with $z$ as the vector of variables that contains the unknowns.
Finally, in order to find the solution to the problem, Rivas et al. [16] use a primal-dual interior-point solver to find the variables that satisfy the KKT conditions. The three LP-SVR parameters are set to 0.125, 0.5 (the regularization parameter $C$), and 0.1; these values were found empirically for both this LP-SVR and the SVR by Collobert explained in Section 3.3.
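To illustrate how an SVR problem becomes a linear program, the block below gives a generic $\ell_1$-norm LP-SVR of the kind found in the literature. It is a sketch under stated assumptions and not necessarily Problem (4) of Rivas et al.: the split of the coefficients into non-negative parts $\alpha^{+}$ and $\alpha^{-}$, the bias $b$, and the kernel entries $K_{ij}$ are notation chosen for this example.

```latex
\begin{aligned}
\min_{\alpha^{+},\,\alpha^{-},\,\xi,\,b} \quad
  & \sum_{j=1}^{N_{tr}} \left(\alpha_{j}^{+} + \alpha_{j}^{-}\right)
    \;+\; C \sum_{i=1}^{N_{tr}} \xi_{i} \\
\text{s.t.} \quad
  & d_{i} - \sum_{j=1}^{N_{tr}} \left(\alpha_{j}^{+} - \alpha_{j}^{-}\right) K_{ij} - b
    \;\le\; \epsilon + \xi_{i}, \\
  & \sum_{j=1}^{N_{tr}} \left(\alpha_{j}^{+} - \alpha_{j}^{-}\right) K_{ij} + b - d_{i}
    \;\le\; \epsilon + \xi_{i}, \\
  & \alpha_{j}^{+},\ \alpha_{j}^{-},\ \xi_{i} \;\ge\; 0 .
\end{aligned}
```

Both the objective and the constraints are linear in the unknowns, so stacking $\alpha^{+}$, $\alpha^{-}$, $\xi$, and $b$ into a single vector yields a linear program; adding one non-negative slack variable per inequality turns the constraints into equalities, which is the step an interior-point solver typically expects.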

Experiment Design and Procedure
The experiments consisted of training the four methods with the training dataset explained in Section 2. Then the following six error metrics were analyzed using the testing set: Mean Absolute Percent Error (MAPE), Mean Absolute Error (MAE), Daily Peak MAPE (DPM), Normalized Error (NE), Root Mean Squared Error (RMSE), and Normalized Root Mean Squared Error (NRMSE).
The mean absolute percent error can be computed with the following equation:

$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{d_{i} - y_{i}}{d_{i}} \right|,$$

where $y_{i}$ is the $i$-th observed regression model output corresponding to the $i$-th input vector $\mathbf{x}_{i}$, $d_{i}$ is the corresponding desired output, and $N$ is the number of samples in the testing set. The Mean Absolute Error is estimated with:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| d_{i} - y_{i} \right|.$$

The Daily Peak MAPE consists of analyzing the MAPE on a daily basis. That is, within the testing set, choose the segments corresponding to a complete day, observe the predicted daily output for each segment, and then estimate the peak MAPE of that day. Formally, the testing set is divided into the days it contains, each day is associated with the set of indices of the samples that belong to it, and the Daily Peak MAPE is obtained from these per-day index sets.
The Normalized Error is also computed over the testing set, while the Root Mean Squared Error is computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( d_{i} - y_{i} \right)^{2}},$$

and the Normalized Root Mean Squared Error normalizes this quantity using $\sigma$, the standard deviation of $y$.
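A minimal sketch of the main metrics follows, using $d$ for the true loads and $y$ for the predicted loads. MAPE, MAE, and RMSE use their standard definitions. The function `daily_peak_mape` is only one plausible reading of the Daily Peak MAPE (comparing the true and predicted peak of each complete day) and is an assumption; NE and NRMSE are omitted because their exact form is not recoverable from the text.

```python
import math


def mape(d, y):
    """Mean absolute percent error, in percent."""
    return 100.0 * sum(abs(di - yi) / abs(di) for di, yi in zip(d, y)) / len(d)


def mae(d, y):
    """Mean absolute error, in the units of the load."""
    return sum(abs(di - yi) for di, yi in zip(d, y)) / len(d)


def rmse(d, y):
    """Root mean squared error."""
    return math.sqrt(sum((di - yi) ** 2 for di, yi in zip(d, y)) / len(d))


def daily_peak_mape(d, y, hours_per_day=24):
    """Assumed definition: percent error between true and predicted daily peaks."""
    errors = []
    for start in range(0, len(d) - hours_per_day + 1, hours_per_day):
        peak_true = max(d[start:start + hours_per_day])
        peak_pred = max(y[start:start + hours_per_day])
        errors.append(abs(peak_true - peak_pred) / abs(peak_true))
    return 100.0 * sum(errors) / len(errors)


true_load = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(true_load, predicted), mae(true_load, predicted), rmse(true_load, predicted))
```

MAPE is scale-free, which makes it convenient for comparing methods, while MAE and RMSE stay in load units and therefore relate directly to the extra power that a forecast error implies.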

Quantitative Results
Table 2 shows quantitative prediction errors using the metrics explained above: MAPE, MAE, DPM, NE, RMSE, and NRMSE. According to the results in Table 2, the proposed LP-SVR model performs with lower error than BRT, FFNN, and LS SVM. This result is consistent for all the metrics. However, only very small differences can be observed between the performance of FFNN and LP-SVR. This can be confirmed by observation in Figures 2 and 3. Figure 2 (top) shows a two-day window of true load compared with the predicted load for the four different methods, and (bottom) shows the error residuals for the four methods.
As expected, the results of FFNN and LP-SVR exhibit very little difference. In general, most methods predict the true load with relatively low error. Figure 3 shows a particular two-day window around Christmas Eve. As with many holidays, Christmas Eve is very difficult to predict due to the high variability in electricity consumption. The figure shows a considerably high prediction error between 14:00 and 21:00 on 12/24/2008.
Figure 4 shows the error distribution for the different methods. It can be concluded that FFNN and LP-SVR have smaller error variances. Similarly, Figure 5 illustrates the absolute error distribution, including the mean absolute error for each of the four methods. It can be seen that FFNN and LP-SVR have almost the same MAE.
An interesting analysis is the visualization of the average error by hour of day, shown in Figure 6. It can be seen that the early morning hours (00:00-05:00) are the easiest to predict, i.e., they can be predicted with very small error. In contrast, the late morning through afternoon hours (06:00-22:00) are predicted with larger errors.
Figure 7 illustrates the average error by day of the week. Clearly, the days that produce higher errors are Mondays through Fridays, which represent the work week. It is important to notice the difference in error scale between Figures 6 and 7. In Figure 6 the largest error is below $1.8 \times 10^{4}$, while in Figure 7 the largest error is below $1.6 \times 10^{4}$. This implies that errors are expected to be greater if the prediction is based on hourly data. From this one can conclude that the prediction error depends less on the day of the week and more on the hour of the day.
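The per-hour and per-weekday averages discussed above amount to grouping the absolute errors by a key derived from each timestamp. The helper below is a hypothetical sketch of that grouping, not the code used to produce Figures 6 and 7, and it runs on synthetic data.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mean_abs_error_by(timestamps, d, y, key):
    """Average |d - y| within each group defined by key(timestamp)."""
    groups = defaultdict(list)
    for ts, di, yi in zip(timestamps, d, y):
        groups[key(ts)].append(abs(di - yi))
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Synthetic example: two days of hourly data whose error grows during the day.
stamps = [datetime(2008, 1, 1) + timedelta(hours=h) for h in range(48)]
true_load = [100.0] * 48
predicted = [100.0 + (h % 24) for h in range(48)]

by_hour = mean_abs_error_by(stamps, true_load, predicted, key=lambda t: t.hour)
by_weekday = mean_abs_error_by(stamps, true_load, predicted, key=lambda t: t.weekday())
print(by_hour[5], by_weekday)
```

Passing `key=lambda t: t.month` gives a monthly breakdown in the same way.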
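The final analysis is in regard to the statistical properties of the errors of the proposed LP-SVR model. Figures 8 through 10 show statistical plots known as "box plots." These plots provide the following information: on each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers (+) are plotted individually. In terms of error measures, it is desired that the box plots have a very small box close to zero on the error axis, that the median be close to zero, that the extreme points be close to the box, and, of course, that there be no outliers.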
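The quantities a box plot displays can be computed directly from a list of errors. The sketch below assumes the common convention that points farther than 1.5 times the interquartile range from the box are outliers; the paper does not state which rule its plots use, and the percentile interpolation here may differ slightly from that of the plotting software.

```python
import statistics


def box_plot_summary(errors, whisker=1.5):
    """Median, quartiles, whisker ends, and outliers of a list of errors."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [e for e in errors if low <= e <= high]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),   # most extreme non-outliers
        "outliers": sorted(e for e in errors if e < low or e > high),
    }


sample_errors = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
print(box_plot_summary(sample_errors))
```

A desirable error distribution in the sense described above would show a small `q3 - q1`, a median near zero, whiskers close to the quartiles, and an empty outlier list.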
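An hourly breakdown of the LP-SVR mean absolute prediction error is shown in Figure 8. It can be noticed that the early morning hours have smaller variability. Then a daily breakdown of the LP-SVR mean absolute prediction error appears in Figure 9, from which one can see that Mondays and Fridays have the largest average errors and that Fridays have many outliers. Finally, a monthly breakdown is shown in Figure 10. This figure clearly shows that the months of November and December exhibit the largest average errors, have the largest variability, and show many outliers.
Experimental results indicate that the proposed LP-SVR method gives the smallest error when compared against the other approaches. The LP-SVR shows a mean absolute percent error of 1.58%, while the FFNN approach has 1.61%. Similarly, the FFNN method shows a mean absolute error of 330 MWh (megawatt-hours), whereas the LP-SVR approach gives 238 MWh. This is a significant difference in terms of the extra power that would need to be produced if FFNN were used.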

Conclusions
The proposed LP-SVR model can be used to predict power loads with very low error; it is comparable to FFNN and outperforms other state-of-the-art methods such as Bagged Regression Trees and Large-Scale SVRs.

Figure 1. Framework to build regression models for power load prediction. Blocks on the left indicate the input variables, i.e., attributes used to build the regression models.
The FFNN requires a training phase to build the model $(W, b)$. In this training phase, we used the Levenberg-Marquardt algorithm along with a back-propagation strategy to update the weights $W$ and biases $b$. As a learning function, we used the well-established method of gradient descent with momentum for the weights and biases. The FFNN training phase ends when any of the following conditions holds:
• A number of 100 epochs (i.e., training iterations) is reached;
• The mean absolute error (MAE) reaches $1 \times 10^{-6}$;
• The gradient step size is less than or equal to $1 \times 10^{-10}$.
A well-established technique for preventing over-fitting during training was also implemented. This technique consists of partitioning the training set into two sets, training (80%) and validation (20%), such that when the error on the internal validation set has not decreased in the past five iterations, the training phase stops and rolls back to the model $(W, b)$ associated with the previous minimum MAE. The network has 20 neurons in the hidden layer and uses the MAE metric as the error function to minimize during training.
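The stopping rules and the rollback to the best validation model can be sketched as a small training loop. This is an illustration under assumptions rather than the authors' code: `step` and `validate` are hypothetical callables standing in for one training epoch and for the validation-set error, and the gradient step-size criterion is omitted for brevity.

```python
def train_with_early_stopping(step, validate, max_epochs=100, patience=5, goal=1e-6):
    """Train until an epoch limit, an error goal, or validation stagnation.

    step() runs one epoch and returns (model, training_error);
    validate(model) returns the error on the held-out validation set.
    Returns the best model seen, its validation error, and the last epoch run.
    """
    best_model, best_err, since_best = None, float("inf"), 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        model, train_err = step()
        val_err = validate(model)
        if val_err < best_err:
            # New best on the validation set: remember it for rollback.
            best_model, best_err, since_best = model, val_err, 0
        else:
            since_best += 1
        if train_err <= goal or since_best >= patience:
            break
    return best_model, best_err, epoch


# Toy run: the "model" is just the epoch number and validation errors are scripted.
scripted = [5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 3.8, 3.9, 2.0, 1.0]
state = {"epoch": 0}


def toy_step():
    state["epoch"] += 1
    return state["epoch"], 1.0


print(train_with_early_stopping(toy_step, lambda m: scripted[m - 1]))
```

In the toy run the validation error stops improving after the third epoch, so training halts five epochs later and the third model is returned, even though later epochs would have improved again.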


Figure 2. Two-day window of true data compared with predicted data for the four different methods (top). Error residuals for the four methods (bottom).

Figure 3. Christmas two-day window of true data compared with predicted data for the four different methods (top). Error residuals for the four methods (bottom). Note the high prediction error between 14:00 and 21:00.


Figure 4. Error distribution for the LS SVM, BRT, FFNN, and LP-SVR regression methods. The methods with the smallest variances are FFNN and LP-SVR.

Figure 5. Absolute error distribution of the LS SVM, BRT, FFNN, and LP-SVR regression methods. The vertical lines indicate the mean absolute error for each of the four methods as reported in Table 2.

Figure 6. Average error by hour of day. Note the proportional difference in error between early morning hours and afternoon hours.

Figure 7. Average error by day of week. Note the proportional difference in error between working and non-working days.

Figure 8. Hourly breakdown of the LP-SVR mean absolute prediction error. Note that the early morning hours have smaller variability.

Figure 9. Daily breakdown of the LP-SVR mean absolute prediction error. Note that Mondays and Fridays have the largest average error.
This paper presents an application of the proposed LP-SVR model to electricity load prediction. Eight different variables are used to construct the regression models. The study includes a comparison of the LP-SVR model against other state-of-the-art methods, such as FFNN, BRT, and LS SVM.

Figure 10. Monthly breakdown of the LP-SVR mean absolute prediction error. Note that the month of December exhibits the largest average error and the largest variability.