
False data injection attacks (FIDAs) against state estimation in power systems cannot be effectively detected by traditional methods. In this paper, we apply four outlier detection methods from machine learning, namely One-Class SVM, Robust Covariance, Isolation Forest and Local Outlier Factor, on an IEEE 14-bus simulation platform and compare their performance. Accuracy and precision are estimated through simulation to assess the classification results.

As critical national infrastructure, the power system has a vital impact on the national economy and public safety. With the in-depth application of information and communication technologies, modern power systems are gradually developing into cyber-physical systems (CPS) that integrate physical power networks with information networks. Smart grids require high-quality interaction between the information system and the physical system. However, because of inevitable defects and loopholes in the information and communication systems of the power system, data collection, information transmission, and even data control centers are at risk of being attacked, which can result in security incidents in the power network [

State estimation in a power system estimates the current system state and provides data support for the EMS (Energy Management System) to perform optimal load distribution and economic dispatch. State estimation methods include WLS (Weighted Least Squares), PQ and so on [

While different types of false data injection attacks besides the traditional ones have been found, such as attacks aimed at load distribution and economic dispatch [

The paper [

Mohammad Esmalifalak et al. [

Youbiao He [

While SVMs, artificial neural networks and other machine learning methods are increasingly applied to cyber-attacks in power systems, outlier detection from machine learning is still a fresh approach that has so far been tested only in the industrial anomaly field. Therefore, in this paper, we apply for the first time four outlier detection methods, namely One-Class SVM, Robust Covariance, Isolation Forest and Local Outlier Factor, to simulated false data injection attacks (FIDAs) in the IEEE 14-bus power system, and use Principal Component Analysis (PCA) to prepare the data set. We analyze the performance of the outlier detection methods under different contamination rates by comparing accuracy and precision. Visualization is used to present the identification of bad data in the data set during the simulation.

The power system states are those parameters that can be used to determine all other parameters of the power system, including node voltage phasors, complex power flows and so on. Given the measurements, the states can be obtained through a state estimator.

$$z = h(x) + v \tag{1}$$

$$z = [P_{ij}, Q_{ij}, P_i, Q_i, V_i]^T \tag{2}$$

$$X = [\theta_{ij}, V_{ij}] \tag{3}$$

X denotes the states, which cannot be observed directly and which determine the state of the power system. z is the vector of measurements, including active power, reactive power and voltage values. The measurement error v is assumed to follow a Gaussian distribution. Different weights are assigned in order to emphasize trusted measurements and de-emphasize untrusted ones [

A false data injection attack injects a non-zero vector $e = (e_1, e_2, \ldots, e_m)^T$ into the measurements, so that the measurement delivered to the state estimator becomes

$$z_{bad} = z + e \tag{4}$$

After state estimation, the estimated states become

$$x_{bad} = x + c \tag{5}$$

The residual will be

$$r = z_{bad} - \hat{z}_{bad} = h(x) + v + e - h(x) - H\tilde{x} - Hc = (v - H\tilde{x}) + (e - Hc) \tag{6}$$

When $e = Hc$, the injected term cancels and the residual is the same as it would be without the attack, so the traditional residual-based detection method cannot detect the false data [
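The cancellation in Eq. (6) can be checked numerically. The following is a minimal Python 3 sketch under simplifying assumptions that are not from the paper: a linear measurement model z = Hx + v, equal measurement weights, and made-up values for H, x, v and c. It compares the residual norm for clean data, for an attack vector of the form Hc, and for an arbitrary injected vector.

```python
# Toy linear model z = H x + v with 3 measurements and 2 states.
# H, x_true, v and c are illustrative numbers, not IEEE 14-bus data.

def matvec(M, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in M]

def least_squares(H, z):
    """Equal-weight least squares for a 2-state model via the normal equations."""
    g11 = sum(r[0] * r[0] for r in H)
    g12 = sum(r[0] * r[1] for r in H)
    g22 = sum(r[1] * r[1] for r in H)
    b1 = sum(r[0] * zi for r, zi in zip(H, z))
    b2 = sum(r[1] * zi for r, zi in zip(H, z))
    det = g11 * g22 - g12 * g12
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]

def residual_norm(H, z):
    """Euclidean norm of z minus the measurements predicted by the estimate."""
    z_hat = matvec(H, least_squares(H, z))
    return sum((zi - zh) ** 2 for zi, zh in zip(z, z_hat)) ** 0.5

H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
x_true = [1.0, 0.5]
v = [0.01, -0.02, 0.015]                      # small measurement noise
z = [zi + vi for zi, vi in zip(matvec(H, x_true), v)]

c = [0.3, -0.2]                               # state shift chosen by the attacker
attack_stealthy = matvec(H, c)                # attack vector of the form H c
attack_naive = [0.5, 0.0, 0.0]                # arbitrary vector, not of that form

z_stealthy = [zi + ai for zi, ai in zip(z, attack_stealthy)]
z_naive = [zi + ai for zi, ai in zip(z, attack_naive)]

r_clean = residual_norm(H, z)
r_stealthy = residual_norm(H, z_stealthy)     # matches r_clean
r_naive = residual_norm(H, z_naive)           # clearly larger
x_shift = [a - b for a, b in
           zip(least_squares(H, z_stealthy), least_squares(H, z))]
```

In this toy setting the structured attack shifts the estimate by exactly c while leaving the residual untouched, whereas the arbitrary vector inflates the residual and would be flagged by a residual test.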

Novelty detection and outlier detection are effective machine learning methods for anomaly detection. Outlier detection is the type of anomaly detection in which the training data itself contains outliers, so the central mode of the data has to be fitted. This strategy is implemented with objects that learn from the data in an unsupervised way [

One-Class SVM is a common classifier. It can capture the shape of the data set and therefore performs well for high-dimensional non-Gaussian data, especially for data sets made up of two entirely different types. To separate the data set from the origin, a quadratic program is solved so that the decision function is positive for most examples in the training set. Support vector domain description is carried out in the process to decide the boundary, which in two-dimensional space is a closed curve surrounding the positive samples. The errors can differ under different data distributions. The ratio of negative samples among all samples can be changed by adjusting the parameter ν. The principle of this classical algorithm is described in [

Local Outlier Factor (LOF) is a density-based anomaly detection method. It measures outlierness as the degree to which an object is isolated from its surrounding neighborhood. The local outlier factor of a point x is defined as [

$$LOF_k(x) = \frac{\sum_{o \in N_k(x)} lrd_k(o)}{|N_k(x)|} \Big/ lrd_k(x) \tag{7}$$

where $lrd_k$ is the local reachability density, $N_k(x)$ is the k-distance neighborhood of x, and x is a point in the space. If the factor is close to 1, the density of x is similar to that of its neighbors, and x may belong to a cluster. The larger the value is compared with 1, the lower the density of x relative to its neighbors and the more likely the point is an anomaly.
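To make Eq. (7) concrete, here is a brute-force Python 3.8+ sketch of the local outlier factor. It is a simplified illustration, not the implementation used in the experiments: ties in the k-distance are ignored (each neighborhood holds exactly k points), duplicate points are not handled, and the six sample points are made up.

```python
import math

def lof_scores(points, k):
    """Return the local outlier factor of every point (brute force, O(n^2))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(x, o) = max(k_distance(o), d(x, o)).
    def lrd(i):
        reach = [max(k_distance[o], dist[i][o]) for o in neighbours[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]

    # Eq. (7): mean lrd of the neighbours divided by the point's own lrd.
    return [
        sum(lrds[o] for o in neighbours[i]) / len(neighbours[i]) / lrds[i]
        for i in range(n)
    ]

# Five clustered points and one isolated point (index 5).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0)]
scores = lof_scores(points, k=3)
most_anomalous = scores.index(max(scores))
```

With these points the clustered samples receive factors near 1, while the isolated point at (5, 5) receives a factor well above 1, because its neighbors are far denser than it is.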

Isolation Forest is an anomaly detection method first proposed by Fei Tony Liu et al. in 2008 [

When applied to anomaly detection, the process has two stages: 1) Training stage: build the iTrees by recursive partitioning until all the samples are isolated or the tree reaches its limited (average) height. 2) Evaluation stage: obtain the anomaly score of every test sample from its expected path length over the trees. The anomaly score of an instance x is calculated as:

$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}} \tag{8}$$

where $h(x)$ is the path length of x in a tree, $E(h(x))$ is the average of $h(x)$ over a collection of isolation trees, and $c(n)$ is the average path length for n samples, which normalizes $h(x)$. If the anomaly score is very close to 1, the instance is judged to be an anomaly; if the score is much smaller than 0.5, the instance is considered normal. The subsample size and the number of trees are set to 256 and 100 respectively (empirical values).
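The behaviour of Eq. (8) can be illustrated with a short Python sketch. The closed form used for c(n) below, 2H(n-1) - 2(n-1)/n with the harmonic number approximated by ln(i) plus Euler's constant, is the one given in the original isolation forest paper; it is stated here as an assumption because this text does not spell it out. The path lengths passed in are made-up values.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length used for normalization (assumed closed form)."""
    if n <= 1:
        raise ValueError("c(n) needs at least two samples")
    harmonic = math.log(n - 1) + EULER_GAMMA   # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """Eq. (8): s(x, n) = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-mean_path_length / c(n))

n = 256                                    # subsample size used in the paper
score_short = anomaly_score(2.0, n)        # isolated after very few splits
score_avg = anomaly_score(c(n), n)         # path length equal to c(n)
score_long = anomaly_score(2 * c(n), n)    # much longer than average
```

A sample isolated after only a couple of splits scores close to 1, a sample whose path length equals c(n) scores exactly 0.5, and longer paths push the score further below 0.5, which matches the decision rule described above.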

Robust covariance estimation uses the elliptic envelope method, which fits a robust covariance estimate to the data and thus fits an ellipse to the central data points, ignoring points outside the central mode. It is based on the assumption that the inlier data are Gaussian distributed. The Mahalanobis distances are estimated from the inlier location and covariance and are used as a measure of how close a point is to the group. The Mahalanobis distance is calculated as [

$$d_{(\mu, \Sigma)}(x_i)^2 = (x_i - \mu)' \Sigma^{-1} (x_i - \mu) \tag{9}$$

where $\mu$ and $\Sigma$ are the location and the covariance of the underlying Gaussian distribution.
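As a small worked instance of Eq. (9), the Python sketch below evaluates the squared Mahalanobis distance for two-dimensional points with an explicit 2x2 matrix inverse. The location and covariance values are illustrative and are taken as given; the robust estimation of these quantities is not reproduced here.

```python
def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance of a 2-D point x, following Eq. (9)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]   # inverse of the 2x2 covariance
    dx = [x[0] - mu[0], x[1] - mu[1]]
    tmp = [inv[0][0] * dx[0] + inv[0][1] * dx[1],
           inv[1][0] * dx[0] + inv[1][1] * dx[1]]
    return dx[0] * tmp[0] + dx[1] * tmp[1]

mu = [0.0, 0.0]
cov = [[4.0, 0.0], [0.0, 1.0]]    # variance 4 along the first axis, 1 along the second

d_wide = mahalanobis_sq([2.0, 0.0], mu, cov)     # 2^2 / 4 = 1
d_narrow = mahalanobis_sq([0.0, 2.0], mu, cov)   # 2^2 / 1 = 4
```

Both points are at Euclidean distance 2 from the center, yet the one lying along the low-variance axis is four times farther in the squared Mahalanobis sense, which is why the fitted envelope is an ellipse and not a circle.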

For all four detection methods, the fit function decides the boundary between normal and abnormal data from the training set, and the predict function returns the labels (1 for normal samples and −1 for abnormal ones). The decision function returns the signed distance of every sample point to the separating hyperplane.
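The fit / predict / decision function convention described above can be sketched with a deliberately simple stand-in. The class below is hypothetical and is not one of the four methods studied: it scores a point by its Euclidean distance from the coordinate-wise median of the training set and places the threshold so that roughly a `contamination` fraction of the training points falls outside. It only illustrates the label and sign conventions (Python 3.8+).

```python
import math

class CentroidDistanceDetector:
    """Hypothetical toy detector illustrating the 1 / -1 label convention."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X):
        n, dim = len(X), len(X[0])
        # Coordinate-wise (upper) median as a crude robust center.
        self.center_ = [sorted(row[j] for row in X)[n // 2] for j in range(dim)]
        distances = sorted(math.dist(row, self.center_) for row in X)
        # Keep about (1 - contamination) of the training points inside.
        n_inliers = n - int(round(self.contamination * n))
        self.threshold_ = distances[n_inliers - 1]
        return self

    def decision_function(self, X):
        """Signed score: non-negative inside the boundary, negative outside."""
        return [self.threshold_ - math.dist(row, self.center_) for row in X]

    def predict(self, X):
        """Return 1 for normal samples and -1 for abnormal ones."""
        return [1 if s >= 0 else -1 for s in self.decision_function(X)]

# Nine clustered samples plus one far-away sample (made-up data).
X_train = [[0.1 * i, -0.1 * i] for i in range(-4, 5)] + [[5.0, 5.0]]
detector = CentroidDistanceDetector(contamination=0.1).fit(X_train)
train_labels = detector.predict(X_train)
new_labels = detector.predict([[0.0, 0.0], [10.0, 10.0]])
outlier_score = detector.decision_function([[5.0, 5.0]])[0]
```

The contamination parameter plays the same role here as in the experiments: it fixes the share of training samples that the fitted boundary leaves outside.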

In the simulation stage, we evaluate the performances of the outlier detection mechanism using IEEE 14-bus test system as shown in

As shown in the detection process, two attacked data sets of 1000 estimated samples, containing 100 and 200 compromised samples respectively, are used as the test sets and analyzed using Python 2.7.13. Each sample has 41 features, which is the number of measurement points in the IEEE 14-bus system. To better prepare the data set, we use Principal Component Analysis (PCA) [

The red points are original data set while the green ones are points after the dimensional reduction process. It can be observed that for the pure normal data set in

As can be observed from

$$\text{Accuracy} = \frac{tp}{tp + fp}, \qquad \text{Precision} = \frac{tn}{tn + fn} \tag{10}$$

where t stands for true, f for false, p for positive and n for negative samples. We repeated the experiment 3 times for each of the contamination rates 0.1 and 0.2. The results of the 3 runs are the same for each rate, which indicates the stability of the algorithms' performance. The results of accuracy and recall rates of the four different methods for different outlier rate are listed in
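The two ratios of Eq. (10) can be computed directly from the predicted labels. The Python sketch below assumes that the positive class is the normal class (label 1) and the negative class is the abnormal class (label −1); this text does not state that mapping explicitly, and the label vectors are made up. Note that the ratios follow the definitions of Eq. (10), which differ from the more common definitions of accuracy and precision.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count tp, fp, tn, fn, treating `positive` as the positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def paper_metrics(y_true, y_pred):
    """Return the two ratios of Eq. (10); assumes both denominators are non-zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = tp / (tp + fp)    # first ratio of Eq. (10)
    precision = tn / (tn + fn)   # second ratio of Eq. (10)
    return accuracy, precision

# 8 normal and 2 attacked samples; one normal sample and one attacked
# sample are misclassified.
y_true = [1] * 8 + [-1] * 2
y_pred = [1] * 7 + [-1] + [-1, 1]
counts = confusion_counts(y_true, y_pred)
accuracy, precision = paper_metrics(y_true, y_pred)
```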

Detecting false data injection attacks using outlier detection |
---|
1) Input: training data from the state estimator $Z = [z^{(1)}, z^{(2)}, \ldots, z^{(1000)}]^T$ (the total number of samples is set to 1000) |
2) Preprocess the data set with Principal Component Analysis: dimensional feature reduction of Z from 41 to 2 |
3) Set the parameters of the outlier detectors: size of samples n = 200, contamination rate = 0.1 and 0.2 |
4) Fit the training data in the outlier detection estimators: Estimator.fit(Z_train) |
5) Sort out the outliers with the predict function of the algorithm: Estimator.predict(Z_test) |
6) Return: predicted labels, 1 if $z^{(p)}$ is normal and −1 if $z^{(p)}$ is abnormal, where $z^{(p)}$ is some data point in the measurement data set |

For the big-data problem of detecting false data injection attacks in power systems, machine learning offers an efficient and fast solution. After the dimension reduction, the detection process is fast and the results can be visualized. All four outlier detectors perform better when the contamination rate is smaller, in terms of both accuracy and precision, which is a benefit for detecting small-scale attacks that are unobservable to traditional detection. In the first case, robust covariance and isolation forest perform equally well, while robust covariance stands out in the case with the higher outlier rate. The robust covariance estimator, which assumes that the data are Gaussian distributed, performs far better than the One-Class SVM in our case, which also suggests that the errors in the data are Gaussian noise.

The One-Class SVM does not perform well in this experiment. A possible reason is the rarity of the abnormal samples in the data. Some normal data points are classified as abnormal, which causes the low precision and poor performance of One-Class SVM. LOF is a density-based anomaly detection algorithm that compares the local density of a point with that of its neighbors, which may cause a problem of local swamping, namely treating positive samples as negative, and this largely affects the precision of the detection. That is also why two areas of “normal data” are identified: the density of the abnormal data misleads the result. Robust covariance and isolation forest perform observably well in the experiment, and robust covariance achieves extremely high accuracy and precision. The machine learning methods show their effectiveness in detecting FIDAs. Isolation forest is expected to maintain good performance on high-complexity data, which is to be studied in the future.

As for future work, timely anomaly detection of false data injection attacks is to be studied and applied to larger power systems, since isolation forest is also an excellent algorithm for dealing with continuous numerical data. In addition, the accuracy and precision of detection on high-dimensional data should be improved in future work.

This work was supported by the National Natural Science Foundation of China under Grant 61772327, Shanghai Municipal Natural Science Foundation under Grant 16ZR1436300, Shanghai University of Electric Power, Department of Smart Grid Center under Grant A-0009-17-002-05, Shanghai Science and Technology Committee under Grant 15110500700, and Zhejiang University State Key Laboratory of Industrial Control Technology Open Fund (ICT1800380).

The authors declare no conflicts of interest regarding the publication of this paper.

Yang, C., Wang, Y., Zhou, Y.H., Ruan, J.M. and Liu, W. (2018) False Data Injection Attacks Detection in Power System Using Machine Learning Method. Journal of Computer and Communications, 6, 276-286. https://doi.org/10.4236/jcc.2018.611025