Power Quality Data Compression Based on Iterative PCA Algorithm in Smart Distribution Systems

To reduce the burden of transmitting and storing power quality (PQ) data in smart distribution systems and to support PQ analysis, a multichannel data compression method based on an iterative PCA (principal component analysis) algorithm is introduced. The proposed method uses PCA to remove redundancy from the data and thereby compress it. To improve the computation speed, an iterative method is proposed to compute the principal components of the covariance matrix. The correctness and feasibility of the proposed method are verified by tests on field PQ data. Compared with the discrete wavelet transform (DWT) method, the proposed method performs well in terms of compression ratio and reconstruction accuracy.


Introduction
With the growing demand for smart distribution systems, more and more power quality (PQ) monitors are needed, especially for power systems with distributed power sources and impulsive and sensitive loads [1] [2] [3]. In power systems, short-circuit faults, capacitor switching devices for power factor compensation, power electronic devices for special loads, etc. bring various PQ disturbances (PQD). With the deployment of a large number of PQ monitors, a large volume of data will be produced in smart distribution systems. It therefore becomes essential to compress this volume so that the data sets can be transmitted and stored promptly and efficiently. Hence, PQ data compression attracts more attention than ever before [4].
The main goal of any compression method is to achieve maximum data reduction while preserving morphological features upon reconstruction. Data compression methods are categorized into lossless and lossy techniques. Lossless methods obtain an exact reconstruction of the original signal, but cannot achieve a high compression ratio (CR). In contrast, lossy methods do not obtain an exact reconstruction, but can achieve a higher CR. Consequently, the commonly used PQ data compression methods are lossy in nature [5] [6].
In general, the data compression scheme for PQ data consists of three steps: signal transform, quantization and encoding. Compared with quantization and encoding, researchers are more concerned with the signal transform. Literature [7] first applied the wavelet transform (WT) to PQD data compression. A threshold was set to eliminate small wavelet coefficients  [13]. The PQ data is transformed by SVD into a singular value matrix that contains nonzero singular values so that the data can be compressed. The method also explores different tradeoffs between data CR and loss of information. However, the SVD algorithm is easily affected by outliers and noise in the data.
Most of the works found in the literature address PQ data compression in stand-alone power systems. However, data compression in smart distribution systems deserves more consideration for distributed control applications.
Moreover, there is a lack of research on data compression in smart distribution systems. This paper presents a method based on the iterative principal component analysis (IPCA) algorithm for PQ data compression in smart distribution systems, which can compress multichannel data simultaneously. Here the measured data can be conveniently stored in matrix format, which is suitable for applying the principal component analysis (PCA) algorithm. PCA is especially useful for complex data analysis, such as face recognition and data compression [14] [15] [16]. By employing PCA, good tradeoffs between data CR and loss of information can be achieved. Because the data matrix is generally large, the traditional PCA is very time-consuming. To improve the computation speed, an iterative method is proposed to compute the principal components of the covariance matrix.
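The remainder of this paper is organized as follows. The PCA algorithm is introduced in Part 2; Part 3 presents the proposed method; Part 4 provides field PQ data to test and compare the proposed method with related works; Part 5 summarizes the whole work.

PCA Algorithm
Suppose that a sample set X contains m samples, and the dimension of each sample is n:

$$X = \{x_1, x_2, \ldots, x_m\}, \quad x_i \in \mathbb{R}^{1 \times n}$$

Representing each sample as a row, the $m \times n$ sample matrix S is the stack of all such rows, $S = [x_1^T, x_2^T, \ldots, x_m^T]^T$. The mean of the samples is

$$\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i \qquad (1)$$

Then the samples are processed to zero mean, that is, the samples are centralized, to ensure that the average value of each dimension of the matrix is zero:

$$\tilde{x}_i = x_i - \bar{x}, \quad i = 1, 2, \ldots, m \qquad (2)$$

The sample matrix composed of the $\tilde{x}_i$ is denoted as $\tilde{S}$, where $\tilde{S} = [\tilde{x}_1^T, \tilde{x}_2^T, \ldots, \tilde{x}_m^T]^T$. The covariance matrix C of $\tilde{S}$ is then obtained as follows:

$$C = \frac{1}{m} \sum_{i=1}^{m} \tilde{x}_i^T \tilde{x}_i = \frac{1}{m} \tilde{S}^T \tilde{S} \qquad (3)$$

where C is a real symmetric matrix, and $\tilde{x}_i^T$ and $\tilde{S}^T$ are the transposes of $\tilde{x}_i$ and $\tilde{S}$, respectively. According to matrix theory, a real symmetric matrix can be diagonalized, therefore there is an orthogonal matrix P that satisfies $P^T C P = \Lambda$. The following process is applied to obtain the matrix P. First, the matrix C is decomposed to get the diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, and the orthogonal matrix P. Let $P_1$ denote the $n \times k$ matrix formed by the eigenvectors of the k largest eigenvalues, and let $\Lambda_1 = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$, so that $P_1^T C P_1 = \Lambda_1$. Obviously, with $S_1 = \tilde{S} P_1$,

$$\frac{1}{m} S_1^T S_1 = P_1^T C P_1 = \Lambda_1 \qquad (6)$$

Let

$$\hat{S} = S_1 P_1^T \qquad (7)$$

As known from (6), the covariance between different dimensions is zero in the matrix $S_1$. Each row of the matrix $\tilde{S}$ is a sample. Furthermore, each column of the matrix $P_1$ is an eigenvector, and the k eigenvectors of $P_1$ are orthogonal to each other. So $\tilde{S} P_1$ is equivalent to a linear transformation of each sample of $\tilde{S}$ into the basis formed by the column vectors of $P_1$. After the transform, the dimensions of $S_1$ are mutually uncorrelated, and the dimension of each sample is k, where $k \le n$. If k < n, then the operation of dimension reduction is completed, while the internal structure of the original data is preserved as far as possible. Finally, the matrix $\tilde{S}$ can be approximately recovered by (7).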
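As an illustration of the transform just described, the following minimal sketch centres a toy sample matrix, builds the covariance matrix with the 1/m scaling, and projects onto the k leading eigenvectors. It is not the authors' implementation: the function name `pca_basis`, the random toy data, and the use of NumPy's `np.linalg.eigh` as the eigen-solver are assumptions made only for this example.

```python
import numpy as np

def pca_basis(S, k):
    """Return the column mean, the n-by-k basis P1, and all eigenvalues (descending)."""
    mean = S.mean(axis=0)                  # mean of each dimension
    S_tilde = S - mean                     # centred samples, one per row
    C = S_tilde.T @ S_tilde / S.shape[0]   # n-by-n covariance with 1/m scaling
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mean, eigvecs[:, :k], eigvals

# Hypothetical toy data: 200 samples of dimension 6 with correlated columns.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))

mean, P1, eigvals = pca_basis(S, k=3)
S1 = (S - mean) @ P1              # m-by-k transformed samples
S_hat = S1 @ P1.T                 # approximate recovery of the centred samples
C1 = S1.T @ S1 / S.shape[0]       # covariance of the transformed samples
```

In this sketch `C1` comes out diagonal up to rounding, with the three largest eigenvalues on its diagonal, which is the decorrelation property the text attributes to the transformed matrix.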

PQ Data Compression via IPCA
This section presents a methodology that allows the PQ data compression in smart distribution systems.

Data Matrix
PQ data from smart distribution systems need to be acquired, compressed, and then transmitted through the communication network to the server of the control center for further analysis. Let the acquired data be arranged in a matrix X, as shown in Figure 1. This form is convenient for representing the data and for applying data compression. Here each row of X is taken from a distributed measurement point at each time instant.

Data Compression
After the centralization of the matrix X, the eigenvectors corresponding to the first k eigenvalues of the covariance matrix C are calculated as the principal components by the PCA algorithm. The eigenvectors corresponding to the smaller n-k eigenvalues are eliminated, and the remaining ones form the matrix $P_1$. Then the matrix $\tilde{S}$ is transformed into the matrix $S_1$, for which the CR is n:k. The original data can be reconstructed using (7), and the mean square error of the reconstructed data is equal to the sum of the eliminated n-k eigenvalues. However, the main difficulty of data compression based on the PCA algorithm is to find the eigenvalues and eigenvectors of the covariance matrix.
Figure 1. Data matrix X.
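The statement that the mean square error equals the sum of the eliminated eigenvalues can be checked numerically. The sketch below assumes the covariance matrix is scaled by 1/m and that the error is measured as the squared reconstruction error per sample, summed over dimensions and averaged over the m samples. The toy data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 500, 8, 3
X = rng.normal(size=(m, n)) @ rng.normal(size=(n, n))   # toy stand-in for PQ data

S_tilde = X - X.mean(axis=0)            # centred data
C = S_tilde.T @ S_tilde / m             # covariance with 1/m scaling
lam, P = np.linalg.eigh(C)              # ascending eigenvalues
lam, P = lam[::-1], P[:, ::-1]          # reorder to descending
P1 = P[:, :k]                           # keep the k leading eigenvectors

S_hat = (S_tilde @ P1) @ P1.T           # compress, then reconstruct
mse = np.sum((S_tilde - S_hat) ** 2) / m   # mean squared error per sample
discarded = lam[k:].sum()               # sum of the n-k eliminated eigenvalues
compression_ratio = n / k               # n:k, ignoring storage of P1 and the mean
```

Note that the n:k ratio counts only the transformed samples. Storing $P_1$ and the mean adds overhead that this simple ratio does not include.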
At present there are two kinds of conventional methods. The first uses the characteristic equation $\det(\lambda I - A) = 0$ to calculate the eigenvalues of the covariance matrix A, then uses $(\lambda_i I - A)\,x = 0$ to calculate all the eigenvectors corresponding to the eigenvalues.
Because the size of A is generally large, the computation of this method is heavy and time-consuming, so it is not suitable for data compression. The other method is realized by using a neural network (NN). This method is simpler than the first one, and does not need to compute the covariance matrix.
Taking the 32 samples as an example, the construction of the single-layer NN is illustrated in Figure 2.
The weights of the network are iteratively adjusted according to the iterative equation (9). From Literature [17], w finally converges to the eigenvector corresponding to the maximum eigenvalue, but its convergence speed is closely related to the learning factor $\mu$. The convergence of (9) is fastest only for a particular value of $\mu$ that depends on the largest eigenvalue $\lambda_1$.
But $\lambda_1$ is unknown, so the learning factor $\mu$ may be poorly estimated. If the estimated $\mu$ is too small, learning will be slow. In contrast, if the estimated $\mu$ is too large, (9) will diverge. Therefore the NN method has limitations in real applications.
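The sensitivity to the learning factor can be illustrated with a small experiment. Since the exact form of (9) is not reproduced here, the sketch below uses the averaged form of Oja's single-neuron rule as a stand-in. This choice, the toy covariance matrix, the initial weights, and the function name `oja_iterations` are all assumptions for illustration only.

```python
import numpy as np

def oja_iterations(C, mu, tol=1e-8, max_iter=100_000):
    """Averaged Oja-type update; returns (weights, number of iterations used)."""
    n = C.shape[0]
    w = np.ones(n) / np.sqrt(n)              # hypothetical initial weights
    for it in range(1, max_iter + 1):
        y2 = w @ C @ w                       # expected squared output of the neuron
        w_new = w + mu * (C @ w - y2 * w)    # Hebbian term minus decay term
        if np.linalg.norm(w_new - w) < tol:  # stop when the update is negligible
            return w_new, it
        w = w_new
    return w, max_iter

C = np.diag([3.0, 1.0, 0.5])                 # toy covariance, largest eigenvalue 3
w_slow, it_slow = oja_iterations(C, mu=0.01) # small learning factor
w_fast, it_fast = oja_iterations(C, mu=0.1)  # larger, still stable learning factor
print(it_slow, it_fast)
```

Both runs converge to the leading eigenvector, but the smaller learning factor needs many more iterations. Values of $\mu$ that are too large are deliberately not exercised here, since the update then fails to settle.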
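In order to solve the above problems, this paper proposes a new method for finding the eigenvectors of the covariance matrix. First, the following theorem is proved.
Theorem. Let

$$x(t+1) = \frac{A\,x(t)}{\lVert A\,x(t) \rVert} \qquad (10)$$

where A is a nonnegative definite symmetric matrix and $x(0) \in \mathbb{R}^n$ is not perpendicular to the eigenvector corresponding to the largest eigenvalue of A. Then $x(t)$ converges to the unit eigenvector $e_1$ corresponding to the largest eigenvalue $\lambda_1$ of A, and $B = A - \lambda_1 e_1 e_1^T$ is a nonnegative definite symmetric matrix.
Proof. Since A is a real symmetric matrix, there is an orthogonal matrix $P = [p_1, p_2, \ldots, p_n]$, where $p_i$ is the eigenvector corresponding to the eigenvalue $\lambda_i$, and any two vectors in P are orthogonal. The matrix P satisfies $P^T A P = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Any $x(0) \in \mathbb{R}^n$ can be linearly expressed by the $p_i$ as follows:

$$x(0) = \sum_{i=1}^{n} a_i p_i$$

Since $x(0)$ is not perpendicular to the eigenvector corresponding to the largest eigenvalue, the coefficients belonging to that eigenvalue are not all zero. By (10), $x(t)$ is $A^t x(0)$ scaled to unit length, and

$$A^t x(0) = \sum_{i=1}^{n} a_i \lambda_i^t p_i$$

Because A is a nonnegative definite matrix, let $\lambda_1 = \lambda_2 = \cdots = \lambda_r > \lambda_{r+1} \ge \cdots \ge \lambda_n \ge 0$. Then

$$A^t x(0) = \lambda_1^t \left[ \sum_{i=1}^{r} a_i p_i + \sum_{i=r+1}^{n} a_i \left( \frac{\lambda_i}{\lambda_1} \right)^t p_i \right]$$

and the second sum tends to zero as $t \to \infty$, where $p_1, p_2, \ldots, p_r$ are the eigenvectors corresponding to the largest eigenvalue $\lambda_1$, so the sum of them multiplied by a scalar is still an eigenvector corresponding to the largest eigenvalue $\lambda_1$ of the matrix A. Moreover, B has the same eigenvectors as A, with the eigenvalue belonging to $e_1$ replaced by zero, so B is again a nonnegative definite symmetric matrix.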
It can be seen that if A is replaced by B, the second principal component can be calculated, and then the other principal components can be calculated by this method in turn.
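The iterative procedure can be sketched as power iteration followed by deflation, where A is replaced by $B = A - \lambda_1 e_1 e_1^T$ after each eigenvector is found. The sketch below is one possible reading rather than the authors' code. The 2-norm normalization, the all-ones initial vector, the stopping tolerance, the toy matrix, and the names `leading_eigpair` and `iterative_pca` are assumptions.

```python
import numpy as np

def leading_eigpair(A, tol=1e-10, max_iter=10_000):
    """Power iteration: returns (eigenvalue, unit eigenvector, iterations used)."""
    x = np.ones(A.shape[0]) / np.sqrt(A.shape[0])   # assumed initial vector
    for it in range(1, max_iter + 1):
        y = A @ x
        norm = np.linalg.norm(y)
        if norm == 0.0:                  # A x = 0: nothing left to extract
            return 0.0, x, it
        y /= norm                        # rescale to unit length
        if np.linalg.norm(y - x) < tol:  # converged: direction no longer changes
            x = y
            break
        x = y
    return float(x @ A @ x), x, it       # Rayleigh quotient gives the eigenvalue

def iterative_pca(C, k):
    """First k eigenpairs of C by repeated power iteration and deflation."""
    B = C.copy()
    vals, vecs = [], []
    for _ in range(k):
        lam, e, _ = leading_eigpair(B)
        vals.append(lam)
        vecs.append(e)
        B = B - lam * np.outer(e, e)     # deflate: remove the component just found
    return np.array(vals), np.column_stack(vecs)

# Toy symmetric matrix with known eigenvalues 5, 3, 2, 1, 0.5.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
C = Q @ np.diag([5.0, 3.0, 2.0, 1.0, 0.5]) @ Q.T
vals, vecs = iterative_pca(C, k=3)
```

The number of iterations each call needs depends on the ratio between neighbouring eigenvalues and on the initial vector, which matches the remark in the conclusions that the construction of the initial vector matters.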
The following simplified steps are applied to PQ data compression based on the IPCA:
Step 1. Get PQ data.
Step 2. Calculate the mean of the PQ data using (1).
Step 3. Subtract the mean from the PQ data using (2).
Step 4. Construct the covariance matrix of the mean-subtracted data using (3).
Step 5. Calculate eigenvectors and eigenvalues of the covariance matrix using the iterative method.
Step 6. Choose the principal components: preserve the k (the desired number) principal components that correspond to the largest eigenvalues.
Step 7. Quantization and encoding: preserve the quantized principal components and their indices as the compressed coefficients.
The PQ data compression scheme based on the PCA algorithm is shown in Figure 3.
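The steps above can be sketched end to end as follows. This is a hedged illustration, not the published implementation: the data are synthetic 50 Hz sinusoids (the nominal frequency is an assumption), rows are taken as time instants and columns as channels, `np.linalg.eigh` stands in for the iterative eigen-solver of Step 5, and Step 7 is reduced to a uniform 16-bit quantization with no entropy coding. The names `compress` and `decompress` are hypothetical.

```python
import numpy as np

def compress(X, k, bits=16):
    mean = X.mean(axis=0)                        # Step 2: mean of each channel
    S = X - mean                                 # Step 3: subtract the mean
    C = S.T @ S / X.shape[0]                     # Step 4: covariance matrix
    lam, P = np.linalg.eigh(C)                   # Step 5: stand-in eigen-solver
    P1 = P[:, ::-1][:, :k]                       # Step 6: k largest eigenvalues
    coeffs = S @ P1                              # transformed data
    # Step 7 (assumed form): uniform quantization to signed 16-bit integers
    max_abs = float(np.max(np.abs(coeffs)))
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
    q = np.round(coeffs / scale).astype(np.int16)
    return q, scale, P1, mean

def decompress(q, scale, P1, mean):
    return (q * scale) @ P1.T + mean             # dequantize, back-project, restore mean

# Step 1: synthetic stand-in for 32 channels of 1536 samples at 12.8 kHz
fs, f0 = 12_800, 50.0
t = np.arange(1536) / fs
rng = np.random.default_rng(3)
amps = rng.uniform(0.9, 1.1, size=32)
phases = rng.uniform(0.0, 2.0 * np.pi, size=32)
X = amps * np.sin(2.0 * np.pi * f0 * t[:, None] + phases)
X = X + 0.001 * rng.normal(size=X.shape)         # small measurement noise

q, scale, P1, mean = compress(X, k=2)
X_hat = decompress(q, scale, P1, mean)
max_err = float(np.max(np.abs(X - X_hat)))
```

Two components suffice for this toy data only because all channels share one frequency. Field data containing disturbances would generally need more components for the same accuracy.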

Data Test, Discussion and Comparison
Both IPCA and the discrete wavelet transform (DWT) are capable of multichannel PQ data compression. PQ data are collected from various measurement points in the smart distribution system to test the methods. The PQ data are sampled at 12.8 kHz and quantized with 16 bits. There are 32 channels of PQ data, and each channel consists of 1536 samples. The tested PQ matrix belongs to $\mathbb{R}^{m \times n}$, where m is the number of sample channels and n is the number of samples of each channel of PQ data. The CR of the proposed method is given as (18), and the mean absolute error (MAE) and mean percentage error (MPE), given as (19) and (20) respectively, are used to evaluate the reconstruction accuracy of PQ data,
where $\hat{x}_i$ is the reconstructed data corresponding to the original data $x_i$ of each channel.
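For readers who want to reproduce such an evaluation, the following sketch computes the three quantities. The CR follows the ratio of data counts named in the text. The exact forms of (19) and (20) are not reproduced here, so the MAE and MPE definitions below are common textbook forms and should be treated as assumptions, as should the function names and the small example values.

```python
import numpy as np

def compression_ratio(n_original, n_compressed):
    """Number of data without compression over number of data after transformation."""
    return n_original / n_compressed

def mean_absolute_error(x, x_hat):
    """Assumed definition: average of |x_i - x_hat_i| over one channel."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return float(np.mean(np.abs(x - x_hat)))

def mean_percentage_error(x, x_hat, eps=1e-12):
    """Assumed definition: average of |x_i - x_hat_i| / |x_i| in percent.

    Samples with |x_i| <= eps are skipped to avoid division by zero.
    """
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    mask = np.abs(x) > eps
    return float(100.0 * np.mean(np.abs(x[mask] - x_hat[mask]) / np.abs(x[mask])))

# Hypothetical example values
x = np.array([1.0, -2.0, 4.0])
x_hat = np.array([1.1, -1.9, 4.2])
cr = compression_ratio(32 * 1536, 4 * 1536)   # keeping 4 of 32 components
mae = mean_absolute_error(x, x_hat)
mpe = mean_percentage_error(x, x_hat)
```

Skipping near-zero samples in the percentage error is a design choice of this sketch. PQ waveforms cross zero every half cycle, so an unguarded relative error would be dominated by those samples.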
In order to compare the performance of data compression using the IPCA, the DWT is also applied to compress the same dataset. Table 1 shows the results obtained with the IPCA and with the Daubechies 4 wavelet (db4) with four levels of decomposition [7]. Different thresholds have been set, aiming to retain the number of wavelet coefficients that would result in the same CRs shown for the IPCA.
It can be seen from Table 1 that the IPCA is capable of achieving better tradeoff for higher CRs.The MAE and MPE of reconstructed data by the IPCA are lower than those of reconstructed data by the DWT.So the performance of data compression using the IPCA is better than that using the DWT.
Figure 4 shows the original PQ data of the first three channels.

Conclusions
In summary, the PCA algorithm is used to reduce the redundancy of data. Because PCA is the optimal transform in the minimum mean square error sense, the larger eigenvalues are retained and the smaller eigenvalues are omitted, according to the requirements, to reduce the dimensionality, simplify the model or compress the data. With these characteristics, the PCA algorithm is well suited to data compression. This paper proposes a multichannel PQ data compression algorithm via IPCA. PQ data is preprocessed to form the matrix. Then IPCA is used to compress the matrix and yield the compressed data. Field PQ data tests validate that the proposed method is characterized by high CR, accurate reconstruction, and low computational complexity. The iterative method is also easy to program.
Because the test data are limited and the covariance matrices of the original data differ, the number of iterations required by the proposed method can vary considerably. The number of iterations depends strongly on the construction of the initial vector. How to construct the initial vector and reduce the number of iterations for different PQ data needs further study.

The PCA algorithm was invented in 1901 by Karl Pearson, and it uses an orthogonal transformation to convert a set of measurements of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The main advantage of PCA is that it can reduce the dimension and redundancy of data. The eigenvalues and eigenvectors are obtained by decomposing the covariance matrix of the data. Then the eigenvectors corresponding to several larger eigenvalues are taken as the principal components, and the measured data are projected onto the principal components to represent the original data, so as to reduce the dimension and redundancy of the data.

$\Lambda_1$ is composed of the k largest eigenvalues of the matrix $\Lambda$, and a new orthogonal matrix $P_1$ is composed of the k eigenvectors, which correspond to the above k eigenvalues. The k eigenvectors are the principal components obtained by the PCA. If $P_1^T C P_1 = \Lambda_1$, then

The covariance matrix C of PQ data can meet the above conditions of the theorem. This method calculates the principal components of the covariance matrix as follows: first calculate the eigenvector $e_1$ corresponding to the largest eigenvalue using (10), then calculate the eigenvector
M. Zhang et al. DOI: 10.4236/sgre.2017.812024, Smart Grid and Renewable Energy

Figure 3. PQ data compression scheme based on the PCA algorithm.
The CR of the proposed method is given as (18). By determining different numbers of principal components, different CRs can be obtained.

$$\mathrm{CR} = \frac{\text{Number of data without compression}}{\text{Number of data after transformation}} \qquad (18)$$

Mean absolute error (MAE) and mean percentage error (MPE) are shown as (19) and (20).

Figure 4. The original PQ data of the first three channels.

Figure 5. The reconstructed PQ data of the first three channels (CR = 2).

Figure 9 shows a comparison of the number of iterations needed to calculate the principal components of the tested PQ matrix by the IPCA and NN-PCA. The horizontal axis is the number of eigenvectors and the vertical axis is the number of iterations. It can be found from Figure 9 that, for the same number of principal components extracted, the

Figure 7. The reconstructed PQ data of the first three channels (CR = 8).

Figure 8. The reconstructed PQ data of the first three channels (CR = 16).

Figure 9. Comparison of the number of iterations by the IPCA and NN-PCA.