
In this paper, we apply clustering analysis from data mining to the power system. We adopt the K-means clustering algorithm to analyze customer load and identify similar behavior among electricity customers, and we adopt principal component analysis to visualize the clustering results. Simulation and analysis in MATLAB verify the rationality of the clustering. The conclusions of this paper can provide an important basis for peak regulation and for the stable and secure operation of the power system.

On the one hand, in the age of big data, massive amounts of information affect our work and lives every second, and data mining and clustering analysis are becoming more and more important. On the other hand, with the rapid development of our national economy, power consumption keeps growing. Because our current power supply relies mainly on thermal power, power dispatch and peak regulation are becoming more and more important for ensuring the stable operation of the power system. Clustering analysis of customer power load is a key link in power decision-making. Therefore, this paper focuses on the application of big data in the power system. Clustering algorithms can be classified in different ways according to different criteria. Commonly used algorithms in clustering analysis include the K-means clustering algorithm, the agglomerative hierarchical clustering algorithm, the SOM neural network clustering algorithm, the FCM fuzzy clustering algorithm, and so on [

The data used for clustering analysis in this paper come from reference [

Clustering, one of the important research topics in data mining, is the process of grouping physical objects into multiple classes or clusters [

The K-means algorithm is quick and simple;

It is efficient and scalable for large data sets;

It has nearly linear time complexity and is suitable for mining large data sets. The time complexity of the K-means clustering algorithm is O(nkt), where n is the number of objects in the data set, k is the number of clusters, and t is the number of iterations.

Therefore, this paper uses the K-means clustering algorithm to analyze customer load, following the flow chart shown in

1) Randomly select k points as the initial cluster centers.

2) For the i-th sample point, compute its distance to each of the k cluster centers and assign it to the class of the nearest center.

3) Recalculate the k cluster centers according to formula (2).

In the formula,

4) Repeat step 2) and step 3) until the convergence criterion function is satisfied. Convergence is evaluated with the square error criterion, as shown in formula (3).

In the formula, E is the sum of the squared errors of all the objects in the database; x is a point in space;
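The assignment rule of step 2), the center update of step 3), and the square error criterion of formula (3) are commonly written as below. The notation is an assumption made for illustration and is not taken from the paper: $x_i$ is the $i$-th sample, $\mu_j$ is the center of cluster $C_j$, $N_j$ is the number of samples in $C_j$, and $c_i$ is the index of the cluster assigned to $x_i$.

```latex
\begin{align}
c_i   &= \arg\min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2 \tag{1} \\
\mu_j &= \frac{1}{N_j} \sum_{x_i \in C_j} x_i \tag{2} \\
E     &= \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2 \tag{3}
\end{align}
```

Steps 2) and 3) each leave $E$ unchanged or smaller, so the iteration stops once $E$ no longer decreases.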

Applying the above K-means algorithm to the data shows that customer load can be clearly divided into 4 categories; the cluster centers are shown in
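The K-means steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB code used in the paper: the function names `kmeans` and `squared_distance`, the `seed` and `max_iter` parameters, and the toy data are all hypothetical. The loop stops when the assignments no longer change, at which point the squared error can no longer decrease.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means; returns (centers, labels, sum of squared errors)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct sample points as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Step 4: unchanged assignments mean the squared error has converged.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels, sse = kmeans(points, k=2)
```

For daily load curves, each point would be one customer's vector of load samples, and `k` would be set to the number of categories being sought.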

Because the data set is large, with 144 sampling moments selected per day, the clustering results cannot be displayed directly. To visualize the clustering results, we use principal component analysis (PCA).

PCA is a mathematical method of dimensionality reduction. It transforms many correlated variables into a new set of mutually uncorrelated variables [

The calculation steps of PCA are introduced below.

Standardization of the original data

Assume the sample observation data matrix is

The original data are then standardized as follows

where,

Calculation of the sample correlation coefficient matrix

For convenience, assume that the standardized data are still denoted by

where
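The standardization and correlation steps can be illustrated with a short Python sketch. It assumes the common convention of dividing by n - 1 in the sample variance, which the paper may or may not use, and the function names `standardize` and `correlation_matrix` are hypothetical.

```python
import math


def standardize(X):
    """Center each column of X (n samples by p variables) and scale it
    to unit sample variance. Assumes no column is constant."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def correlation_matrix(Z):
    """Correlation matrix of already standardized data Z (n by p)."""
    n = len(Z)
    cols = list(zip(*Z))
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
        for ci in cols
    ]


# Hypothetical data: the second column rises with the first,
# the third column falls as the first rises.
X = [[1.0, 2.0, 3.0], [2.0, 4.0, 2.0], [3.0, 6.0, 1.0]]
R = correlation_matrix(standardize(X))
```

Because the data are standardized first, the diagonal of `R` is 1 and every off-diagonal entry lies between -1 and 1.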

Calculation of the eigenvalues and corresponding eigenvectors of the correlation coefficient matrix R

Eigenvalues:

Eigenvectors:
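In commonly used notation, assumed here for illustration ($I_p$, $\lambda_j$ and $u_j$ are not symbols taken from the paper), the eigenvalues are the roots of the characteristic equation of R sorted in descending order, and each eigenvector is normalized to unit length:

```latex
\[
|R - \lambda I_p| = 0, \qquad
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0,
\]
\[
R\,u_j = \lambda_j u_j, \qquad u_j^{\mathsf{T}} u_j = 1, \qquad j = 1, \dots, p.
\]
```

The eigenvalues are non-negative because a correlation matrix is positive semidefinite, and they sum to p, the number of standardized variables.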

Selection of the important principal components and their expressions

Principal component analysis yields p principal components, but because the variance of successive principal components decreases, the amount of information they carry also declines. In practice, the first k principal components are selected according to their contribution rates, usually so that the cumulative contribution rate reaches more than 85%, which ensures that the integrated variables carry most of the information in the original variables.

where

contribution rate
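The selection rule can be sketched as follows, assuming the usual definition in which the contribution rate of a component is its eigenvalue divided by the sum of all eigenvalues. The function name `select_components` and the example eigenvalues are hypothetical and are not results from the paper.

```python
def select_components(eigenvalues, threshold=0.85):
    """Return (k, rates, cumulative): the smallest k whose cumulative
    contribution rate reaches the threshold, the per-component rates
    in descending order, and the cumulative rate at k."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rates = [value / total for value in ordered]
    cumulative = 0.0
    for k, rate in enumerate(rates, start=1):
        cumulative += rate
        if cumulative >= threshold:
            return k, rates, cumulative
    # Only reached if threshold exceeds 1: keep every component.
    return len(ordered), rates, cumulative


# Hypothetical eigenvalues of a 4-variable correlation matrix.
k, rates, cumulative = select_components([2.4, 1.2, 0.3, 0.1])
```

With these example values the first two components contribute 60% and 30%, so two components already pass the 85% threshold.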

Calculation of the principal component scores

Using the standardized original data, the value of each principal component is calculated from its expression for every sample. This gives the new data for each sample on each principal component, that is, the principal component scores. The specific form is as follows:

where
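As a small worked illustration of the score calculation, the sketch below projects standardized data onto unit eigenvectors. It uses a hypothetical two-variable example, for which the correlation matrix [[1, r], [r, 1]] has eigenvectors (1, 1)/√2 and (1, -1)/√2 with eigenvalues 1 + r and 1 - r. The function name and the data are assumptions, not taken from the paper.

```python
import math


def principal_component_scores(Z, eigenvectors):
    """Project standardized data Z (n by p) onto the given unit
    eigenvectors (each of length p); returns an n by k score matrix."""
    return [
        [sum(z * u for z, u in zip(row, vec)) for vec in eigenvectors]
        for row in Z
    ]


# Hypothetical standardized data: both columns have mean 0 and sample
# variance 1, and their correlation coefficient is r = 0.5.
Z = [[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]

s = 1.0 / math.sqrt(2.0)
U = [[s, s], [s, -s]]  # eigenvectors for eigenvalues 1.5 and 0.5

scores = principal_component_scores(Z, U)
```

The sample variance of each score column equals the corresponding eigenvalue, which is why the leading columns are the ones kept for a two-dimensional scatter plot.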

With the clustering analysis and the principal component analysis complete, we plot a scatter diagram of the clustering results in MATLAB, as shown in

To verify the rationality of the K-means algorithm used in this paper, we carry out a simulation analysis in MATLAB.

Subgraph A shows the distance from all 4 kinds of customer load to cluster center A; all points of customer load 3 are nearest to cluster center A.

Subgraph B shows the distance from all 4 kinds of customer load to cluster center B; all points of customer load 2 are nearest to cluster center B.

Subgraph C shows the distance from all 4 kinds of customer load to cluster center C; all points of customer load 4 are nearest to cluster center C.

Subgraph D shows the distance from all 4 kinds of customer load to cluster center D; all points of customer load 1 are nearest to cluster center D.

This is consistent with the definition of a cluster center, and the relationship is also consistent with
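The check described in the four subgraphs, that every point lies nearest to the center of its own cluster, can also be expressed programmatically. The sketch below is a hypothetical Python version of that check and is not the paper's MATLAB simulation; the function names and sample values are illustrative only.

```python
import math


def distances_to_centers(points, centers):
    """Table of Euclidean distances: one row per point, one column per center."""
    return [[math.dist(p, c) for c in centers] for p in points]


def nearest_center_matches(points, labels, centers):
    """True if each point's nearest center is the one named by its label."""
    table = distances_to_centers(points, centers)
    return all(
        min(range(len(centers)), key=row.__getitem__) == label
        for row, label in zip(table, labels)
    )


# Hypothetical example with two clusters.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 10.0)]
centers = [(0.5, 0.0), (9.5, 9.5)]
labels = [0, 0, 1, 1]
consistent = nearest_center_matches(points, labels, centers)
```

Each column of the distance table corresponds to one subgraph: it holds the distance from every point to a single cluster center.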

Based on the earlier K-means clustering and principal component analysis, a visualization map is drawn from the clustering results, as

In this paper, the K-means clustering method from data mining is applied to the power system to cluster customer power load, and principal component analysis is used to visualize the clustering results, demonstrating the rationality and correctness of the clustering. The results provide an important basis for power system decision-making and help ensure the stable operation of the power system.

Lin Liu (2015) Cluster Analysis of Electrical Behavior. Journal of Computer and Communications, 03, 88-93. doi: 10.4236/jcc.2015.35011