
To make better use of residential electricity consumption data, which carry information on users' electricity consumption habits, a mining model for residential electricity consumption behavior is constructed. First, the collected data are divided by attribute into global data and phase data, and appropriate global variables are selected to mine the user's recent electricity consumption modes with the system clustering algorithm. Then, based on grey relational analysis, the phase data are combined with the power modes to analyze the latent characteristics of residential electricity consumption behavior. This verifies the ability of the latest power mode to predict household electricity consumption over the next few days and the effect of the dominant phase variables on peak load shifting. Finally, tests on the actual data of one family show that the proposed algorithm can effectively uncover the family's electricity consumption habits and characteristics.

In recent years, in the face of growing energy-crisis and environmental-protection problems and an increasingly tense balance between electric power supply and demand, smart electricity consumption, which improves the quality of power supply service and the efficiency of electricity use through intelligent measurement, high-speed communication and high-efficiency control, has become a focus of global attention [

At present, the analysis of users' electricity consumption data has become a research hot spot in the country, and a variety of efficient data mining algorithms combined with emerging computing models such as cloud computing platforms have been studied and applied. References [

In this paper, the household electricity consumption data are first defined and divided. Then the system clustering algorithm, which avoids the need to choose the number of clusters and the initial centers in the traditional k-means algorithm, is applied to analyze and mine the features of the user's recent total power consumption and obtain the user's frequent electricity consumption modes. By means of gray relational analysis, the household's main power consumption behavior is extracted from each power mode. It is then used to verify the predictability of the user's electricity behavior and the potential for household peak load shifting.

For a single family, considering the factors that influence electricity consumption behavior, such as time, climate change and the human activity cycle, the electricity data of the family's most recent 10 - 15 days are selected as the basis of analysis. The collected information can be divided into two types: 1) Global data: energy consumption, power, voltage, current, etc. 2) Phase data: the electricity data of intelligent appliances or of traditional appliances fitted with intelligent sockets, the electricity data of home appliances with similar attributes, and so on. With one day's data as a unit sample, the characteristics of the recent household electricity consumption behavior are mined by means of system clustering and the gray relational degree.

1) Variable screening and sorting completes the division and definition of the gathered data. The variables that contribute most to the mining purpose are chosen as the input parameters of the data mining algorithm, which reduces the amount of calculation and increases the validity of the mining result.

2) Household power mode mining selects appropriate global variables, clusters the household electricity consumption with the system clustering algorithm, and mines the family's common electricity consumption modes to extract their characteristic curves.

3) Potential power characteristics mining. On the basis of the above-mentioned electricity consumption modes, combined with the phase parameters, the characteristics of household electricity consumption behavior are analyzed by the gray relational degree, which provides the basis for a smart power strategy.

As an unsupervised learning process, clustering is effective at mining the inner relations of unlabeled data sets [

Suppose the number of samples of household electricity consumption is s (10 ≤ s ≤ 15) and the global and phase variables are the attribute parameters. The kth sample can then be described as f_{k} = [f_{kA}, f_{kB}] = {A_{k1}, A_{k2}, …, A_{km}, B_{k1}, B_{k2}, …, B_{kn}} (k = 1, 2, …, s), where A_{ki} (i = 1, 2, …, m) is a global variable and B_{kj} (j = 1, 2, …, n) is a phase variable, both of which are time series. Assuming that there are q acquisition points in the sample period, A_{ki} = {A_{ki}(1), A_{ki}(2), …, A_{ki}(q)} and B_{kj} = {B_{kj}(1), B_{kj}(2), …, B_{kj}(q)}. Each variable x is standardized by formula (1):

x' = (x − μ) / τ    (1)

where: μ is the sample mean of x, and τ is the sample standard deviation. For the convenience of writing, standardized variable symbols remain unchanged.
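The standardization described here (subtract the sample mean μ, divide by the sample standard deviation τ) can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `standardize`, the handling of a constant series, and the readings in `load` are assumptions made for the example and do not come from the paper.

```python
from statistics import mean, stdev

def standardize(series):
    """Z-score a time series: subtract the sample mean, divide by the sample std."""
    mu = mean(series)
    tau = stdev(series)  # sample standard deviation (n - 1 denominator)
    if tau == 0:
        # Assumption: a constant series has no shape to compare, so map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / tau for x in series]

load = [0.4, 0.4, 1.2, 2.0, 1.0]  # hypothetical readings, not the paper's data
z = standardize(load)
```

After this step every variable has zero mean and unit sample standard deviation, so variables measured on different scales contribute comparably to the distances used in clustering.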

Taking the global variables as the analysis parameters, the s samples are clustered, and the average distance between clusters is calculated by formulas (2) and (3):

where: d(f_{i}, f_{j}) is the Euclidean distance between samples f_{i} and f_{j}, with i ≠ j (i, j = 1, 2, …, s), and n_{i} and n_{j} are the numbers of objects in clusters C_{i} and C_{j}, respectively.
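As a rough illustration of the quantities just defined, the sketch below computes the average distance between two clusters as the mean Euclidean distance over all cross-cluster sample pairs. This average-linkage form is an assumption consistent with the definitions of d(f_{i}, f_{j}), n_{i} and n_{j}; the function name and the toy clusters are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def average_distance(cluster_i, cluster_j):
    """Mean Euclidean distance over all pairs with one sample from each cluster."""
    total = sum(dist(a, b) for a in cluster_i for b in cluster_j)
    return total / (len(cluster_i) * len(cluster_j))

# Toy clusters of two-point "daily curves" (hypothetical values).
c1 = [[0.0, 0.0], [0.0, 2.0]]
c2 = [[3.0, 0.0]]
d = average_distance(c1, c2)
```

Dividing by n_{i} n_{j} keeps the measure comparable between small and large clusters, which a plain sum of pairwise distances would not.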

The two clusters with the smallest distance are merged, the data are updated, and the calculation above is repeated until all clusters are merged into one. Given the complexity of the merge relationships, only a merge of two samples, or of a sample and a cluster, is regarded as forming a class. Because the mean is sensitive to outliers and extreme values, the household energy modes M_{1}, M_{2}, …, M_{p} are obtained by using the median eigenvalues to represent the general trend of each class of data:

where:
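The idea of representing a class by its median rather than its mean can be sketched as a point-wise median over the daily curves in that class. This is a minimal illustration, assuming each day is a list of readings at the same acquisition points; the function name `mode_profile` and the numbers are hypothetical.

```python
from statistics import median

def mode_profile(daily_curves):
    """Point-wise median of the daily load curves assigned to one class."""
    return [median(values) for values in zip(*daily_curves)]

cluster = [
    [0.4, 1.0, 2.0, 0.5],
    [0.4, 1.2, 9.0, 0.5],  # 9.0 is a deliberate outlier
    [0.5, 1.1, 2.2, 0.4],
]
profile = mode_profile(cluster)
print(profile)  # [0.4, 1.1, 2.2, 0.5]
```

The outlier at the third time point would pull a mean up to 4.4, whereas the median stays at 2.2, which is the robustness argument made for using the median.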

Gray relational analysis quantitatively analyzes and compares the dynamic development of a system and identifies the main factors that influence its change. It has three main elements: the reference sequence, the comparison sequences, and the gray relational degree. Let the value of the reference sequence at the ith time point be x_{0}(i), so that X_{0} = (x_{0}(1), x_{0}(2), …, x_{0}(n)). There is generally more than one comparison sequence; they are recorded as X_{1}, X_{2}, …, X_{k}, where X_{k} = (x_{k}(1), x_{k}(2), …, x_{k}(n)). The essence of gray relational analysis is to compare how similar the curve of each of X_{1}, X_{2}, …, X_{k} is to that of X_{0} over time. The gray relational degree ranks the comparison sequences by their similarity to the reference sequence: the more similar the curves, the greater the relational degree.

Combined with the phase data, the gray relational degree is applied to analyze the latent influencing factors of the user's power consumption modes. The mode M_{p} is taken as an example:

1) The absolute difference between the phase variables and the global variable is calculated by formula (5):

where: k = 1, 2, …, m.

2) The gray relational coefficient of corresponding elements between the phase variables and the global variable

where: _{p} at the jth time point.

3) The relational degree of each phase variable to the global variable

4) Finally, a comprehensive evaluation analyzes and compares the relational degree of each variable in each mode.
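The four steps above can be sketched with the commonly used form of grey relational analysis: absolute differences, a relational coefficient with a distinguishing coefficient ρ, and the time average of the coefficients. The exact formulas of the paper are not reproduced here, so this form, the value ρ = 0.5, the function name and the toy curves are all assumptions; in practice the sequences would be standardized first.

```python
def grey_relational_degrees(reference, comparisons, rho=0.5):
    """Return one grey relational degree per comparison sequence.

    reference   -- reference curve, e.g. a mode curve (list of floats)
    comparisons -- list of comparison curves of the same length
    rho         -- distinguishing coefficient; 0.5 is a common choice (assumption)
    """
    # Step 1: absolute difference at every time point.
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in comparisons]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every comparison curve coincides with the reference.
        return [1.0 for _ in comparisons]
    degrees = []
    for row in deltas:
        # Step 2: relational coefficient at every time point.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Step 3: relational degree = average coefficient over time.
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees

mode = [0.0, 1.0, 2.0, 1.0]     # hypothetical mode curve
living = [0.0, 1.0, 2.0, 1.5]   # hypothetical phase variable close to the mode
laundry = [2.0, 0.0, 0.0, 3.0]  # hypothetical phase variable far from the mode
degrees = grey_relational_degrees(mode, [living, laundry])
```

For step 4, the degrees are simply compared: the phase variable with the largest degree follows the mode curve most closely and is treated as the dominant factor for that mode.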

The data come from the University of California, Irvine (UCI) database [

Because the purpose is to mine household electricity modes and household energy consumption is dominated by active load, this paper takes the daily load data as the analysis parameter and applies the system clustering method of Section 2.2 to mine the electricity consumption modes, which gives the agglomeration schedule shown in the following table.

Step | Aggregated cluster 1 | Aggregated cluster 2 | Correlation coefficient | First clustering step of cluster 1 | First clustering step of cluster 2 |
---|---|---|---|---|---|
1 | 9 | 10 | 131.640 | 0 | 0 |
2 | 3 | 4 | 180.789 | 0 | 0 |
3 | 5 | 6 | 189.296 | 0 | 0 |
4 | 1 | 2 | 207.926 | 0 | 0 |
5 | 8 | 9 | 220.040 | 0 | 1 |
6 | 5 | 7 | 269.404 | 3 | 0 |
7 | 1 | 3 | 289.126 | 4 | 2 |
8 | 5 | 8 | 329.282 | 6 | 5 |
9 | 1 | 5 | 343.351 | 7 | 8 |

The main purpose of clustering is to make electricity consumption within the same mode as similar as possible and consumption in different modes as different as possible. Therefore, only the merges in which at least one first clustering step is 0 (that is, at least one of the merged items is a single sample) are considered, and agglomeration steps 7, 8 and 9 are excluded. The pedigree chart is shown in

The figure shows that from 2010/8/2 to 2010/8/11 the household electricity consumption can be roughly divided into four modes, each a cluster of days, which reflects the similarity of household electricity consumption over a short period. The median eigenvalue represents each mode, as shown in
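The division into four modes can be checked by replaying the agglomeration schedule. The sketch below applies only the merges of steps 1 to 6 (steps 7, 8 and 9 being excluded) to the ten samples; it assumes, as the schedule suggests, that a merged cluster keeps the identifier of its first member. The function name `replay` is hypothetical, and how sample numbers map to calendar days is not stated in this section.

```python
# Merges kept from the agglomeration schedule (steps 1-6), as (cluster 1, cluster 2).
kept_merges = [(9, 10), (3, 4), (5, 6), (1, 2), (8, 9), (5, 7)]

def replay(merges, n_samples):
    """Apply the merges in order and return the resulting groups of sample ids."""
    label = {i: i for i in range(1, n_samples + 1)}  # sample id -> cluster id
    for a, b in merges:
        keep, drop = label[a], label[b]
        # Relabel every sample of the second cluster with the first cluster's id.
        label = {s: (keep if c == drop else c) for s, c in label.items()}
    clusters = {}
    for sample, cluster in label.items():
        clusters.setdefault(cluster, []).append(sample)
    return sorted(clusters.values())

modes = replay(kept_merges, 10)
print(modes)  # [[1, 2], [3, 4], [5, 6, 7], [8, 9, 10]]
```

Ten samples minus six merges leaves four groups, which agrees with the four modes read from the pedigree chart.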

After analyzing a large number of sample data, it is found that the family has a basic load with an amplitude of about 0.4 kW. This load fluctuates little and is the minimum basic energy loss of the household, whereas the user's consumption load, which is mainly related to the use of electrical appliances, fluctuates markedly. To distinguish the user's consumption load from the household basic load, the load fluctuation threshold is set to 0.5. The basic characteristics of the power consumption modes obtained in this way are shown in
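One way to read the threshold is as a level that separates appliance use from the basic load: readings above 0.5 count as user consumption, and runs of such readings mark active periods. The sketch below follows that reading, which is an assumption, as are the unit (kW), the function name `active_periods` and the sample curve.

```python
def active_periods(load, threshold=0.5):
    """Return (start, end) index pairs of consecutive readings above the threshold."""
    periods, start = [], None
    for i, value in enumerate(load):
        if value > threshold and start is None:
            start = i  # a run of user consumption begins
        elif value <= threshold and start is not None:
            periods.append((start, i - 1))  # the run ended at the previous reading
            start = None
    if start is not None:
        periods.append((start, len(load) - 1))  # run continues to the last reading
    return periods

curve = [0.4, 0.4, 1.3, 2.1, 0.45, 0.4, 0.9, 0.4]  # hypothetical readings
periods = active_periods(curve)
print(periods)  # [(2, 3), (6, 6)]
```

Mapping the index pairs back to clock times would give peak periods of the kind listed for each mode.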

1) Experiment 1

To verify the short-term similarity of household electricity consumption behavior, the electricity data of 8/12 are used to compute the correlation between each power consumption mode and the power consumption on 8/12. In addition, using the electricity data from 8/12 to 8/16, this paper compares the power consumption on each of these 5 days with the most recent power consumption mode. Detailed data are shown in

It can be seen from

Mode | Peak periods | Characteristics of the power consumption |
---|---|---|
1 | 1:00-2:00, 12:45-14:00 | After the early peak there is no obvious electricity consumption behavior, only basic load consumption. After the later peak, the same kind of electricity behavior recurs or the same kind of electrical appliance is used. |
2 | 2:15-3:30, 14:30-15:30 | After the early peak there is similar power consumption behavior or use of the same electrical appliances. After the later peak there is no obvious power consumption behavior. |
3 | 3:30-5:00, 15:15-17:00 | Electricity consumption is frequent and similar throughout the day. |
4 | 4:15-5:30, 16:00-17:30 | Electricity consumption is frequent throughout the day. Active electricity consumption behaviors occur 2 hours after the early peak and 2.5 hours after the later peak. |

Item | Mode 1 | Mode 2 | Mode 3 | Mode 4 |
---|---|---|---|---|
Grey relational degree | 0.8776 | 0.8274 | 0.8196 | 0.8223 |

Date | 8/12 | 8/13 | 8/14 | 8/15 | 8/16 |
---|---|---|---|---|---|
Grey relational degree | 0.8883 | 0.8850 | 0.8710 | 0.8165 | 0.7978 |

gray relational degree is not significant. This shows that in the short term the household's electricity behavior has a certain continuity and similarity. A large number of fitting experiments on the load of randomly chosen days show that the cycle of similar electricity behavior is 2 - 3 days, so using the most recent household electricity consumption mode to predict the family's power consumption over the next 2 - 3 days is feasible.

2) Experiment 2

In this experiment, the phase-parameter data in each mode are used to analyze the correlation of the phase variables. As the kitchen power consumption is zero, the two phase variables laundry and living area are selected for analysis, which finally gives the potential influencing factors of each power mode, as shown in

It can be seen that in modes 1 to 4 the gray relational degree of the living area is greater than that of the laundry. Compared with the laundry's energy consumption, the load curve of the living area is more similar to the characteristic curve of the corresponding mode, indicating that the electricity consumption habits of the living area are more dominant than those of the laundry in the whole household.

From

Phase variable | Mode 1 | Mode 2 | Mode 3 | Mode 4 |
---|---|---|---|---|
Laundry | 0.7884 | 0.7852 | 0.7798 | 0.8445 |
Living area | 0.9293 | 0.9276 | 0.9262 | 0.9432 |
Greater influencing factor | living area | living area | living area | living area |

fluctuation, is helpful to peak shifting and valley filling, and finally obtain some economic significance

This paper presents a mining model for household electricity behavior based on system clustering and gray relational analysis. Applied to a set of actual electricity data from one family, the model effectively mines the household's electricity consumption modes over a given period, ranks the influence of the electricity consumption behaviors within each mode, and validates the predictive ability of the latest power consumption mode. This work will help power companies mine users' potential electricity consumption habits in order to develop appropriate smart power strategies and improve the quality of power service.

Xu, M.J. and Wang, Y.H. (2017) Residential Electricity Consumption Behavior Mining Based on System Cluster and Grey Relational Degree. Energy and Power Engineering, 9, 390-400. https://doi.org/10.4236/epe.2017.94B044