This paper devises a scheme for discovering the state association rules of a process object. The scheme aims to uncover the hidden close relationships among the different links of the process object. We adopt a method based on difference and extremum to compute the timing. Clustering is used to classify the adjusted data, and the clusters are then associated with one another. From the rules between clusters we derive the rules between links, so the association degree between every two links can be determined. Association chains are then easily obtained from these degrees. The final results are the state association rules, which are derived from the association rules. Some industry guidance can be summarized directly from the state association rules and applied to improve the efficiency of production and operation in allied industries.

Big data has 4 characteristics [

In the process industry, an industrial installation generally consists of multiple operation units or pieces of equipment. The input of a downstream unit is usually the output of an upstream unit. To make full use of equipment capacity and to tap the production potential of the enterprise, the process industry should ensure failure-free operation of the equipment. However, researchers usually obtain the correlation of the data intuitively, through routine analysis alone. No effective algorithm is adopted to discover hidden knowledge, so few regularities or rules are extracted from big data.

Research on association rules is receiving more and more attention. Association rule mining was first introduced by reference [

A method proposed by reference [

Discovering hidden knowledge involves many problems, such as computing the timing of the process object, classifying the data, and producing and using the rules. To address these problems, this paper proposes a scheme that adopts various data mining algorithms and technologies to discover the state association rules of a process object based on association chains. From the state association rules we can see directly how a state change in one link influences the others. According to these rules, people can give the process industries professional guidance in fault analysis, failure detection, optimal state estimation and so on.

For convenience of the following analysis, this paper gives several definitions.

Definition 1. An object composed of

where

Definition 2. A unidirectional chain of different links formed on the basis of correlation degree is called an association chain.

Definition 3. A rule shaped like a chain, each element of which is the state of

Assume that process object

This paper devises a scheme to find the implicit state association rules of a process object. The scheme consists of the following main steps: data sampling, timing analysis, clustering, association rule mining, association chain mining and state association rule generation. In the timing analysis step, a novel counting-based method is proposed to determine the time series and time delays of different links. In the clustering step, data collected at the same time are divided into k classes by the k-means clustering algorithm. This step uses the silhouette coefficient, which is based on cohesion and separation, as the clustering criterion. In the association chain step, the cluster set is organized into an association chain containing a single chain or an association tree containing multiple chains. Using these association chains, state association rules are easily obtained. A state association rule reflects the relationship among different links.

The scheme is shown in

In this article, difference serves as the sampling criterion and reflects how the data change over time. The larger the variation of the data, the richer the information they contain. In our practical application, the original data are divided equally into m segments. Δχ denotes the variation of χ and is defined as the sum of the absolute first-order differences. The segment with the largest Δχ is selected; it is denoted χ_{M}, and its period is denoted T_{M}. Compared with the other segments, χ_{M} contains the most information, so χ_{M} can represent the raw data.
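The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `total_variation` and `select_segment` are hypothetical, and trailing samples that do not fill a whole segment are simply dropped, which the text does not specify.

```python
def total_variation(values):
    """Sum of absolute first-order differences of a sequence (the quantity called Δχ)."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


def select_segment(series, m):
    """Split `series` into m equal segments and return (index, segment)
    for the segment with the largest total variation.

    Assumption: samples left over after the m-th segment are ignored.
    """
    seg_len = len(series) // m
    if seg_len < 2:
        raise ValueError("each segment needs at least two samples")
    segments = [series[i * seg_len:(i + 1) * seg_len] for i in range(m)]
    best = max(range(m), key=lambda i: total_variation(segments[i]))
    return best, segments[best]


# Toy data: the middle segment oscillates most, so it is selected.
index, segment = select_segment([1, 1, 1, 1, 0, 5, 0, 5, 2, 2, 3, 3], m=3)
```

Here the three segments have variations 0, 15 and 1, so the second segment would play the role of χ_{M}.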

This section puts forward an idea based on difference and extremum to calculate the timing. At the same time, the delay time between different links can be obtained. In the process industry, a change in any link influences the others. If one link fluctuates strongly, it must cause changes in other links. That is, when an extremum appears in one link, corresponding extrema must appear in some other links. The interval between corresponding extrema is the delay time. Let

At this point, the time series of the links has emerged, and the data can be adjusted according to the delay times. Assume the order of all links after adjustment is
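To make the difference-and-extremum idea concrete, the following Python sketch locates the largest absolute first-order difference in each link, treats the gap between those positions as the delay, and then shifts the links so that corresponding samples line up. It is a simplified stand-in, not the counting-based procedure of this paper, whose details are not reproduced here; the names `extremum_index`, `estimate_delays` and `align` are hypothetical, and using a single extremum per link is an assumption.

```python
def extremum_index(values):
    """Index of the sample that follows the largest absolute first-order difference."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return max(range(len(diffs)), key=diffs.__getitem__) + 1


def estimate_delays(links):
    """links: dict mapping link name -> list of samples.

    Returns (order, delays): links ordered by the time of their extremum,
    and each link's delay in samples relative to the earliest link.
    """
    peaks = {name: extremum_index(v) for name, v in links.items()}
    order = sorted(peaks, key=peaks.get)
    first = peaks[order[0]]
    delays = {name: peaks[name] - first for name in order}
    return order, delays


def align(links, delays):
    """Shift every link back by its delay and trim all links to a common length."""
    max_delay = max(delays.values())
    length = min(len(v) for v in links.values()) - max_delay
    return {name: v[delays[name]:delays[name] + length]
            for name, v in links.items()}


links = {"a": [0, 0, 10, 10, 10, 10, 10, 10],
         "b": [5, 5, 5, 5, 5, 1, 1, 1]}
order, delays = estimate_delays(links)   # "b" reacts three samples after "a"
adjusted = align(links, delays)
```

With a 1 min sampling interval, a delay of three samples would correspond to three minutes. Real data would need a more robust matching of extrema than this single-peak heuristic.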

In the clustering step, the k-means algorithm is adopted. After k-means, the data of each link are separated into classes using the best k for that link. Each class represents a state of the link, so every link can be simplified to k states. The biggest benefit is a reduced amount of computation, which increases the practicability of the method.
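A minimal sketch of this step for a single link is given below: one-dimensional k-means, with k chosen by the mean silhouette coefficient. It uses only the Python standard library so that it is self-contained; the function names, the evenly spaced initial centers and the upper bound `k_max` are assumptions made for the illustration, not choices stated in this paper.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain k-means on scalar data; returns (labels, centers)."""
    data = sorted(values)
    # Assumption: initial centers are spread evenly over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def silhouette(values, labels):
    """Mean silhouette coefficient: cohesion a versus separation b for every point."""
    clusters = {}
    for v, l in zip(values, labels):
        clusters.setdefault(l, []).append(v)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for v, l in zip(values, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        b = min(sum(abs(v - u) for u in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(values)


def choose_k(values, k_max):
    """Return the k in [2, k_max] with the highest mean silhouette coefficient."""
    return max(range(2, k_max + 1),
               key=lambda k: silhouette(values, kmeans_1d(values, k)[0]))


readings = [10, 11, 9, 50, 51, 49, 90, 91, 89]
best_k = choose_k(readings, k_max=4)
states, centers = kmeans_1d(readings, best_k)
```

On this toy series the three well-separated groups give the highest silhouette score, so the link would be reduced to three states. In practice a library implementation of k-means and of the silhouette score would be the more usual choice.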

To determine k, silhouette coefficient [

where

Suppose the best k of link

The Apriori algorithm is a highly valuable frequent itemset mining algorithm for finding boolean association [

So far, the rules between links and their association degrees are known. Based on this information, the association rule is constructed from those rules that have the largest association degree and satisfy the order

To fully exploit the possible hidden relationships among all links, a binary tree, which we call the association tree, needs to be constructed. The tree is built from the association chains, which show the relationships between different links. All links can be included in the tree, and every branch of the tree is an association chain.
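One way to read the chain construction is as a greedy walk: starting from a link, repeatedly move to the later link with the largest association degree. The Python sketch below illustrates that reading only. The function `build_chain`, the `degree` dictionary keyed by ordered link pairs, and the example values are all hypothetical, and the association degree itself is taken as given rather than computed.

```python
def build_chain(start, order, degree):
    """Greedily build an association chain.

    order:  list of link names in their timing order.
    degree: dict mapping (earlier_link, later_link) -> association degree.
    At each step the chain moves to the later link with the largest degree.
    """
    position = {name: i for i, name in enumerate(order)}
    chain = [start]
    current = start
    while True:
        candidates = [(d, b) for (a, b), d in degree.items()
                      if a == current and position[b] > position[current]]
        if not candidates:
            break
        _, current = max(candidates)
        chain.append(current)
    return chain


order = ["A", "B", "C", "D"]
degree = {("A", "B"): 0.4, ("A", "C"): 0.9, ("C", "D"): 0.7, ("B", "D"): 0.8}
strongest_from_a = build_chain("A", order, degree)   # ["A", "C", "D"]
```

Because each step moves strictly later in the timing order, the walk always terminates. Collecting the chains started from every link, or following the second-best successor as well, would give branches of the kind the association tree is meant to hold.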

Suppose any one of association rules is denoted by

The association rules tell us that the links influence and relate to one another, but they do not show how the state of one link influences the others. To address this problem, this section provides an idea based on difference to determine the relationship between adjacent links in the association rules. Generally, the state of numeric data can be classified into three types: rise, fall and unchanged.
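As an illustration of the three state types, the sketch below labels each step of a series by the sign of its first-order difference. The `tolerance` parameter, below which a change counts as unchanged, is an assumption introduced for the example; it is not a parameter defined in this paper.

```python
def classify_states(values, tolerance=0.0):
    """Label each consecutive pair of samples as 'rise', 'fall' or 'unchanged'.

    tolerance (hypothetical): changes whose magnitude does not exceed it
    are treated as 'unchanged'.
    """
    states = []
    for previous, current in zip(values, values[1:]):
        delta = current - previous
        if delta > tolerance:
            states.append("rise")
        elif delta < -tolerance:
            states.append("fall")
        else:
            states.append("unchanged")
    return states


states = classify_states([1, 2, 2, 1.95, 0], tolerance=0.1)
```

For this input the small dip from 2 to 1.95 falls within the tolerance, so the sequence of states is rise, unchanged, unchanged, fall.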

The state value of any one of

where

The probability of

where

In descending order of this number, the state of the object moves from normal to abnormal. We can derive industry guidance directly from the state association rules and use it to improve the efficiency of production and operation in allied industries.
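A simple way to obtain such a ranking is to count how often each combination of link states occurs along a chain and express the counts as percentages. The sketch below assumes that the probability of a state association rule is its relative frequency in the aligned data, which is an interpretation rather than the formula of this paper; the function name and the sample states are hypothetical.

```python
from collections import Counter


def state_rule_percentages(*link_states):
    """Count each combination of simultaneous link states and return
    (combination, percentage) pairs, most frequent first.

    link_states: one equally long list of states per link in the chain.
    """
    combos = Counter(zip(*link_states))
    total = sum(combos.values())
    return [(combo, 100.0 * count / total)
            for combo, count in combos.most_common()]


first = ["unchanged", "unchanged", "unchanged", "unchanged"]
second = ["fall", "fall", "rise", "fall"]
third = ["fall", "fall", "rise", "rise"]
rules = state_rule_percentages(first, second, third)
```

In this toy case the combination (unchanged, fall, fall) covers half of the samples and comes first, which mirrors the way frequent combinations are read as normal states and rare ones as abnormal.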

We performed experiments to verify that our method works effectively. An electric power generation system is a typical process industry system, and the whole process flow of the power system is a process object. The historical data of a subsystem of a power plant were selected as the experimental data. 789 days of data were filtered down to 1,070,008 records. The time interval of data acquisition is 1 min. There are 8 links, whose names are listed in

The rough industrial process across all links is obtained after timing analysis. The process is shown as follows.

According to

The association tree captures more of the possible relationships among all links, such as the association tree of

Links | Name |
---|---|
10HNA10CQ1013S | |
10HNA10CQ1013S | |
01_q2 | |
01_q4 | |
01_q3 | |
01_Qnetar | |
01_Vdaf | |
MSTMFLOW | |

Links | Association Chains |
---|---|
-- | |

Take the strongest association chain beginning with

As we all know, there are only a few states in a process industry. From

Number | | | | Percentage (%) | State |
---|---|---|---|---|---|
1 | unchanged | fall | fall | 20.876 | normal |
2 | unchanged | rise | rise | 20.467 | normal |
3 | unchanged | fall | rise | 17.442 | transition |
4 | unchanged | rise | fall | 17.109 | transition |
5 | unchanged | unchanged | fall | 11.17 | transition |
6 | unchanged | unchanged | rise | 7.392 | transition |
7 | rise | rise | fall | 1.271 | abnormal |
8 | fall | fall | fall | 0.758 | abnormal |
9 | fall | fall | rise | 0.741 | abnormal |
10 | rise | fall | fall | 0.55 | abnormal |
11 | rise | unchanged | rise | 0.399 | abnormal |
12 | rise | rise | rise | 0.389 | abnormal |
13 | unchanged | unchanged | unchanged | 0.354 | abnormal |
14 | fall | rise | rise | 0.352 | abnormal |
15 | fall | rise | fall | 0.307 | abnormal |
16 | rise | fall | rise | 0.238 | error |
17 | unchanged | rise | unchanged | 0.183 | error |
18 | fall | unchanged | fall | 0.001 | error |
19 | unchanged | fall | unchanged | 0.001 | error |


The first two state association rules, which have the largest percentages, are considered to represent the normal states of the process. Both share the common feature that

Each branch of the association tree is an association chain. The results for the association trees are similar to the above analysis.

In this paper, we proposed a novel scheme for mining the state association rules of a process object. The method combines the k-means algorithm, an association rule mining algorithm and other techniques. The resulting rules are significant because they capture the close relationships among links. First, they help to decrease production cost and raise productivity: because the association rules expose hidden relationships, we can directly increase or decrease particular links to adjust the output links instead of wasting time on the others. Second, some failures can be detected from the abnormal rules. A good understanding of the abnormal states helps to find the cause quickly when a failure happens. We can therefore gain knowledge and give the process industry guidance to improve the efficiency of production and operation.

However, the results are not yet ideal. The percentage of transition states is somewhat high, and there are more kinds of state than expected. We will therefore try to improve the algorithm that determines the state of each link at every moment.