The Feature Core of Dynamic Reduct *

To the reduct problems of decision system, the paper proposes the notion of dynamic core according to the dynamic reduct model. It describes various formal definitions of dynamic core, and discusses some properties about dynamic core. All of these show that dynamic core possesses the essential characters of the feature core.


Introduction
KNOWLEDGE reduct is an important step in knowledge discovery, and also a favorable method to extract the more generalized rules.There are a number of research papers [1][2][3] about reduct problems, but most of them are just for static reduct.
The reduct methods based on standard rough set are effective to some extent, but there are also some problems to be solved in practice.Dynamic data, incremental data and noise data make the analysis results instable and uncertain.All of these limit the application of rough set theory [1].
Standard rough set methods are not always sufficient for extracting laws from decision system.One of the reasons is that these methods are not taking into account the fact that part of the reduct set is chaotic, in other words it is not stable.
Firstly, it gives the definition of dynamic core.Then the properties of dynamic core is certified.All of these show dynamic core possesses the essential characters of the feature core.

Dynamic Reduct
To the problems of standard rough set reducts or static reducts, dynamic reducts can put up better performance in very large dataset, and also enhance effectively the ability to accommodate noise data.
The rules calculated by means of dynamic reducts are better predisposed to classify unseen cases, because these reducts are in some sense the most stable reducts, and they are the most frequently appearing reducts in sub-decision system created by random samples of a given decision system.
Definition 1: Decision system , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, is called the sub-decision system of .Let denote the set of all sub-decision system of , , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, denotes the set which contains all reducts of decision system , and denotes the set which includes all reducts of sub-decision system .

  D B RE
B There is at least one reduct in a decision system, which is just itself, so the set of reduct is not empty.In many cases, there may be several reducts in a given decision system.Each reduct can product a rule set, and it is difficult to justify which is the best rule set.Therefore it is important to search the most stable reduct, dynamic reduct is proposed in this case.
Definition 3: Decision system , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, [4], and , DR S F is called an F-dynamic reduct, which describes the most stable reducts in decision system.From the definition of dynamic reduct, it follows that a relative reduct of is dynamic if it is also a reduct of all sub-decision system from a given family F by random sampling.
The reducts in a decision system are not stable, sensitive for sample data [5].Bazan gives the concept and method about dynamic reduct [4], which grounds the most stable reduct of decision system in theory, then the dynamic core is put forward in the paper.

Dynamic Core
Attributes reduct is the basic problem in rough set theory, and the computation of feature core is especially important for resolving this problem.All attributes in the feature core will be present in any reduct, otherwise discernible relation in decision system can not be ensured.Many reduct algorithms are based on the feature core.According to the feature core one can construct reduct heuristically, and the efficiency of reduct can be improved greatly.
Many references discuss about the feature core of reduct [2,3], but it is just static reduct.For dynamic reduct the feature core still need to be probed in a further step.
Given decision system   , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, we know that the feature core of decision system in static reduct is , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, , the dynamic core of based on family F is defined by Theorem 1: Decision system , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, , , Theorem 1 means that each attribute in F-dynamic core is included by all F-dynamic reducts.Dynamic reducts is the most stable reducts of decision system , then dynamic core is the most stable core of decision system , which represents a stable unreducted attribute set.

(F-λ)-Dynamic Core
F-dynamic core can be sometimes too much restrictive so here applies a generalization of F-dynamic core.It will be more suitable for noise data.
Definition 5: Decision system , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, , the (F-λ)-dynamic core of decision system based on family F is defined by  is precision coefficient, and the value of  decides which attribute belongs to (F-λ)-dynamic core  approaches 1, will be closed to , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, In this case dynamic core is just the feature core of decision system .S , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, , then the intersection of (F-λ)dynamic reduct contains (F-λ)-dynamic core.
Proof: According to the definition of (F-λ)-dynamic core, it follows

Generalized Dynamic Core
According to the definition of dynamic core, if some feature attributes of any sub-decision system in F family are comprised by dynamic core, then it is certainly a feature attribute of decision system .This notion sometimes can not be convenient because we are interested in useful sets of attributes which are not necessarily reducts of the decision system.Therefore we have to generalize the notion of a dynamic core.

S Definition 6: Decision system
, where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, GDCORE S F is called the generalized dynamic core of decision system .S Definition 7: Decision system , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set, , where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set,   , we have the following propositions. 1)

   , , DCORE S F GDCORE S F  
By definitions 6 and 7 we know that it is obviously true. 2) By 1) it is also obviously true. 3  when F family contains decision system , then generalized dynamic core will be just dynamic core.

S Theorem 5: Decision system
, where is a non-empty, finite set called the universe, is condition attributes set, is decision attributes set,   , then 1) The intersection of F-generalized dynamic reduct contains F-generalized dynamic core Proof: According to the definition of F-dynamic reduct [5], it follows 2) The intersection of (F-λ)-generalized dynamic reduct contains (F-λ)-generalized dynamic core Proof: For any attribute , it sat- It might as well suppose that attribute  exists in core of sub-decision systems, According to the definition of (F-λ)-generalized dynamic reduct [5], , any reduct in (F-λ)-generalize dynamic reduct satisfies: By (1), for any attribute  , we can infer: , .
, it may be true for the hypothesis, but we have fixed 0.5

 
, so it is impossible.

Quality of Family F
We can not calculate the family of all subtables of universal decision information table, therefore we construct family F to estimate universal decision information table.General speaking, if family F contains more elements then the stability of the calculated reduct will be better.
Extracting too much subtables inevitably cause enormous computation.From the thought of Bernoulli's theorem we know that if the extracted subtables is adequate, continuing to extract other subtables has litter relate with reduct stability coefficient.So a minimum value of |F| can be determined.
However the computation of sample of dynamic reduct should emphasize the other important aspect, namely the influence of sampling distribution to the validity of the dynamic reduct.We generally assume that the sample is random.The basic thought of evaluate the quality of family F is proposed.It can be used to weight the equalization of family F and reflect the validity of the dynamic reduct. Definition Taking simple random sampling as a precondition, firstly we will consider the equalization of family F and give two radios to weight it.
1) Entire Sampling Rate The entire sampling rate is denoted by U U takes a too small value then the sample is too centralized.So it can not reflect the character of the original decision information system, and reduce the confidence degree of dynamic reduct.
We think the sample of family F is deficient when the value of F U U less than 0.5.A threshold value can be established.The sample conforms to the requirement and the result of dynamic reduct is effective if F U U is bigger than the threshold value.
2) Entire Balanced Degree The entire balanced degree is denoted by and It is easy to observe that 0 1 U takes a too big value then the sample may too repetitious.So it can not reflect the character of object of the original decision information system, and reduce the confidence degree of dynamic reduct.
We think the sample of family F is unbalanced when the value of C F F U U is bigger than 0.5.A threshold value can be established.The uniformity of sample distribution is better and the result of dynamic reduct is effective if C F F U U is lower than the threshold value.Just evaluating the performance of entire sample is not enough, we should also avoid the abnormal phenomenon of individual decision table in family F. The influence of individual subtables to the distribution must be not too much.For example, we extract 100 subtables.If the radio of central sample set constructed by 99 subtables of them is larger than 0.5, when a subtable is brought into the central sample set, the radio decline sharply.For another example, if the radio of entire distribution is acceptable, but the difference among half of the subtables is small.It indicates the distribution of sampling is not balanced.
We can calculate the degree that central sample set contains each of the subtables and the degree that entire sample set contains each of the subtables.The two parameters can be used to evaluate the equalization of subtables.
3) Partial Sampling Rate The partial sampling rate is denoted by is easy to observe that 0 1 U U of some subtables take a higher value then the sample may be unbalanced.Even the entire balanced degree is permissible, but most of the subtables overlap, this reduces the confidence degree of dynamic reduct.
4) Partial Balanced Degree The partial balanced degree is denoted by C F B U U .
It is easy to observe that 0 1 U U of some subtables takes a higher value then the sample is too centralized.Even the entire balanced degree is permissible, but most of the subtables overlap.

Conclusion
The notion of dynamic core is actually based on dynamic reducts in the paper.Dynamic core consists with the most stable attributes which can not be reducted, and describes the set of feature attributes.At the same time it is proved that the intersection of dynamic reducts contains dynamic core, which is suitable for all kinds of de-finitions about dynamic reducts.All of these show that the dynamic core possesses the essential properties of the feature core in deed.We can say that dynamic core expresses the feature attributes in a more general way.