Attribute Reduction Method Based on the Sequential Three-Way Decision Model
1. Introduction
Rough set theory [1] is an analytical theory for handling uncertain information. Attribute reduction [2] [3], a hot topic in rough set theory, aims to delete as many redundant attributes as possible while keeping the attribute set's ability to partition the universe unchanged. In recent years, research on attribute reduction has generally been divided into exhaustive methods and heuristic methods. Exhaustive methods can obtain the complete reduction result, but their high complexity makes them unsuitable for large-scale information processing. Heuristic methods, by contrast, are favored by many scholars because their greedy search strategies make the reduction process efficient. For example, Hu et al. [4] proposed a forward greedy attribute reduction algorithm based on attribute importance, and Liu et al. [5] improved the traditional attribute reduction algorithm of Hu et al. and introduced the FHARA algorithm.
Three-way decision [6] is a divide-and-conquer approach based on decision-theoretic rough set theory [7]: the whole is divided into three parts, and each part is processed separately. Yao further used the idea of progressive granular computing to construct a sequential three-way decision model [8], which enhances processing efficiency, reduces processing costs, and is better suited to complex problems. Several scholars have studied the sequential three-way decision model. For example, Ju et al. [9] proposed a sequential three-way classifier for local reduction from a local perspective; Hu et al. [10] used the artificial fish swarm algorithm to determine the three-way decision thresholds automatically; Sheng et al. [11] proposed an attribute reduction method based on the sequential three-way decision model; and Jiang et al. [12] introduced an accelerated attribute reduction method based on the sequential three-way decision idea.
Three-way decision divides the universe U into a positive region, a negative region, and a boundary region, and then applies a corresponding decision strategy to each. References [11] [12] use the three-way decision idea to divide attributes directly into positive, negative, and boundary regions. Building on the ideas in [11] [12], this paper presents a novel attribute reduction method based on the sequential three-way decision model, in which attributes are divided into core attributes, relatively necessary attributes, and unnecessary attributes using attribute importance and thresholds. The new method excludes the influence of irrelevant attributes during the reduction process, which improves reduction efficiency and reduces time consumption. The structure of the paper is as follows: Section 2 reviews some basic concepts of three-way decision and attribute reduction; Section 3 presents the attribute reduction method based on the sequential three-way decision model; Section 4 compares this method experimentally with traditional heuristic attribute reduction algorithms; and Section 5 concludes the paper.
2. Preliminary Knowledge
To support the research work, this section introduces some basic concepts of three-way decision, attribute reduction, and related notions.
2.1. Three-Way Decision
Building on rough set research, Yao proposed the theory of three-way decision as a general strategy for solving complex problems. In practical decision-making, matters that can clearly be accepted or rejected are judged immediately; matters that cannot be decided at once are deferred for further evaluation, leading to delayed decision-making.
Definition 1 [6]: Given an information system S and a subset $X \subseteq U$ of the universe U, let $[x]_R$ denote the equivalence class of an object $x \in U$ under the equivalence relation R. The conditional probability that x belongs to X is represented as:

$$\Pr(X \mid [x]_R) = \frac{|X \cap [x]_R|}{|[x]_R|} \qquad (1)$$
Definition 2 [6]: Given a pair of thresholds $(\alpha, \beta)$ with $0 \le \beta < \alpha \le 1$, the three regions of the three-way decision are defined as:

$$\begin{aligned}
POS_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid [x]_R) \ge \alpha\} \\
BND_{(\alpha,\beta)}(X) &= \{x \in U \mid \beta < \Pr(X \mid [x]_R) < \alpha\} \\
NEG_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid [x]_R) \le \beta\}
\end{aligned} \qquad (2)$$
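Definitions 1 and 2 can be illustrated with a short sketch (in Python rather than the paper's MATLAB; the function names and data layout are our own, not the paper's):

```python
from fractions import Fraction

def conditional_probability(X, eq_class):
    """Pr(X | [x]_R) = |X ∩ [x]_R| / |[x]_R|, as in formula (1)."""
    return Fraction(len(X & eq_class), len(eq_class))

def three_way_regions(universe, X, partition, alpha, beta):
    """Divide the universe into POS/BND/NEG regions per formula (2).

    `partition` maps each object to its equivalence class (a frozenset)."""
    pos, bnd, neg = set(), set(), set()
    for x in universe:
        p = conditional_probability(X, partition[x])
        if p >= alpha:        # accept
            pos.add(x)
        elif p <= beta:       # reject
            neg.add(x)
        else:                 # defer
            bnd.add(x)
    return pos, bnd, neg
```

For instance, with equivalence classes {1,2}, {3,4}, {5,6}, target set X = {1,2,3}, and thresholds (0.7, 0.3), the conditional probabilities are 1, 1/2, and 0, so the three classes fall into the positive, boundary, and negative regions respectively.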
The calculation of the threshold pair $(\alpha, \beta)$ follows the Bayesian decision criterion. When the expected loss function is minimized and the loss values satisfy $\lambda_{PP} \le \lambda_{BP} < \lambda_{NP}$ and $\lambda_{NN} \le \lambda_{BN} < \lambda_{PN}$, the calculation formula is as follows:

$$\alpha = \frac{\lambda_{PN} - \lambda_{BN}}{(\lambda_{PN} - \lambda_{BN}) + (\lambda_{BP} - \lambda_{PP})}, \qquad
\beta = \frac{\lambda_{BN} - \lambda_{NN}}{(\lambda_{BN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{BP})} \qquad (3)$$
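As a sanity check on formula (3), a minimal sketch that derives the threshold pair from a set of loss values (the argument names `l_pp` … `l_pn`, standing for $\lambda_{PP}$ … $\lambda_{PN}$, are our own):

```python
def bayesian_thresholds(l_pp, l_bp, l_np, l_nn, l_bn, l_pn):
    """Compute (alpha, beta) per formula (3), assuming the loss values
    satisfy l_pp <= l_bp < l_np and l_nn <= l_bn < l_pn."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta
```

Under the stated ordering of the losses, the formulas always yield $\alpha > \beta$; e.g., with losses (0, 2, 6) for an object in X and (0, 2, 6) for one outside X, the thresholds are (2/3, 1/3).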
2.2. Attribute Reduction
Attribute reduction, a core issue in rough set theory, removes redundant attributes while maintaining the classification ability of an information system, so as to extract key attributes and simplify the system. To characterize attribute reduction based on dependency functions with information measures, Hu defined conditional information entropy for decision systems.
Definition 3 [4]: Given a decision system S, let $U/B = \{X_1, X_2, \ldots, X_m\}$ be the (fuzzy) equivalence classes induced by an attribute subset $B \subseteq AT$ and $U/D = \{Y_1, Y_2, \ldots, Y_n\}$ the decision classes. The conditional information entropy of D with respect to B is defined as:

$$H(D \mid B) = -\sum_{i=1}^{m} \sum_{j=1}^{n} \frac{|X_i \cap Y_j|}{|U|} \log \frac{|X_i \cap Y_j|}{|X_i|} \qquad (4)$$

where $X_i$ represents a fuzzy equivalence class and $Y_j$ represents a decision class. For $B \subseteq AT$, B is a reduction based on dependency functions if and only if $H(D \mid B) = H(D \mid AT)$ and, for every $a \in B$, $H(D \mid B - \{a\}) \neq H(D \mid B)$.
To guide the selection of the attribute subset B, the definition of attribute importance is given, which indicates the importance of each attribute to the decision D.

Definition 4: Given a decision system S, for $a \in AT - B$, the relative attribute importance of a with respect to B is defined as:

$$Sig(a, B, D) = H(D \mid B) - H(D \mid B \cup \{a\}) \qquad (5)$$

If $Sig(a, B, D) = 0$, attribute a does not contribute to the decision, meaning that attribute a is redundant with respect to B.
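Formulas (4) and (5) can be sketched as follows for crisp equivalence classes (a simplification: Definition 3 also covers fuzzy equivalence classes; the data layout and function names are our own):

```python
from collections import defaultdict
from math import log2

def conditional_entropy(objects, B, D):
    """H(D|B) per formula (4).

    `objects` is a list of dicts (one per object), `B` a tuple of
    conditional attribute names, `D` the decision attribute name."""
    n = len(objects)
    # Group objects by their B-equivalence class, counting decision classes.
    groups = defaultdict(lambda: defaultdict(int))
    for obj in objects:
        b_key = tuple(obj[a] for a in B)
        groups[b_key][obj[D]] += 1
    h = 0.0
    for d_counts in groups.values():
        size_b = sum(d_counts.values())          # |X_i|
        for c in d_counts.values():              # c = |X_i ∩ Y_j|
            h -= (c / n) * log2(c / size_b)
    return h

def significance(a, B, D, objects):
    """Sig(a, B, D) = H(D|B) - H(D|B ∪ {a}), per formula (5)."""
    return (conditional_entropy(objects, tuple(B), D)
            - conditional_entropy(objects, tuple(B) + (a,), D))
```

On a toy table where attribute a1 determines the decision exactly and a2 is pure noise, `significance` returns 1 bit for a1 and 0 for a2, matching the redundancy criterion of Definition 4.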
3. Attribute Reduction Method Based on the Sequential Three-Way Decision Model
To reduce the time consumed during attribute reduction, this paper proposes an attribute reduction method based on a sequential three-way decision model. The core of the method is to combine the sequential three-way decision idea with a division of the attribute set based on attribute importance and thresholds. Attributes in the relatively necessary set (the boundary region $BND$) are divided further. Some irrelevant attributes are excluded while attributes are being selected, which reduces the number of candidate attributes in each iteration and thus decreases time consumption.
In the sequential three-way decision model, attributes are divided into core attributes (the positive region $POS$), relatively necessary attributes (the boundary region $BND$), and unnecessary attributes (the negative region $NEG$) according to attribute importance. First, the importance of every attribute is computed, and the attributes are divided into $POS$, $BND$, and $NEG$ accordingly. Formula (3) defines a pair of thresholds $(\alpha, \beta)$. Attributes with importance $Sig(a, B, D) \ge \alpha$ are placed in $POS$, indicating acceptance into the decision attribute set; attributes with $Sig(a, B, D) \le \beta$ are placed in $NEG$, indicating rejection from the decision attribute set; and attributes with $\beta < Sig(a, B, D) < \alpha$ are placed in $BND$, indicating that further evaluation is needed. This process is repeated on $BND$ until the final decision attribute set is obtained.
The acceptance, rejection, and deferment decisions in three-way decision theory correspond to the positive region, negative region, and boundary region of rough set theory, as shown in Figure 1. To visualize the attribute reduction process, the construction of the sequential three-way decision model is given in Figure 2 (Figure 1 shows the division of the universe, and Figure 2 shows the division of attributes using the sequential idea).
Definition 5: Given a decision system S, for the i-th layer a pair of thresholds $(\alpha_i, \beta_i)$ is provided, and the division process is as follows (with $BND_0 = AT$):

$$\begin{aligned}
POS_i &= \{a \in BND_{i-1} \mid Sig(a, B, D) \ge \alpha_i\} \\
BND_i &= \{a \in BND_{i-1} \mid \beta_i < Sig(a, B, D) < \alpha_i\} \\
NEG_i &= \{a \in BND_{i-1} \mid Sig(a, B, D) \le \beta_i\}
\end{aligned} \qquad (6)$$
Proposition 1: Under the conditions given in Definition 5, $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$ and $\beta_1 \le \beta_2 \le \cdots \le \beta_n$; that is, as i increases, $\beta_i$ increases and $\alpha_i$ decreases, leading to an incremental growth of $POS_i$ and a gradual shrinking of $BND_i$.
Figure 2. Sequential three-way decision model based on attribute importance.
Proof: Given a decision information system S, based on the Bayesian decision criterion, selecting the threshold pair that minimizes the expected loss function and satisfies $\lambda_{PP} \le \lambda_{BP} < \lambda_{NP}$ and $\lambda_{NN} \le \lambda_{BN} < \lambda_{PN}$, we obtain the threshold pair of the i-th layer:

$$\alpha_i = \frac{\lambda_{PN}^{i} - \lambda_{BN}^{i}}{(\lambda_{PN}^{i} - \lambda_{BN}^{i}) + (\lambda_{BP}^{i} - \lambda_{PP}^{i})}, \qquad
\beta_i = \frac{\lambda_{BN}^{i} - \lambda_{NN}^{i}}{(\lambda_{BN}^{i} - \lambda_{NN}^{i}) + (\lambda_{NP}^{i} - \lambda_{BP}^{i})} \qquad (7)$$

According to the thresholds $(\alpha_1, \beta_1)$, the attributes in $AT$ are divided into $POS_1$, $BND_1$, and $NEG_1$; then the thresholds $(\alpha_2, \beta_2)$ of $BND_1$ are calculated using formula (3), and so on. Based on formulas (1) and (3), it can be shown that $\alpha_1 \ge \alpha_2$ and $\beta_1 \le \beta_2$. By mathematical induction, it can be concluded that $\alpha_i \ge \alpha_{i+1}$ and $\beta_i \le \beta_{i+1}$ for every i. Obviously, in the sequential three-way decision process, $POS_i$ and $NEG_i$ are incrementally enlarged.
In the above process, the boundary region is divided repeatedly according to attribute importance. Given the threshold pair $(\alpha_i, \beta_i)$ of the i-th layer, the division process is as follows:

$$BND_i = \{a \in BND_{i-1} \mid \beta_i < Sig(a, B, D) < \alpha_i\} \qquad (8)$$

then

$$POS = \bigcup_{i=1}^{n} POS_i \qquad (9)$$

$$NEG = \bigcup_{i=1}^{n} NEG_i \qquad (10)$$
In the above sequential three-way decision model, the attributes in $BND_i$ are continuously divided, with useful attributes assigned to $POS$ and redundant attributes assigned to $NEG$, until the final attribute set is obtained. The final division result is as follows:

$$AT = POS \cup NEG = \left(\bigcup_{i=1}^{n} POS_i\right) \cup \left(\bigcup_{i=1}^{n} NEG_i\right), \qquad BND_n = \varnothing \qquad (11)$$
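Under the assumption of fixed, precomputed importance scores (a simplification: the full method recomputes importance as the selected attribute set grows), the layered division of formulas (6)-(11) can be sketched as:

```python
def sequential_three_way_partition(sig, thresholds):
    """Layer-by-layer division of attributes, per formulas (6)-(11).

    `sig` maps attribute -> importance score; `thresholds` is a list of
    (alpha_i, beta_i) pairs, assumed non-increasing in alpha and
    non-decreasing in beta (Proposition 1)."""
    pos, neg = set(), set()
    bnd = set(sig)  # BND_0 = AT: every attribute starts undecided
    for alpha, beta in thresholds:
        pos |= {a for a in bnd if sig[a] >= alpha}       # accept: core
        neg |= {a for a in bnd if sig[a] <= beta}        # reject: unnecessary
        bnd = {a for a in bnd if beta < sig[a] < alpha}  # defer: divide further
        if not bnd:
            break
    return pos, bnd, neg
```

For example, with scores {a1: 0.1, a2: 0.9, a3: 0.2, a4: 0.5} and threshold layers (0.7, 0.3) then (0.5, 0.45), the first layer accepts a2 and rejects a1 and a3, and the second layer resolves the deferred a4 into the positive region, emptying the boundary as in formula (11).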
Example 1: Given a neighborhood decision information system $S = (U, AT \cup D)$, where U is the set of objects, $AT = \{a_1, a_2, a_3, a_4\}$ is the set of conditional attributes, and D is the decision attribute. The decision information system is shown in Table 1.

According to the decision attribute D, the universe U is divided into decision classes. The conditional information entropy $H(D \mid B)$ is calculated by formula (4), and the importance values of the conditional attributes $a_1, a_2, a_3, a_4$ are then obtained by formula (5). Calculating the threshold pair $(\alpha, \beta)$ using formula (3) based on the Bayesian criterion yields the values (0.7, 0.3); comparing each attribute's importance with these thresholds shows that attributes $a_1$ and $a_3$ can be considered redundant in the reduction process.
4. Experimental Analysis
To verify the effectiveness of the 3SAR algorithm proposed in this paper, 8 UCI datasets (all downloaded from the UCI Machine Learning Repository) were selected for the experiments, and a comparative experiment on classification accuracy against the HAR, MSM-3WDAR, and 3WDAR algorithms was designed. The dataset information is shown in Table 2. The experimental environment is a Windows 10 system with an Intel(R) Core(TM) i5-6300HQ CPU @ 2.30 GHz, and the programming language used is MATLAB R2021a.
This paper used 5-fold cross-validation with 10 different neighborhood radii. The experimental data was divided into 5 equal parts $D_1, D_2, D_3, D_4, D_5$. The first iteration used $D_2 \cup D_3 \cup D_4 \cup D_5$ as the training set to obtain the reduct $AT_1$, and $D_1$ was used as the test set to calculate the classification accuracy using the attributes in $AT_1$. The second iteration used $D_1 \cup D_3 \cup D_4 \cup D_5$ as the training set to obtain the reduct $AT_2$, and $D_2$ was used as the test set to calculate the classification accuracy using the attributes in $AT_2$. This process continued iteratively.
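The 5-fold protocol above can be sketched as an index-splitting helper (the round-robin fold assignment is illustrative, not the paper's exact split):

```python
def five_fold_indices(n_samples, n_folds=5):
    """Split indices 0..n_samples-1 into `n_folds` parts D_1..D_5 and
    yield one (train_indices, test_indices) pair per iteration."""
    # Round-robin assignment: sample i goes to fold i mod n_folds.
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [j for k, f in enumerate(folds) if k != i for j in f]
        yield train, test
```

Each iteration would then run the reduction algorithm on the training indices to obtain a reduct and evaluate classification accuracy on the held-out fold.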
4.1. Comparison of Reduction Time
During the experiment, the time consumption of reduction using the 3SAR, HAR, MSM-3WDAR, and 3WDAR classification algorithms was recorded, and the comparison results are shown in Table 3.
Table 1. Decision information system.
Table 3. Comparison of reduction time among the four algorithms.
Because 3SAR excludes irrelevant attributes during the iterative process, invalid computation during attribute reduction is reduced, and so is the reduction time consumed. The experimental results show that the new algorithm 3SAR, which integrates the sequential three-way decision idea, consumes less reduction time than HAR, MSM-3WDAR, and 3WDAR for most radii.
4.2. Comparison of Reduction Classification Accuracy
To illustrate the effectiveness of 3SAR in terms of classification accuracy, the 3SAR, HAR, MSM-3WDAR, and 3WDAR algorithms were evaluated on the test sets. The experimental results are shown in Figure 3.
The experimental results indicate that, for most radii, the classification accuracy obtained by 3SAR is superior to that of HAR, MSM-3WDAR, and 3WDAR, while 3SAR also consumes less time. Therefore, the attribute reduction method based on the sequential three-way decision model effectively reduces time consumption while ensuring classification performance: the new algorithm 3SAR achieves classification accuracy no worse than that of the traditional algorithms.
Figure 3. Comparison of accuracy among the four algorithms at different radii.
5. Conclusion
This paper proposes an attribute reduction method based on the sequential three-way decision model. Building on traditional heuristic attribute reduction methods, attribute importance is used as an evaluation function to determine the reduced attribute set, and attribute importance together with thresholds is used to divide attributes into core, relatively necessary, and unnecessary attributes. The new method is experimentally compared with HAR, MSM-3WDAR, and 3WDAR; the results show that it improves reduction efficiency and lowers time consumption while maintaining reduction performance.
Funding
This work was supported by the National Natural Science Foundation of China under grant number 12371459.
NOTES
*Corresponding author.