
In order to improve the performance of classifiers in subjective domains, this paper defines a metric, the quality of subjectively labelled training data (QoSTD), by means of K-means clustering. The QoSTD is then used as a weight on the predicted class scores to adjust the likelihoods of instances. Moreover, two measurements are defined to assess the performance of classifiers trained on subjectively labelled data. Binary classifiers of Traditional Chinese Medicine (TCM) Zhengs are trained and retrained on a real-world data set, using the support vector machine (SVM) and discriminant analysis (DA) models, so as to verify the effectiveness of the proposed method. The experimental results show that the consistency of instance likelihoods with the corresponding observations increases notably for the classes, especially in cases where the training data have relatively low QoSTD. The experimental results also indicate how to eliminate mislabelled instances from the training data set so as to retrain the classifiers in subjective domains.

Recently, much research has aimed at predicting the status of individuals in subjective domains, including their emotional states, their health, and their personality, by using training data acquired from a variety of sensors and interpreted or labelled by a first person or a third person [

Many methods have been proposed to deal with label noise. In the literature, there are three main approaches to handling label noise [

In order to deal with the above issue of label noise in training data caused by subjective labelling, especially in domains where the ground truth is uncertain, we define a metric, QoSTD, intended to measure the quality of training data with uncertain labels for classification in subjective domains, so as to predict a person's emotional state, health, and so on. QoSTD is an aggregation of two components that reflect the clustering and partitioning ability of the training data set. The training data include the features extracted from the multimodal sensor data of subjects, the subjective scores of various items in a first-person questionnaire, and the observation scores of classes in a subjective domain provided by third persons. Using this metric, we can analyze the influence of subjectively labelled data on the quality of the training data, and we can estimate the sufficiency of the training data for classification. When the QoSTD for a particular class is less than a predetermined value, the training data for that class cannot support the required classification performance.

We trained binary classifiers for the states based on the support vector machine (SVM) model and the discriminant analysis (DA) model, so as to validate the relation of QoSTD to classification performance. Furthermore, the QoSTD is used as a weight on the predicted class scores to adjust the likelihoods of instances without absolute ground truth. To evaluate the effectiveness of QoSTD in dealing with the label noise introduced by subjective labelling, we used the TCM Zheng training data set that was used in [

The literature contains many studies on the classification in the presence of label noise [

Moreover, reference [

On the other hand, various methods have been proposed that utilize TCM to infer the health status of an individual as a means of auto-diagnosis. References [

Because the target is status classification in subjective domains, the data used for training are generally diverse. For instance, the data used to extract features or attributes may include data measured by sensors or other equipment, or data from first-person questionnaires; by direct observation of the subject, the states of each instance are labelled by third persons for supervised learning. Although the kinds of obtained data are heterogeneous, all of the features extracted from the different modes are handled in the same way. For example, the histogram, shape, and texture of an image are features of the image mode, and the blood pressure measured by a bio-sensor is a feature of the bio-sensor mode. All of these features are considered homogeneous. They are denoted as $a_{smn}$ and are normalized for each data set. Here, s, m, and n are indices of the sample, the mode, and a feature within that mode, respectively. The combined features of all training samples yield a matrix A of size $S \times M$, where S and M are the numbers of samples and of total features, respectively. On the other hand, the state scores labelled by a third person for each instance are denoted as $z_{sij}$, with values ranging from 0 to 10. Here, s, i, and j are indices of the sample, the observer, and the state, respectively.

The eigenfeature vectors of the instances are obtained by calculating the eigenvalues and eigenvectors of $A' A$; this is based on the method of principal component analysis (PCA) [

$$EF = \{ ef_{s,p} : s \in [1, S],\ p \in [1, P] \} = A \cdot U \qquad (1)$$

where s and p are indices indicating the sample and the eigenfeature, respectively. Thus, the size of $EF$ is $S \times P$.
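The eigenfeature computation of Equation (1) can be sketched as follows. This is an illustrative reconstruction, not the paper's original MATLAB code; the toy shapes (S = 12, M = 5, P = 3) and the function name `eigenfeatures` are our own.

```python
import numpy as np

def eigenfeatures(A, P):
    """Project A (S x M) onto the top-P eigenvectors of A' * A (Equation (1))."""
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # A' * A is symmetric
    order = np.argsort(eigvals)[::-1][:P]        # indices of the P largest eigenvalues
    U = eigvecs[:, order]                        # M x P projection matrix
    return A @ U                                 # EF: S x P

rng = np.random.default_rng(0)
A = rng.standard_normal((12, 5))   # toy data: S = 12 samples, M = 5 features
EF = eigenfeatures(A, P=3)
print(EF.shape)                    # (12, 3)
```

Note that, following the text literally, A is not mean-centered here; centering A beforehand would give classical PCA.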

The eigenfeature vector is then used to represent the instance. The samples belonging to a given state and those not belonging to that state are considered to overlap due to the subjectivity of the labelling. Accordingly, a metric called QoSTD is defined to measure how well the training data set can be divided into binary classes. This allows us to explore the influence of the features and the subjectively labelled data on the state that is perceived. QoSTD is calculated based not only on the partition of the training data but also on its clustering ability; we call these two component metrics the partition and the clustering. Together they determine the classification performance obtainable from the training data. Let the score of State j for a sample labelled by Observer i be denoted as $z_{sij}$. In the training data set, the samples whose scores for State j are larger than 0 are considered to be labelled as State j and compose the set $PZ_{ij} = \{ z_{sij} : z_{sij} > 0, s \le S \}$, while those with a score of 0 for State j compose the set $NZ_{ij} = \{ z_{sij} : z_{sij} = 0, s \le S \}$. We used K-means clustering to divide the data set into two groups. The group containing more samples labelled with State j is taken as the positive cluster of State j, denoted $PC_{ij} = \{ pc_{sij}, s \le S \}$, and the other is the negative cluster, denoted $NC_{ij} = \{ nc_{sij}, s \le S \}$. Accordingly, the partition of the data set for State j labelled by Observer i is defined as

$$par_{ij} = \frac{\#(PZ_{ij} \cap PC_{ij})}{\#PC_{ij}}, \qquad (2)$$

and the clustering of the data set for State j labelled by Observer i is defined as

$$clu_{ij} = \frac{\#(PZ_{ij} \cap PC_{ij})}{\#PZ_{ij}}, \qquad (3)$$

where # denotes the number of data points; $\#(PZ_{ij} \cap PC_{ij})$ is the number of samples that are labelled as State j and clustered into the positive cluster of State j. Thus, the larger the values of $par_{ij}$ and $clu_{ij}$ are, the better the separability of the training data set for State j is. If both values are equal to 1, the training data are completely separable. Accordingly, the quality of the training data set for classifying State j as labelled by Observer i is defined as $QoSTD_{ij}$ by the following expression, an aggregation of $par_{ij}$ and $clu_{ij}$:

$$QoSTD_{ij} = w_1 \, par_{ij} + w_2 \, clu_{ij}. \qquad (4)$$

Here, $w_1$ and $w_2$ are the weights of the partition and the clustering, reflecting the importance of the partition and the clustering ability of the training data in the classification. In the case that these two factors are equally important, both are set to 0.5. The value of $QoSTD_{ij}$ is equal to 1 if the training data set is completely separable for State j as labelled by Observer i.
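Equations (2)-(4) can be sketched for one (Observer i, State j) pair as below. This is a hedged illustration with our own function names and toy data; for determinism it uses a minimal hand-rolled 2-means instead of a production K-means implementation.

```python
import numpy as np

def two_means(X, iters=20):
    """Minimal Lloyd's k-means with k=2; centers are seeded from the first and
    last rows for determinism in this sketch (not a production initializer)."""
    centers = np.stack([X[0], X[-1]]).astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def qostd(EF, z, w1=0.5, w2=0.5):
    """Equations (2)-(4). EF: S x P eigenfeatures; z: length-S scores for State j."""
    labels = two_means(EF)
    pz = z > 0                                   # PZ: instances labelled State j
    # The cluster holding more State-j-labelled samples is the positive cluster PC.
    pos = int((labels[pz] == 1).sum() > (labels[pz] == 0).sum())
    pc = labels == pos
    inter = np.sum(pz & pc)                      # #(PZ ∩ PC)
    par = inter / pc.sum()                       # Equation (2)
    clu = inter / pz.sum()                       # Equation (3)
    return w1 * par + w2 * clu                   # Equation (4)

# Toy data: two separated groups; one sample in each group is "mislabelled".
EF = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.1], [0.0, 0.2],
               [10.0, 10.0], [10.1, 9.9], [9.9, 10.2], [10.2, 10.1], [10.0, 9.8]])
z = np.array([0, 0, 0, 0, 1, 2, 3, 1, 4, 0])    # z > 0 means labelled State j
print(round(qostd(EF, z), 2))                    # 0.8 on this toy split
```

On this toy set, one mislabelled sample per cluster yields par = clu = 4/5, so QoSTD = 0.8.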

Based on the definition of QoSTD, and combining the results of instances’ clustering in

As mentioned above, the metric $QoSTD_{ij}$ can be used to judge the quality of training data for classification. Accordingly, the value of $QoSTD_{ij}$ is used as a weight on the predicted class scores of the instances regarding State j.

For calculating $QoSTD_{ij}$, the data modes used as the training set are determined based on the context in which the data were collected and the available computational capacity. Next, the features of matrix A are extracted from the multimodal data set. The eigenfeature matrix EF is obtained by Equation (1). Then, the value of $QoSTD_{ij}$ for State j labelled by Observer i is calculated using Equations (2), (3), and (4).

The following is the scheme for training classifiers utilizing $QoSTD_{ij}$.

Generally, any existing supervised learning algorithm, for example, discriminant analysis (DA), a support vector machine (SVM), or a decision tree (DT), can be utilized to train binary classifiers of State j with the training data labelled by Observer i. Using the trained classification model, the predicted class score for State j is generated in response to instance s, denoted $score_{sij}$. However, considering that the quality of the training data influences classification performance, $QoSTD_{ij}$ is utilized as a weight on $score_{sij}$ to adjust the predicted scores. The corresponding computation is as below.

$$score\_r_{sij} = QoSTD_{ij} \cdot score_{sij} \qquad (5)$$

where $score\_r_{sij}$ denotes the adjusted score of instance s belonging to State j as labelled by Observer i.

Then, the likelihood of the instance belonging to State j is calculated by Equation (6).

$$l\_r_{sij} = \frac{1}{1 + \exp(-a \cdot score\_r_{sij})} \qquad (6)$$

where $l\_r_{sij}$ indicates the likelihood of instance s belonging to State j as labelled by Observer i, and a is the slope parameter.

For instance s, if the value of $l\_r_{sij}$ is greater than a threshold T_max, it is assigned to the positive class of State j; if that value is less than a threshold T_min, it is assigned to the negative class of State j; otherwise, whether the instance belongs to State j is uncertain. The uncertain instances are then eliminated from the training data set, and the classification model is trained again with the refined training data.
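Equations (5)-(6) and the threshold rule can be sketched together as below. This is an illustration, not the paper's code; the values of a, T_min, T_max, and the example scores are illustrative assumptions.

```python
import math

def likelihood(score, qostd, a=1.0):
    """Weight the predicted class score by QoSTD and squash to a likelihood."""
    score_r = qostd * score                        # Equation (5)
    return 1.0 / (1.0 + math.exp(-a * score_r))    # Equation (6)

def assign(l_r, t_min=0.01, t_max=0.99):
    """Positive/negative class of State j, or 'uncertain' (to be eliminated)."""
    if l_r > t_max:
        return "positive"
    if l_r < t_min:
        return "negative"
    return "uncertain"

# A low QoSTD shrinks the score, pushing likelihoods toward the uncertain band.
print(assign(likelihood(6.0, qostd=1.0)))   # positive
print(assign(likelihood(6.0, qostd=0.4)))   # uncertain
```

The design point the sketch makes concrete: down-weighting scores from low-QoSTD training data deliberately widens the "uncertain" band, so dubious instances get filtered out rather than force-assigned.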

Two measurements, $Con_{ij}$ and $Recall_{ij}$, are introduced to assess the performance of classifying the classes without absolute ground truth. $Con_{ij}$, defined by Equation (7), reflects the consistency of the labelled scores of the assigned instances from the training data with their likelihoods. Let a labelled score larger than 0 be denoted $pz_{sij}$, the likelihood of an assigned instance be denoted $l\_ra_{sij}$, and the number of assigned instances be denoted $S1_{ij}$. Then,

$$Con_{ij} = \frac{\sum_{s=1}^{S1_{ij}} \left( pz_{sij} - \overline{pz}_{ij} \right) \left( l\_ra_{sij} - \overline{l\_ra}_{ij} \right)}{\left\| pz_{sij} - \overline{pz}_{ij} \right\|_2 \left\| l\_ra_{sij} - \overline{l\_ra}_{ij} \right\|_2} \qquad (7)$$

On the other hand, $Recall_{ij}$, defined by Equation (8), reflects the ratio of the number of assigned instances to the total.

$$Recall_{ij} = \frac{S1_{ij}}{S_{ij}} \qquad (8)$$

Obviously, the larger the values of $Con_{ij}$ and $Recall_{ij}$ are, the better the performance of the classifiers.

Let the objective value of $Con_{ij}$ be $Con\_Obj_{ij}$, and that of $Recall_{ij}$ be $Recall\_Obj_{ij}$. The whole training procedure is then as follows.

Step 1: Construct the binary classification model;

Step 2: Calculate the adjusted likelihoods $l\_r_{sij}$ of the instances by Equations (5) and (6);

Step 3: If $T\_min < l\_r_{sij} < T\_max$, instance s is not assigned to State j; otherwise, instance s is assigned to the positive or negative class of State j according to its likelihood;

Step 4: Eliminate the unassigned instances from the training data set;

Step 5: Calculate $Con_{ij}$ and $Recall_{ij}$ by Equations (7) and (8), and repeat the procedure from Step 1 to Step 4 until the limit on rounds or $Recall\_Obj_{ij}$ is reached;

Step 6: Find the maximal value of $Con_{ij}$ and the corresponding round. The binary classification model constructed in that round is used as the final model if this maximum is larger than $Con\_Obj_{ij}$ and the corresponding $Recall_{ij}$ is larger than $Recall\_Obj_{ij}$; if not, the final classification model cannot be determined, and the training procedure is abandoned.
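The steps above can be sketched as a compact refinement loop. This is a hedged reconstruction under stated assumptions: the classifier is a toy centroid-difference scorer standing in for the paper's MATLAB SVM/DA models, and the consistency measure is simplified to a correlation over all assigned instances rather than only the positively labelled ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def centroid_scorer(X, y):
    """Toy stand-in for a trained classifier: a linear score from the
    difference of class centroids (positive y vs. zero y)."""
    mu1, mu0 = X[y > 0].mean(axis=0), X[y == 0].mean(axis=0)
    w = mu1 - mu0
    b = -(w @ (mu0 + mu1)) / 2.0
    return lambda Z: Z @ w + b

def refine_and_train(X, y, qostd, a=1.0, t_min=0.01, t_max=0.99, max_rounds=20):
    """Steps 1-6: train, adjust likelihoods, drop uncertain instances, retrain."""
    best = None
    for rnd in range(1, max_rounds + 1):
        score = centroid_scorer(X, y)                # Step 1: train
        lr = sigmoid(a * qostd * score(X))           # Step 2: Eqs. (5)-(6)
        assigned = (lr <= t_min) | (lr >= t_max)     # Step 3: threshold
        if assigned.sum() < 2:
            break
        recall = assigned.mean()                     # Eq. (8) analogue
        con = np.corrcoef(y[assigned], lr[assigned])[0, 1]   # Eq. (7) analogue
        if best is None or con > best[0]:
            best = (con, recall, rnd)                # Step 6 candidate
        if assigned.all():
            break                                    # nothing left to eliminate
        X, y = X[assigned], y[assigned]              # Step 4: refine
    return best

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 3)), rng.normal(5, 0.5, (20, 3))])
y = np.concatenate([np.zeros(20), rng.integers(1, 10, 20)])  # labelled scores
con, recall, rnd = refine_and_train(X, y, qostd=0.6)
print(recall)   # 1.0: every instance is confidently assigned on this easy toy
```

On hard data, early rounds would leave many instances in the uncertain band, and the loop would shrink the training set until the recall objective or the round limit is hit.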

After constructing the binary classification model, a new instance is assigned to the positive class of State j if its $l\_r_{sij}$ is larger than T_max, and to the negative class of State j if its $l\_r_{sij}$ is less than T_min. Otherwise, the class the instance belongs to is uncertain.

In this study, the real-world training data set that was used in [

The extracted features were combined with the above feelings and physical states to form the matrix A. Each of these items constitutes one mode of the features. The training data set includes five modes: Feelings, Physical States, Eye, Tongue, and Face. The modes and the number of features for each mode are shown in

The matrix EF of eigenfeature vectors of the instances is obtained by Equation (1), calculating the eigenvalues and eigenvectors of $A' A$. Then, the matrix EF is used to train the binary classifiers of TCM Zhengs.

In order to verify the above statement that $QoSTD_{ij}$ can be utilized as the weight of the predicted class score to improve classifier performance, especially when the training data are subjectively labelled and the ground truth is uncertain, any existing supervised classification model is applicable; two kinds of classification models are trained here. One is an SVM model trained using the MATLAB (Mathworks, Natick, MA, USA) function fitcsvm, with a polynomial kernel of order three. The other is a DA model trained using the MATLAB function fitcdiscr.

Based on the above binary classification models, the class scores of the instances belonging to Zheng j are obtained by using the MATLAB function predict. Then, the class scores are used to calculate the likelihood measures of the corresponding instances by Equations (5) and (6). For the SVM model, the value of parameter a in Equation (6) is set to 1/100, and for the DA model it is set to 50. Moreover, T_max = 0.99 and T_min = 0.01. An instance s is assigned to Zheng j if its calculated likelihood is larger than 0.99;

| Symbol | M_{1} | M_{2} | M_{3} | M_{4} | M_{5} |
|---|---|---|---|---|---|
| Mode | Feeling | Physicals | Eye | Tongue | Face |
| Features | 9 | 15 | 9 | 20 | 18 |

if the likelihood is less than 0.01, instance s does not belong to Zheng j; otherwise, the assignment of instance s is uncertain. The training procedure is repeated with the refined training data set until the limit on rounds or $Recall\_Obj_{ij}$ is reached.
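For readers without MATLAB, a rough scikit-learn analogue of this setup (fitcsvm with a polynomial kernel of order three, and fitcdiscr) might look as follows. The toy data, the linear-discriminant choice, and the QoSTD weight of 0.6 are our own assumptions, not the paper's experiment.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
EF = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(3, 1, (30, 4))])
labels = np.array([0] * 30 + [1] * 30)            # binary labels for one Zheng

svm = SVC(kernel="poly", degree=3).fit(EF, labels)       # rough fitcsvm analogue
da = LinearDiscriminantAnalysis().fit(EF, labels)        # rough fitcdiscr analogue

# decision_function returns real-valued class scores, playing the role of the
# scores returned by MATLAB's predict; Equation (5) then weights them by QoSTD.
scores = 0.6 * svm.decision_function(EF)   # 0.6 is a made-up QoSTD weight
print(scores.shape)                        # (60,)
```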

From

As described in Section 3, the predicted class scores of the instances are adjusted by introducing $QoSTD_{ij}$ as the weights of those scores. To examine how adjusting the predicted class scores improves classification performance, another measurement is introduced that reflects the consistency of the labelled scores of the assigned instances from the training data with their likelihoods in the case where the class scores are not adjusted. This measurement, $Con\_ori_{ij}$, is calculated by Equation (9).

$$Con\_ori_{ij} = \frac{\sum_{s=1}^{S1_{ij}} \left( pz_{sij} - \overline{pz}_{ij} \right) \left( l\_a_{sij} - \overline{l\_a}_{ij} \right)}{\left\| pz_{sij} - \overline{pz}_{ij} \right\|_2 \left\| l\_a_{sij} - \overline{l\_a}_{ij} \right\|_2} \qquad (9)$$

where $l\_a_{sij}$ indicates the likelihood of an assigned instance when the class scores are not adjusted by Equation (5).

| TCMD1 | | DA | | | SVM | | |
|---|---|---|---|---|---|---|---|
| Zheng | QoSTD | Con_ori | Con | Increased rate | Con_ori | Con | Increased rate |
| 1 | 0.52 | 0.16 | 0.39 | 1.47 | 0.46 | 0.49 | 0.06 |
| 2 | 0.62 | 0.82 | 0.83 | 0.02 | 0.80 | 0.79 | 0.00 |
| 3 | 0.69 | 0.84 | 0.85 | 0.02 | 0.81 | 0.81 | 0.00 |
| 4 | 0.78 | 0.82 | 0.83 | 0.00 | 0.82 | 0.82 | 0.00 |
| 5 | 0.52 | 0.10 | 0.16 | 0.50 | 0.45 | 0.45 | 0.00 |
| 6 | 0.61 | 0.63 | 0.60 | −0.04 | 0.63 | 0.63 | 0.01 |
| 7 | 0.46 | 0.23 | 0.36 | 0.59 | 0.44 | 0.46 | 0.06 |
| 8 | 0.63 | 0.83 | 0.83 | 0.00 | 0.83 | 0.83 | 0.00 |
| 9 | 0.59 | 0.77 | 0.82 | 0.06 | 0.70 | 0.70 | 0.00 |
| 10 | 0.43 | 0.01 | 0.10 | 13.40 | 0.29 | 0.37 | 0.30 |
| 11 | 0.74 | 0.90 | 0.90 | 0.00 | 0.91 | 0.91 | 0.00 |
| 12 | 0.75 | 0.81 | 0.82 | 0.00 | 0.81 | 0.81 | 0.00 |
| 13 | 0.41 | 0.10 | 0.31 | 2.05 | 0.53 | 0.55 | 0.03 |

| TCMD3 | | DA | | | SVM | | |
|---|---|---|---|---|---|---|---|
| Zheng | QoSTD | Con_ori | Con | Increased rate | Con_ori | Con | Increased rate |
| 1 | | | | | | | |
| 2 | 0.36 | 0.12 | 0.39 | 2.27 | 0.49 | 0.50 | 0.04 |
| 3 | 0.50 | 0.31 | 0.32 | 0.04 | 0.66 | 0.66 | 0.00 |
| 4 | 0.35 | 0.19 | 0.33 | 0.77 | 0.45 | 0.50 | 0.10 |
| 5 | 0.44 | 0.31 | 0.39 | 0.23 | 0.39 | 0.42 | 0.07 |
| 6 | 0.45 | 0.15 | 0.24 | 0.66 | 0.27 | 0.29 | 0.07 |
| 7 | 0.53 | 0.59 | 0.67 | 0.12 | 0.62 | 0.62 | 0.00 |
| 8 | 0.53 | 0.67 | 0.45 | −0.33 | 0.68 | 0.68 | 0.00 |
| 9 | 0.41 | 0.37 | 0.07 | −0.81 | 0.42 | 0.47 | 0.12 |
| 10 | 0.49 | 0.00 | 0.02 | 106.00 | 0.36 | 0.38 | 0.06 |
| 11 | 0.53 | 0.77 | 0.79 | 0.02 | 0.69 | 0.69 | 0.00 |
| 12 | 0.41 | 0.24 | 0.46 | 0.92 | 0.53 | 0.55 | 0.05 |
| 13 | 0.42 | 0.28 | 0.38 | 0.39 | 0.50 | 0.52 | 0.05 |

labelled by the TCM doctor identified as 3 (TCMD3). For

From the results of

Moreover, it is observed that most values of $Con_{ij}$ are larger with the SVM model than with the DA model; however, most of the increase rates of $Con_{ij}$ over $Con\_ori_{ij}$ are larger with the DA model than with the SVM model. This means that the DA-based classifiers are more sensitive to $QoSTD_{ij}$ than the SVM-based classifiers, although the SVM-based classifiers appear to have the better classification ability.

Accordingly, we can say that adjusting the predicted class scores with $QoSTD_{ij}$ as weights genuinely improves the performance of the trained classifiers, especially when the classifiers are trained on data sets with low values of $QoSTD_{ij}$, regardless of which classification model is used.

As described in Section 4, the training procedure is repeated to construct the classifiers with the refined training data set until the limit on rounds or $Recall\_Obj_{ij}$ is reached. In our experiments, the limit on rounds is set to 20, $Con\_Obj_{ij}$ is set to 0.7, and $Recall\_Obj_{ij}$ is set to 0.1.

From the results of

| TCMD1 | | DA | | | SVM | | |
|---|---|---|---|---|---|---|---|
| Zheng | QoSTD | max | max−1th | max order | max | max−1th | max order |
| 1 | 0.524 | 0.391 | 0.000 | 1 | 0.553 | 0.067 | 4 |
| 2 | 0.623 | 0.835 | 0.005 | 2 | 0.798 | 0.004 | 9 |
| 3 | 0.685 | 0.850 | 0.000 | 1 | 0.815 | 0.003 | 2 |
| 4 | 0.777 | 0.827 | 0.000 | 2 | 0.818 | 0.000 | 1 |
| 5 | 0.521 | 0.156 | 0.000 | 1 | 0.500 | 0.053 | 3 |
| 6 | 0.610 | 0.668 | 0.065 | 2 | 0.634 | 0.002 | 2 |
| 7 | 0.464 | 0.361 | 0.000 | 1 | 0.577 | 0.113 | 7 |
| 8 | 0.633 | 0.833 | 0.001 | 2 | 0.834 | 0.000 | 1 |
| 9 | 0.594 | 0.819 | 0.004 | 2 | 0.700 | 0.004 | 2 |
| 10 | 0.429 | 0.098 | 0.000 | 1 | 0.577 | 0.206 | 10 |
| 11 | 0.743 | 0.903 | 0.000 | 1 | 0.909 | 0.000 | 1 |
| 12 | 0.750 | 0.818 | 0.001 | 2 | 0.812 | 0.000 | 1 |
| 13 | 0.410 | 0.310 | 0.004 | 2 | 0.616 | 0.070 | 10 |

| TCMD3 | | DA | | | SVM | | |
|---|---|---|---|---|---|---|---|
| Zheng | QoSTD | max | max−1th | max order | max | max−1th | max order |
| 1 | | | | | | | |
| 2 | 0.36 | 0.895 | 0.503 | 4 | 0.590 | 0.085 | 6 |
| 3 | 0.50 | 0.772 | 0.511 | 3 | 0.851 | 0.187 | 6 |
| 4 | 0.35 | 0.328 | 0.000 | 1 | 0.689 | 0.193 | 5 |
| 5 | 0.44 | 0.707 | 0.322 | 5 | 0.622 | 0.207 | 5 |
| 6 | 0.45 | 0.244 | 0.000 | 1 | 0.633 | 0.340 | 5 |
| 7 | 0.53 | 0.828 | 0.159 | 2 | 0.794 | 0.175 | 6 |
| 8 | 0.53 | 0.897 | 0.448 | 5 | 0.858 | 0.181 | 6 |
| 9 | 0.41 | 1.000 | 0.931 | 3 | 0.797 | 0.328 | 6 |
| 10 | 0.49 | 1.000 | 0.979 | 3 | 0.766 | 0.388 | 6 |
| 11 | 0.53 | 0.878 | 0.120 | 2 | 0.759 | 0.072 | 6 |
| 12 | 0.41 | 0.884 | 0.425 | 5 | 0.777 | 0.223 | 6 |
| 13 | 0.42 | 0.918 | 0.536 | 5 | 0.811 | 0.289 | 6 |

of $Con_{ij}$ do not reach the value of $Con\_Obj_{ij}$, which is set to 0.7, although they rise after re-training. This indicates that the provided training data set cannot make the corresponding classifiers achieve the required performance for these Zhengs. In such cases, constructing classifiers for these states should be abandoned.

It is also noted that the round of re-training that makes $Con_{ij}$ maximal differs. For example, for Zheng 2 in the case of TCMD1, the maximal $Con_{1,2}$ is 0.84, reached in the 2nd round with the DA model, and 0.80, reached in the 9th round with the SVM model. For Zheng 2 in the case of TCMD3, the maximal $Con_{3,2}$ is 0.90, reached in the 4th round with the DA model, and 0.60, reached in the 6th round with the SVM model. It is obvious that this matches the issue described in [

Overall, we used a real-world training data set to train the classification models. This training data set involved thirteen Zhengs labelled by TCM doctors, with label noise caused by the absence of absolute ground truth, whereas current research in the literature mostly utilizes artificially corrupted training data sets. The experiments verified that $QoSTD_{ij}$ is relevant to the performance of classifying classes without absolute ground truth: there is a high positive correlation between $QoSTD_{ij}$ and $Con_{ij}$, and introducing $QoSTD_{ij}$ as the weights of the predicted class scores to adjust the likelihoods of the instances genuinely improved the performance of the classifiers.

This paper defined the $QoSTD_{ij}$ metric as a way to measure the quality of training data subjectively labelled by observers (i), which was used to improve the prediction of states (j) without absolute ground truth. The $QoSTD_{ij}$ was used as a weight on the predicted class scores to adjust the likelihoods of the instances. Moreover, two measurements, $Con_{ij}$ and $Recall_{ij}$, were defined in order to assess the performance of classifiers trained on subjectively labelled data in a more suitable way. The training procedure was repeated with the refined training data set until the objective values of $Con_{ij}$ and $Recall_{ij}$ were reached.

To verify the effectiveness of the proposed method, a real-world training data set was used to train classifiers based on the DA and SVM classification models. This training data set involved thirteen Zhengs labelled by TCM doctors, with label noise caused by the absence of absolute ground truth. The experimental results showed the effectiveness of the proposed method in improving the performance of the classifiers for instances without absolute ground truth. Furthermore, the proposed method indicated how to eliminate instances with label noise from the training data set.

As an area of future work, we intend to utilize other training data sets, in fields such as emotion and personality, to train classifiers based on the proposed method, so as to verify the effectiveness of our method in improving classification in subjective domains.

This work was supported by research funds from Iwate Prefecture University. The author would like to thank Prof. Shaozi Li and Prof. Feng Guo in Xiamen University for their cooperation in data collection, system implementation, and experiments.

Dai, Y. (2017) Quality Assessment of Training Data with Uncertain Labels for Classification of Subjective Domains. Journal of Computer and Communications, 5, 152-168. https://doi.org/10.4236/jcc.2017.57014