
For the case where all multivariate normal parameters are known, we derive a new linear dimension reduction (*LDR*) method to determine a low-dimensional subspace that preserves or nearly preserves the original feature-space separation of the individual populations and the Bayes probability of misclassification. We also give necessary and sufficient conditions that provide the smallest reduced dimension essentially retaining the Bayes probability of misclassification from the original full-dimensional space. Moreover, our new *LDR* procedure requires no computationally expensive optimization. Finally, for the case where parameters are unknown, we devise an *LDR* method based on our new theorem and compare it with three competing *LDR* methods using Monte Carlo simulations and a parametric bootstrap based on real data.

It is well known that the Bayes probability of misclassification (BPMC) of a statistical classification rule does not increase as the dimension of the feature space increases, provided the class-conditional probability densities are known. However, in practice, when parameters are estimated and the feature-space dimension is large relative to the training-sample sizes, the performance of a sample discriminant rule may be considerably degraded. This phenomenon gives rise to a paradoxical behavior that [

An exact relationship between the expected probability of misclassification (EPMC), training-sample sizes, feature-space dimension, and actual parameters of the class-conditional densities is challenging to obtain. In general, as the classifier becomes more complex, the ratio of sample size to dimensionality must increase at an exponential rate to avoid the curse of dimensionality. The authors [

Another effective approach to obtain a reduced dimension to avoid the curse of dimensionality is linear dimension reduction (LDR). Perhaps the most well-known LDR procedure for the m-class problem is linear discriminant analysis (LDA) from [

where

where

which is achieved by an eigenvalue decomposition of
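Because the equations are omitted from this extraction, a minimal sketch of the classical Fisher LDA computation may help: the projection directions are the leading eigenvectors of Sw^{-1} Sb, where Sw and Sb are the within-class and between-class scatter matrices. The function name and interface below are illustrative, not from the paper.

```python
import numpy as np

def lda_projection(X, y, q):
    """Fisher LDA: top-q eigenvectors of Sw^{-1} Sb.
    X: (n, p) data; y: (n,) integer class labels; q <= m - 1."""
    classes = np.unique(y)
    p = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((p, p))  # within-class scatter
    Sb = np.zeros((p, p))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - overall_mean).reshape(-1, 1)
        Sb += Xc.shape[0] * (d @ d.T)
    # Solve the generalized eigenproblem Sb v = lambda Sw v
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:q]]  # (p, q) LDR matrix
```

Note that rank(Sb) is at most m - 1, which is why LDA cannot produce more than m - 1 useful directions, a limitation the paper returns to below.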

Extensions of LDA that incorporate information on the differences in covariance matrices are known as heteroscedastic linear dimension reduction (HLDR) methods. The authors [

Using results by [

where

Moreover, we use Monte Carlo simulations to compare the classification efficacy of the BE method, sliced inverse regression (SIR), and sliced average variance estimation (SAVE) found in [

The remainder of this paper is organized as follows. We begin with a brief introduction to the Bayes quadratic classifier in Section 2 and introduce some preliminary results that we use to prove our new LDR method in Section 3. In Section 4, we provide conditions under which the Bayes quadratic classification rule is preserved in the low-dimensional space and derive a new LDR matrix. We establish an SVD-based approximation to our LDR procedure, along with an example of low-dimensional graphical representations, in Section 5. We describe the four LDR methods that we compare using Monte Carlo simulations in Section 6. We present five Monte Carlo simulations in which we compare the competing LDR procedures for various population parameter configurations in Section 7. In Section 8, we compare the four methods using bootstrap simulations for a real data example, and, finally, we offer a few concluding remarks in Section 9.

The Bayesian statistical classifier discriminates based on the probability density functions

the Bayes classifier assigns

This decision rule partitions the measurement or feature space into m disjoint regions

One can re-express the Bayes classification rule as follows:

Assign

This decision rule is known as the Bayes classification rule. Let

The Bayes decision rule (4) is to classify the unlabeled observation
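With the defining equations stripped from this extraction, a minimal sketch of the Bayes quadratic rule for multivariate normal classes may be useful: assign x to the class maximizing the log posterior, log pi_i - (1/2) log|Sigma_i| - (1/2)(x - mu_i)' Sigma_i^{-1} (x - mu_i). The function name is illustrative.

```python
import numpy as np

def bayes_quadratic_classify(x, means, covs, priors):
    """Assign x to the class with the largest log posterior score."""
    scores = []
    for mu, S, pi in zip(means, covs, priors):
        d = x - mu
        _, logdet = np.linalg.slogdet(S)  # stable log-determinant
        scores.append(np.log(pi) - 0.5 * logdet - 0.5 * d @ np.linalg.solve(S, d))
    return int(np.argmax(scores))
```

When all covariance matrices are equal, the quadratic terms cancel and the rule reduces to the familiar linear (LDA) classifier.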

The following notation will be used throughout the remainder of the paper. We let

The proof of the main result for the derivation of our new LDR method requires the following notation and lemmas. Let

where

(i)

(ii)

We now state and prove three lemmas that we use in the proof of our main result.

Lemma 1 For

(a)

(b)

(c)

Proof. Part (a) follows from the fact that

Lemma 2 For

Proof. Because

Lemma 3 Let

(b)

(c)

Proof. The proof of part (a) of Lemma 3 follows from (i), and we have that

for each

We now derive a new LDR method that is motivated by results on linear sufficient statistics derived by [

Theorem 1 Let

Finally, let

Proof. Let

where

Recall that for

Therefore, the original p-variate Bayes classification assignment is preserved by the linear transformation

Theorem 1 is important in that, if its conditions hold, we obtain an LDR matrix for the reduced q-dimensional subspace such that the BPMC in the q-dimensional space equals the BPMC for the original p-dimensional feature space. In other words, provided the conditions in Theorem 1 hold, we have that the LDR matrix

With the following corollary, we demonstrate that for two multivariate normal populations such that

Corollary 1 Assuming we have two multivariate normal populations

Proof. The proof is immediate from (7).

If

One method of obtaining an r-dimensional LDR matrix,

Theorem 2 Let

where

Furthermore,

Using Theorems 1 and 2, we now construct a linear transformation for projecting high-dimensional data onto a low-dimensional subspace when all class distribution parameters are known. Let

is a rank-r approximation of
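Assuming M denotes the matrix defined in (7) (its exact construction is omitted from this extraction), the rank-r LDR matrix of Theorem 2 can be read off the SVD as the r left singular vectors with the largest singular values. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def ldr_matrix(M, r):
    """Rank-r LDR matrix from the SVD M = U diag(s) V':
    keep the r left singular vectors with the largest singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # s is sorted descending
    F = U[:, :r]  # (p, r) matrix with orthonormal columns
    return F, s
```

A p-variate observation x is then projected as F.T @ x; the magnitudes of the discarded singular values indicate how much of M (and hence of the separation information) the rank-r approximation loses.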

We next provide an example to demonstrate the efficacy of Theorems 1 and 2 to determine low-dimensional representations for multiple multivariate normal populations with known mean vectors and covariance matrices. In the example, we display the simplicity of Theorem 1 to formulate a low-dimensional representation for three populations

Consider the configuration

We have

Using Theorem 1, we have that an optimal two-dimensional representation space is

The optimal two-dimensional ellipsoidal representation is shown in

We can also determine a one-dimensional representation of the three multivariate normal populations through application of the SVD described in Theorem 2 applied to the matrix M given in (7). A one-dimensional representation space is column one of the matrix F in (8), and the graphical representation of this configuration of univariate normal densities is depicted in

In this section, we present and describe the four LDR methods that we wish to compare and contrast in Sections 7 and 8.

In Theorem 1, we assume the parameters

provided

Thus, using the SVD, we let

with

Provided

A second LDR method presented by [

for

The LDR matrix derived in [

with

The BE LDR approach is based on the rotated differences in the means. The LDR matrix (10) uses a type of pooled covariance matrix estimator for the precision matrices. However, the BE method does not incorporate all of the information contained in the different individual covariance matrices. Another disadvantage of the BE LDR approach is that it is limited to a reduced dimension that depends on the number of classes, m. For

The next LDR method we consider is sliced inverse regression (SIR), which was proposed in [

where

where
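For the classification setting, SIR's slices coincide with the class labels, and the directions come from the eigenvectors of the between-slice-mean kernel of the standardized predictor. The sketch below is a generic SIR implementation under that assumption, not the paper's exact estimator; the function name is illustrative.

```python
import numpy as np

def sir_directions(X, y, q):
    """SIR for classification: slices are the class labels."""
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False)
    # Sigma^{-1/2} via the eigendecomposition of the (symmetric PD) covariance
    w, V = np.linalg.eigh(Sigma)
    Sig_inv_half = V @ np.diag(w ** -0.5) @ V.T
    Z = (X - X.mean(axis=0)) @ Sig_inv_half  # standardized predictors
    M = np.zeros((p, p))
    for c in np.unique(y):
        Zc = Z[y == c]
        m = Zc.mean(axis=0)
        M += (len(Zc) / n) * np.outer(m, m)  # weighted slice-mean kernel
    evals, evecs = np.linalg.eigh(M)
    eta = evecs[:, ::-1][:, :q]              # top-q eigenvectors
    return Sig_inv_half @ eta                # directions in the original scale
```

Because the kernel is built only from slice means, SIR ignores differences among the class covariance matrices, which is the weakness the simulations below expose.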

The last LDR method we consider is sliced average variance estimation (SAVE), which has been proposed in

[

where

where,
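SAVE replaces SIR's slice-mean kernel with the averaged squared deviation of the within-slice covariances from the identity, so it picks up second-moment differences as well. Again, this is a generic sketch with slices taken as class labels, not the paper's exact estimator.

```python
import numpy as np

def save_directions(X, y, q):
    """SAVE for classification: kernel sum_c p_c (I - Var(Z | class c))^2,
    where Z is the standardized predictor."""
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(Sigma)
    Sig_inv_half = V @ np.diag(w ** -0.5) @ V.T
    Z = (X - X.mean(axis=0)) @ Sig_inv_half
    M = np.zeros((p, p))
    I = np.eye(p)
    for c in np.unique(y):
        Zc = Z[y == c]
        D = I - np.cov(Zc, rowvar=False)
        M += (len(Zc) / n) * (D @ D)  # squared deviation from identity
    evals, evecs = np.linalg.eigh(M)
    eta = evecs[:, ::-1][:, :q]
    return Sig_inv_half @ eta
```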

Here, we compare our new SY LDR method derived above to the BE, SIR, and SAVE LDR methods. Specifically, we evaluate the classification efficacy in terms of the EPMC for the SY, BE, SIR, and SAVE LDR methods using Monte Carlo simulations for five different configurations of multivariate normal populations with

For the SY and SAVE LDR approaches, we reduce the dimension to

The five Monte Carlo simulations were performed using the R programming language.
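Although the paper's simulations were written in R, the EPMC estimation loop can be sketched compactly in Python: repeatedly draw a training sample, build the plug-in quadratic rule from the estimated parameters, and average its error rate over fresh test data. All function names and default argument values here are illustrative.

```python
import numpy as np

def qda_log_post(X, mus, covs, priors):
    """Log posterior scores of the quadratic rule, one column per class."""
    cols = []
    for mu, S, pi in zip(mus, covs, priors):
        D = X - mu
        _, logdet = np.linalg.slogdet(S)
        quad = np.einsum('ij,ij->i', D @ np.linalg.inv(S), D)
        cols.append(np.log(pi) - 0.5 * logdet - 0.5 * quad)
    return np.column_stack(cols)

def epmc_estimate(means, covs, priors, n_train, n_test=2000, reps=50, seed=0):
    """Monte Carlo estimate of the EPMC of the plug-in quadratic rule."""
    rng = np.random.default_rng(seed)
    m = len(means)
    errs = []
    for _ in range(reps):
        mu_hat, S_hat = [], []
        for i in range(m):  # estimate parameters from a fresh training sample
            Xi = rng.multivariate_normal(means[i], covs[i], n_train)
            mu_hat.append(Xi.mean(axis=0))
            S_hat.append(np.cov(Xi, rowvar=False))
        y = rng.choice(m, n_test, p=priors)  # test data from the mixture
        X = np.array([rng.multivariate_normal(means[c], covs[c]) for c in y])
        yhat = qda_log_post(X, mu_hat, S_hat, priors).argmax(axis=1)
        errs.append((yhat != y).mean())
    return float(np.mean(errs))
```

In the paper's comparisons, the data would first be projected by each competing LDR matrix before this error-rate loop is run in the reduced dimension.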

The first population configuration we examined was composed of two multivariate normal populations

| Configuration | Means | m | Rank | | | |
|---|---|---|---|---|---|---|
| 1 | Moderately separated unequal means | 2 | 3 | 1 | 1 | 2 |
| 2 | Relatively close unequal means | 3 | 2 | 1 | 1 | 2 |
| 3 | Relatively close unequal means | 2 | 2 | 1 | 1 | 2 |
| 4 | Relatively close unequal means | 3 | 2 | 1 | 1 | 2 |
| 5 | Close unequal means | 2 | 4 | 3 | 2 | 4 |

and

Here,

singular values of

When

dimensional subspaces. Not surprisingly, for

and for

In addition, the SIR LDR method did not utilize discriminatory information contained in the differences of the covariance matrices when

The second Monte Carlo simulation used a configuration with the three multivariate normal populations

| Singular value | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 72.16 | 0.38 | 0.57 | 0.74 | 67.98 | 0.19 | 0.52 | 0.66 | |
| 16.27 | 0.54 | 11.67 | 0.36 | | | | | |
| 12.21 | 0.43 | 8.93 | 0.25 | | | | | |
| 9.31 | 0.34 | 6.88 | 0.18 | | | | | |
| 7.22 | 0.25 | 5.34 | 0.13 | | | | | |
| 5.46 | 0.18 | 4.06 | 0.09 | | | | | |
| 3.99 | 0.11 | 2.98 | 0.05 | | | | | |
| 2.73 | 0.07 | 2.05 | 0.03 | | | | | |
| 1.67 | 0.03 | 1.23 | 0.01 | | | | | |
| 0.75 | 0.01 | 0.53 | <0.01 | | | | | |

and

In this configuration, the population means are unequal but relatively close. Moreover,

Because the covariance matrices are markedly different, the BE and SIR LDR methods are not ideal: both methods aggregate the sample covariance matrices. The SY LDR method, in contrast, estimates each individual covariance matrix for all three populations, and exploiting this information made SY the superior LDR procedure.

For the larger sample-size scenario,

| Singular value | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 15.18 | 1.31 | 0.55 | 1.25 | 14.27 | 1.15 | 0.51 | 1.18 | |
| 8.92 | 0.90 | 0.11 | 0.61 | 8.89 | 0.47 | 0.06 | 0.43 | |
| 4.07 | 0.43 | 0.49 | 2.09 | 0.29 | 0.28 | | | |
| 2.51 | 0.40 | 1.50 | 0.21 | | | | | |
| 1.84 | 0.32 | 1.21 | 0.16 | | | | | |
| 1.49 | 0.25 | 1.01 | 0.13 | | | | | |
| 1.22 | 0.19 | 0.84 | 0.10 | | | | | |
| 1.00 | 0.14 | 0.70 | 0.07 | | | | | |
| 0.79 | 0.10 | 0.56 | 0.05 | | | | | |
| 0.58 | 0.06 | 0.42 | 0.03 | | | | | |

that n_i = 50 and n_i = 25,

In this configuration, we have

and

As in Configuration 1, we have that

For

In this situation, we have three multivariate normal populations:

| Singular value | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 27.91 | 0.45 | 0.42 | 0.61 | 22.98 | 0.370 | 0.35 | 0.44 | |
| 14.84 | 0.49 | 10.48 | 0.28 | | | | | |
| 10.73 | 0.39 | 7.69 | 0.21 | | | | | |
| 8.12 | 0.30 | 5.86 | 0.15 | | | | | |
| 6.18 | 0.23 | 4.50 | 0.11 | | | | | |
| 4.65 | 0.16 | 3.39 | 0.08 | | | | | |
| 3.41 | 0.10 | 2.50 | 0.50 | | | | | |
| 2.41 | 0.06 | 1.78 | 0.03 | | | | | |
| 1.55 | 0.03 | 1.16 | 0.01 | | | | | |
| 0.76 | 0.01 | 0.60 | <0.01 | | | | | |

where

and

For the fourth configuration, the covariance matrices are considerably different from one another, which benefits both the SY and SAVE LDR methods. Specifically, the SY LDR procedure uses information contained in the unequal covariance matrices to determine classificatory information contained in

This population configuration illustrated the fact that we cannot always choose

In Configuration 5, we have three multivariate normal populations:

with

| Singular value | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 59.76 | 1.93 | 0.48 | 0.90 | 56.56 | 0.86 | 0.44 | 0.72 | |
| 36.37 | 0.66 | 0.16 | 0.69 | 37.02 | 0.60 | 0.11 | 0.56 | |
| 7.72 | 0.31 | 0.52 | 5.36 | 0.16 | 0.31 | | | |
| 5.21 | 0.42 | 3.53 | 0.24 | | | | | |
| 3.97 | 0.34 | 2.65 | 0.19 | | | | | |
| 3.12 | 0.27 | 2.09 | 0.14 | | | | | |
| 2.46 | 0.21 | 1.66 | 0.11 | | | | | |
| 1.91 | 0.16 | 1.31 | 0.08 | | | | | |
| 1.44 | 0.11 | 1.00 | 0.06 | | | | | |
| 0.98 | 0.07 | 0.70 | 0.03 | | | | | |

and

Here, we have three considerably different covariance matrices. For this population configuration, the SY method outperformed the three other LDR methods for the reduced dimensions

The BE and SIR LDR methods did not perform as well here because of the pooling of highly diverse estimated covariance matrices. While the SAVE LDR procedure was the least effective LDR method for

In the following parametric bootstrap simulation, we use a real dataset to obtain the population means and covariance matrices for two multivariate normal populations. The chosen dataset comes from the University of California, Irvine Machine Learning Repository and describes diagnoses from cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient in the study is classified into one of two categories: normal or abnormal. The dataset contains 267 SPECT image sets, with 44 continuous features per patient. However, for our simulation, we chose only ten of the 44 features: F2R, F6R, F7S, F9S, F11R, F11S, F14S, F16R, F17R, and F19S. Hence, we performed the parametric bootstrap Monte Carlo simulation with two populations:
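A parametric bootstrap of this kind can be sketched as follows: fit a multivariate normal model to each class of the real data, then draw bootstrap training sets from the fitted models. The function name and interface are illustrative, not from the paper.

```python
import numpy as np

def parametric_bootstrap_sample(X, y, n_boot, seed=0):
    """Draw a bootstrap training set from normal fits to each class
    of the real data: X is (n, p), y holds the class labels."""
    rng = np.random.default_rng(seed)
    Xb, yb = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        mu, S = Xc.mean(axis=0), np.cov(Xc, rowvar=False)  # fitted parameters
        Xb.append(rng.multivariate_normal(mu, S, n_boot))
        yb.append(np.full(n_boot, c))
    return np.vstack(Xb), np.concatenate(yb)
```

Each bootstrap replicate would then be projected by the competing LDR matrices and classified, so the error rates reflect sampling variability around the real-data parameter estimates.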

| Singular value | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 691.33 | 0.37 | 0.36 | 0.72 | 683.07 | 0.13 | 0.30 | 0.53 | |
| 71.16 | 0.14 | 0.11 | 0.56 | 50.37 | 0.07 | 0.06 | 0.31 | |
| 50.13 | 0.08 | 0.46 | 35.78 | 0.04 | 0.24 | | | |
| 39.44 | 0.37 | 28.39 | 0.19 | | | | | |
| 32.07 | 0.30 | 23.19 | 0.15 | | | | | |
| 26.27 | 0.24 | 19.10 | 0.12 | | | | | |
| 21.43 | 0.18 | 15.67 | 0.09 | | | | | |
| 17.22 | 0.14 | 12.13 | 0.07 | | | | | |
| 13.41 | 0.09 | 9.89 | 0.05 | | | | | |
| 9.59 | 0.06 | 7.08 | 0.03 | | | | | |

and

For this dataset,

For

| Singular value | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 390.43 | 0.23 | 0.36 | 0.94 | 275.15 | 0.15 | 0.28 | 0.92 | |
| 198.20 | 0.86 | 175.62 | 0.83 | | | | | |
| 118.13 | 0.74 | 111.28 | 0.71 | | | | | |
| 74.01 | 0.56 | 69.53 | 0.50 | | | | | |
| 52.16 | 0.42 | 47.01 | 0.31 | | | | | |
| 37.65 | 0.30 | 35.02 | 0.21 | | | | | |
| 25.72 | 0.20 | 24.92 | 0.13 | | | | | |
| 16.73 | 0.12 | 15.91 | 0.07 | | | | | |
| 9.50 | 0.05 | 8.60 | 0.03 | | | | | |
| 4.27 | 0.02 | 3.81 | 0.01 | | | | | |

In this paper, for the case where all population parameters are known, we have presented a simple and flexible algorithm for a low-dimensional representation of data from multiple multivariate normal populations with different parametric configurations. Also, we have given necessary and sufficient conditions for attaining the subspace of smallest dimension

We have presented several advantages of our proposed low-dimensional representation method. First, our method is not restricted to a one-dimensional representation regardless of the number of populations, unlike the transformation introduced by [

Furthermore, we have derived an LDR method for realistic cases where the class population parameters are unknown and the linear sufficient matrix M in (7) must be estimated using training data. Using Monte Carlo simulation studies, we have compared the performance of the SY LDR method with three other LDR procedures derived by [

Finally, we have extended our concept proposed in Theorem 1 through the application of the SVD given in Theorem 2 to cases where one might wish to considerably reduce the original feature dimension. Our new LDR approach can yield excellent results provided the population covariance matrices are sufficiently different.