Principal component transformation (PCT) is a standard technique for multi-dimensional data analysis. The purpose of the present article is to elucidate the procedure for interpreting principal component (PC) images. The discussion focuses on logically explaining how negative/positive PC eigenvectors (loadings), in combination with strong reflection/absorption spectral behavior at different pixels, affect the digital number (DN) values in the output PC images. It is intended as an explanatory article, so that the fuller potential of PCT applications can be realized.

Principal component transform (PCT), also known as eigenvector transformation, Hotelling transformation, Karhunen-Loève (K-L) transformation, or eigenvalue-eigenvector decomposition, is a standard and highly powerful technique for processing multidimensional data. This statistical technique aims at reducing the redundancy and dimensionality of the data set by projecting the data onto new uncorrelated axes. The mathematical principles of the technique are well established [1,2]. The PC transform has found extensive applications in almost all disciplines of the physical sciences and engineering.

In the field of remote sensing data processing, principal component analysis (PCA) has been extensively used. Like all other techniques, it has its own set of advantages and disadvantages. The foremost advantage is that it identifies patterns in data: it transforms the data so as to accentuate their similarities and differences. Also, a PC transform does not lead to any loss of the information contained in a data set; instead, it segregates those data bands which are most informative, thus saving computational time. Moreover, if all of the axes are kept, no data are lost, as the original data can be recovered from the principal components.

The disadvantages include scene dependence. If several remote sensing image data sets of a region are acquired at different times or seasons, the manifestation of a given object in the PC images varies with the temporal behavior of the other pixels in the scene, even though the character of the object itself may remain unchanged. Thus, the manifestation of the same object is related not only to its intrinsic properties but also to the temporal scene conditions, i.e. the data in other pixels. Additionally, the PC image display is observed to be software dependent: if different software packages are used to generate PC images from the same input data set, the displays may appear different, depending on the way the computations are made in each package (see below). Further, the interpretation procedure for PC images needs more elaboration and clarification, especially in terms of the effects of different combinations of negative/positive eigenvectors with high/low input image DN values.

The purpose of the present article is to elucidate the procedure for interpreting PC images. In the field of remote sensing data processing, numerous examples of principal component analysis (PCA) have been described; however, examples of its logical interpretation and application have been rather limited. In this article, the salient points we intend to elaborate are: what governs the gray tones (DN values) in a PC image, how important the negative/positive eigenvectors (i.e. loadings) are, and how, in combination with strong reflection/absorption spectral properties, they affect the DN values in the PC images.

Most commonly, PCT is performed on remote sensing data sets using various software packages, and the resulting PC images invariably appear quite different from the input spectral band images. We attempt to describe logically, and in a simplified manner, the procedure for interpreting PC images. This paper is intended as explanatory material, so that the fuller potential of PCT applications can be realized. It draws examples from satellite remote sensing data and is written mainly for the remote sensing data user community; however, it can be conveniently adapted and extended to various other scientific disciplines.

The paper is organized into several sections. First, the basic concepts and terminology of PCA are described. A brief review of earlier studies on applications of PCA in remote sensing follows. Next come the scope of the present work and an illustration of the dependence of PC image displays on software. Finally, the effect of combinations of eigenvectors with DN values of spectral bands is assessed and evaluated with the help of examples, followed by brief conclusions.

In the simplest terms, PCA is a method of identifying patterns in data. It transforms the data so as to accentuate their similarities and differences. PCA is defined as an orthogonal linear transformation that re-expresses multivariate data. It extracts and displays the greatest variance in a data set on the first axis (called the first principal component), the second greatest variance on the second axis (which is orthogonal to the first), and so on. It is also used to reduce the dimensionality of a data set consisting of a large number of interrelated variables [

PCA does not lead to any loss of information; instead it helps in picking up those data sets or bands which are most informative. Also, additional clarity is provided as signal to noise ratio is improved and inter-band correlations are removed.
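The two properties stated above (removal of inter-band correlation, and no loss of information when all axes are kept) can be checked numerically. The following is a minimal sketch on synthetic data, assuming NumPy is available; the three "bands", their values and all variable names are hypothetical and are not taken from any data set used in this paper.

```python
import numpy as np

# Synthetic 3-band data for 500 pixels; the bands are deliberately correlated.
# All numbers here are assumptions chosen only for illustration.
rng = np.random.default_rng(0)
base = rng.normal(100.0, 20.0, size=500)
bands = np.column_stack([
    base + rng.normal(0.0, 2.0, size=500),
    0.8 * base + rng.normal(0.0, 3.0, size=500),
    -0.5 * base + rng.normal(0.0, 4.0, size=500),
])

# Unstandardized PCA: eigen-decomposition of the covariance matrix.
mean = bands.mean(axis=0)
centered = bands - mean
cov = np.cov(centered, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder so PC-1 comes first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the pixels onto the principal axes.
pcs = centered @ eigvecs

# 1) The PC images are mutually uncorrelated: off-diagonal covariance is ~0.
pc_cov = np.cov(pcs, rowvar=False)
max_off_diag = float(np.abs(pc_cov - np.diag(np.diag(pc_cov))).max())

# 2) Keeping all axes, the original data are recovered by the inverse rotation.
reconstructed = pcs @ eigvecs.T + mean
max_recon_error = float(np.abs(reconstructed - bands).max())

print("largest off-diagonal PC covariance:", max_off_diag)
print("largest reconstruction error:", max_recon_error)
```

Both printed quantities should be at the level of floating-point round-off. Dropping higher-order columns of `pcs` before the inverse rotation would make the reconstruction approximate rather than exact.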

Satellite remote sensing digital images are numeric; therefore, their dimensionality can be reduced using PCA [

The various terms used in conjunction with PCA are described below.

Standardized and unstandardized PCA: PCA can be performed in two ways. If the principal components are calculated using the covariance matrix, it is called unstandardized PCA; if they are calculated using the correlation matrix, it is called standardized PCA.
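The practical difference between the two options can be seen on a small synthetic example. The sketch below assumes NumPy and uses two hypothetical bands with very different variances; the helper name `leading_eigenvector` is introduced here only for illustration. With the covariance matrix, the high-variance band dominates the first eigenvector; with the correlation matrix (equivalent to the covariance matrix of bands scaled to unit variance), both bands are weighted equally.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical bands: band 1 has a much larger variance than band 2.
b1 = rng.normal(50.0, 30.0, size=1000)
b2 = 0.1 * b1 + rng.normal(20.0, 1.0, size=1000)
data = np.column_stack([b1, b2])

cov = np.cov(data, rowvar=False)        # basis of unstandardized PCA
corr = np.corrcoef(data, rowvar=False)  # basis of standardized PCA


def leading_eigenvector(matrix):
    """Return the unit eigenvector belonging to the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(matrix)
    return vecs[:, np.argmax(vals)]


v_cov = leading_eigenvector(cov)
v_corr = leading_eigenvector(corr)

print("PC-1 eigenvector, covariance matrix: ", np.round(v_cov, 3))
print("PC-1 eigenvector, correlation matrix:", np.round(v_corr, 3))
```

The overall sign of each printed eigenvector is arbitrary; only the relative magnitudes of the components matter for this comparison.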

In multispectral remote sensing, the standardized PCA is reported to have improved signal to noise ratio (SNR) as compared to the unstandardized PCA for the same data set [6,7].

Data space: A multispectral data set has a vector space with as many axes as there are spectral components associated with each pixel. For example, Landsat Thematic Mapper (TM) data have seven dimensions, while Terra-ASTER data have fourteen. A pixel is plotted as a point in this vector space, and its coordinates correspond to the brightness values of that pixel in the respective spectral components.
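In array terms, the data space described above corresponds to reshaping an image cube into a list of pixel vectors. The following sketch assumes NumPy and a hypothetical, randomly filled 4 × 5 scene with seven bands stored as (row, column, band); real data would be read from an image file instead.

```python
import numpy as np

# Hypothetical tiny scene: 4 rows x 5 columns x 7 bands of 8-bit DN values.
rows, cols, n_bands = 4, 5, 7
rng = np.random.default_rng(2)
cube = rng.integers(0, 256, size=(rows, cols, n_bands))

# Each pixel becomes one point in a 7-dimensional data space:
# one row per pixel, one column (coordinate axis) per band.
pixels = cube.reshape(-1, n_bands)

# The pixel at row 1, column 2 is found at index row * cols + col.
point = pixels[1 * cols + 2]

print("data space shape (pixels, bands):", pixels.shape)
print("coordinates of pixel (1, 2):", point)
```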

Eigenvectors and Eigenvalues: Let C be an n × n matrix and I the identity matrix of the same order. The eigenvalues λ of C are defined as the roots of the equation:

$$\det(C - \lambda I) = 0 \tag{1}$$

Equation (1) is called the characteristic polynomial equation of C and has n roots.
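As a small worked check, the roots of the characteristic polynomial can be compared with the eigenvalues returned by a numerical routine. The sketch below assumes NumPy; the 2 × 2 matrix is a hypothetical covariance matrix of two bands, chosen only for illustration.

```python
import numpy as np

# Hypothetical 2 x 2 covariance matrix of two spectral bands.
C = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# For a 2 x 2 matrix the characteristic polynomial det(C - lambda*I) is
# lambda^2 - trace(C)*lambda + det(C); its roots are the eigenvalues.
coeffs = [1.0, -np.trace(C), np.linalg.det(C)]
roots = np.sort(np.roots(coeffs).real)

# Eigenvalues from the symmetric eigenvalue solver (ascending order).
eigvals = np.linalg.eigvalsh(C)

# Each eigenvalue should make det(C - lambda*I) vanish (up to round-off).
residuals = [float(np.linalg.det(C - lam * np.eye(2))) for lam in eigvals]

print("roots of characteristic polynomial:", roots)
print("eigenvalues from eigvalsh:        ", eigvals)
print("det(C - lambda*I) at each root:   ", residuals)
```

For this matrix the polynomial is λ² − 7λ + 8, so the two roots are (7 ± √17)/2.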

Associated with each eigenvalue is a set of coordinates defining the direction of the associated principal axis [

Thus, the magnitudes of the eigenvalues describe the length and eigenvectors describe the direction of the principal axes.

Pixel value in an image: The pixel value at a point in a particular PC image is calculated by multiplying each element of the eigenvector by the DN value of the corresponding band in the original image and summing these products to get the final pixel value. For example, in an m-band data set, the pixel value at point (i, j) for the first component (PC-1) is calculated by the following equation:

$$PC_{1}(i,j) = \sum_{k=1}^{m} a_{k1}\, DN_{k}(i,j)$$

where $PC_{1}(i,j)$ is the value of the pixel at row i, column j in the first principal component image, the values $a_{k1}$ are the elements of the first eigenvector for the different bands k, and $DN_{k}(i,j)$ is the observed pixel value at location (i, j) in band k [
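The weighted sum described above is simple enough to compute by hand. The following minimal sketch uses hypothetical eigenvector elements and DN values for a 3-band data set and applies the sum directly to the raw DN values, as the text describes.

```python
# Hypothetical first eigenvector for a 3-band data set (unit length:
# 0.6^2 + 0.0^2 + 0.8^2 = 1).
eigenvector_pc1 = [0.6, 0.0, 0.8]

# Hypothetical DN values of one pixel (row i, column j) in the three bands.
dn_values = [120, 45, 200]

# PC-1 value = sum over bands of (eigenvector element * DN value).
pc1_value = sum(a * dn for a, dn in zip(eigenvector_pc1, dn_values))

print("PC-1 value at this pixel:", pc1_value)
```

Here band 2 has a zero eigenvector element and therefore does not influence the PC-1 value, whatever its DN value is.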

Loadings: These are simple correlations between the pixel values in a PC image and the DN values in an original image. The correlation estimates the information they share [

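Contribution: Contribution is a fractional measure of the importance of a particular spectral band for a PC axis. It is computed as the square of the eigenvector component for the particular band divided by the sum of squares of all the components of the eigenvector:

$$\text{Contribution}_{kp} = \frac{a_{kp}^{2}}{\sum_{j=1}^{m} a_{jp}^{2}} \tag{4}$$

where $a_{kp}$ is the component of the eigenvector of the p-th PC axis for band k, and m is the number of bands.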

Some software packages (e.g. ERDAS, ENVI) normalize the eigenvectors to unit length, so that the sum of squares of the components of each eigenvector is always one. The denominator in Equation (4) is then equal to 1, and the percent contribution of a particular band to a PC axis is obtained by multiplying the square of its eigenvector component by 100 [
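The contribution measure and the effect of unit-length normalization can be illustrated as follows. This is a minimal sketch with a hypothetical, deliberately non-normalized eigenvector for four bands; the variable names are illustrative only.

```python
# Hypothetical (non-normalized) eigenvector components of one PC axis
# for four spectral bands.
eigenvector = [2.0, -1.0, 0.5, 0.5]

# Fractional contribution: squared component / sum of squared components.
sum_of_squares = sum(a * a for a in eigenvector)
contribution = [a * a / sum_of_squares for a in eigenvector]
percent = [100.0 * c for c in contribution]

# After normalizing the eigenvector to unit length the denominator is 1,
# so the contribution is simply the squared component.
length = sum_of_squares ** 0.5
unit_vector = [a / length for a in eigenvector]
unit_contribution = [a * a for a in unit_vector]

for band, (c, p) in enumerate(zip(contribution, percent), start=1):
    print(f"band {band}: contribution = {c:.4f} ({p:.1f} %)")
```

Note that the sign of a component does not affect its contribution: band 2, with a negative component, still contributes a positive fraction.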

In the field of remote sensing, PCA has been used in various applications like pattern recognition, change detection, data reduction, image data transformation, image classification, image extraction and fusion. It was in 1985 that the concept of standardized and unstandardized PCA in remote sensing was introduced [

In the Kitchener-Waterloo-Guelph area in Canada, PCA was used to calculate the land-cover changes based on both standardized and unstandardized methods. It was reported that standardized PCs provide more accurate information for change detection than unstandardized PCs [

Reference [

The “Crosta technique” was used for mapping alteration minerals using TM and airborne TM imagery of the Great Basin region of the western United States. PC analysis was conducted on a selected set of spectral bands suited for detecting iron oxide, and then on another set of spectral bands suited for detecting hydroxyl-bearing minerals [

PCA can also be used for characterizing land cover at global scales using long time-series satellite data from the NOAA-AVHRR [

Healthy and bleached coral can also be differentiated using PCA on remote sensing data. The results of PCA were reported to be consistent with those of cluster analysis and spectral derivative analysis in remotely detecting and delineating areas of submerged healthy and bleached corals [

PCA has been used for component separation [

PCA was used as a method of change detection for the Kafue Flats, Zambia, using four scenes of Landsat Multispectral Scanner (MSS) and TM data [

Many authors have used PCA in the past for various applications. A few of those applications are: spatial principal component analysis (SPCA) for analyzing ecoenvironmental vulnerability in upper reaches of Minjiang river in China [

PCA was used as an image enhancement technique to improve the spectral signal of burnt surfaces [

As mentioned earlier, PCA is a standard technique for multidimensional data analysis. In this paper, the attempt is to briefly describe the methodology of PC image generation, with the intention of elaborating what governs the gray tone (DN value) at pixels in a PC image. The idea is particularly to discuss how the negative/positive eigenvectors, in combination with the spectral behaviour (high/low DN values) in the input bands, affect the final DN values in the computed PC image. The purpose of this article is to clarify such issues logically, so that the potential of the powerful PCA technique can be fully realized.

Nowadays, PC images are usually computed with software packages. Two packages used very extensively in remote sensing data processing are ERDAS IMAGINE and ENVI, while MATLAB serves as a basic computational package. For the sake of example, a subscene of ASTER multispectral image data (VNIR: 0.52 - 0.86 µm; SWIR: 1.60 - 2.43 µm) of the Cuprite Mining District (Nevada, USA) was selected for processing with the different software packages and subsequent comparative analysis.

In ERDAS IMAGINE, PC analysis is performed only with the covariance matrix, whereas ENVI offers both options, i.e. the covariance and the correlation matrix. For the present comparative analysis, we have used the covariance matrix option in both packages.

It is observed that the eigenvectors calculated by ERDAS IMAGINE and ENVI are sometimes mutually inverted, i.e. although the numeric values from the two packages are the same, they differ in sign (positive or negative), e.g. in the case of PC-1, PC-2 and PC-3 (see Tables 1(a) and 1(b)). This results in inversion of the images (bright pixels in the ERDAS IMAGINE PC-2 and PC-3 images become dark in the ENVI PC-2 and PC-3 images, and vice versa; see Figures 1 and 2). However, in the higher-order PCs, i.e. PC-4, PC-5 and PC-6, the eigenvector values generated by the two packages (ERDAS IMAGINE, ENVI) are identical in both numeric value and sign.

In order to have a better comparative assessment, PC transforms have also been calculated by using MATLAB. It is observed that eigenvectors generated from MATLAB (

Therefore, it is important to realize that the sign of an eigenvector depends on how the multi-dimensional (mathematical) space is conceived: if a vector is an eigenvector of a matrix, so is its negative, and a software package may return either. As far as PC image displays are concerned, it is observed that a relative change in the sign of the eigenvector leads to inversion of the gray tones in the PC image. For example, in PC-2 the pixels which are dark in ERDAS IMAGINE are bright in ENVI, and vice versa, as mentioned earlier. The image display is thus primarily dependent on the way the PC eigenvectors are calculated.
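The gray-tone inversion caused by a sign change can be reproduced on a toy example. The sketch below is illustrative only: the eigenvector, the four 3-band pixels and the linear min-max stretch to 0-255 are all assumptions, and the stretch is just one plausible way of displaying a PC image, not necessarily the one used by ERDAS IMAGINE or ENVI.

```python
def stretch_to_8bit(values):
    """Linear min-max stretch of PC values to the 0-255 display range."""
    lo, hi = min(values), max(values)
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


# Hypothetical eigenvector of one PC axis for a 3-band data set.
eigenvector = [0.5, -0.7, 0.5]

# Hypothetical DN values of four pixels in the three bands.
pixels = [[100, 30, 90], [40, 150, 60], [80, 80, 80], [200, 20, 180]]


def pc_values(vec):
    """PC value of every pixel: sum of eigenvector element * band DN."""
    return [sum(a * dn for a, dn in zip(vec, p)) for p in pixels]


# The same eigenvector with opposite sign, as another package might return it.
pc_a = pc_values(eigenvector)
pc_b = pc_values([-a for a in eigenvector])

img_a = stretch_to_8bit(pc_a)
img_b = stretch_to_8bit(pc_b)

print("display values, original sign:", img_a)
print("display values, flipped sign: ", img_b)
```

In this example the two display values of every pixel add up to 255, i.e. one image is the photographic negative of the other: the brightest pixel under one sign convention is the darkest under the other.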

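It has been discussed above that the PC image is generated as the sum of the products of the eigenvector elements and the DN values of the respective spectral bands at each pixel, i.e.

$$PC_{p}(i,j) = \sum_{k=1}^{m} a_{kp}\, DN_{k}(i,j)$$

where $a_{kp}$ is the element of the eigenvector of the p-th PC axis for band k, $DN_{k}(i,j)$ is the DN value of band k at pixel (i, j), and m is the number of bands.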

Obviously, an eigenvector element with a larger numeric value (positive or negative) has a greater influence on the PC image, while a low numeric value implies that the particular band has relatively low significance for that PC axis. Here we consider the effects of large (positive or negative) eigenvector values combined with high or low DN values in the image (which in turn imply strong reflection or strong absorption, respectively). Cases of low eigenvector values are not considered specifically, as these would have minimal influence on the resulting PC image.
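The four combinations considered here can be laid out numerically as single-band terms of the weighted sum. The sketch below uses hypothetical eigenvector values (±0.8, and 0.05 for a "low" case) and hypothetical DN values (220 for strong reflection, 20 for strong absorption); it shows only the arithmetic of one band's term, whereas the final gray tone of a pixel also depends on the terms of the other bands and on how the PC image is stretched for display.

```python
# (label, eigenvector element, DN value); all numbers are hypothetical.
cases = [
    ("large positive eigenvector, high DN (strong reflection)", 0.8, 220),
    ("large positive eigenvector, low DN (strong absorption)", 0.8, 20),
    ("large negative eigenvector, high DN (strong reflection)", -0.8, 220),
    ("large negative eigenvector, low DN (strong absorption)", -0.8, 20),
    ("small eigenvector, high DN (strong reflection)", 0.05, 220),
]

# Term that this band adds to the PC value of the pixel.
terms = {label: a * dn for label, a, dn in cases}

for label, value in terms.items():
    print(f"{label}: {value:+.1f}")
```

The large positive eigenvector with a high DN gives the largest positive term and the large negative eigenvector with a high DN gives the most negative term, while a small eigenvector gives a small term even where the band is bright.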

These examples have been generated using ASTER image data of the Cuprite Mining District (Nevada, USA) and of the Indo-Gangetic plains (Uttarakhand, India). Different image data subsets have been used to serve as suitable illustrations.

For an image data subset (