Area-Correlated Spectral Unmixing Based on Bayesian Nonnegative Matrix Factorization

To solve the problem of the spatial correlation for adjacent areas in traditional spectral unmixing methods, we propose an area-correlated spectral unmixing method based on Bayesian nonnegative matrix factorization. In the proposed method, the spatial correlation property between two adjacent areas is expressed by a priori probability density function, and the endmembers extracted from one of the adjacent areas are used to estimate the priori probability density functions of the endmembers in the current area, which works as a type of constraint in the iterative spectral unmixing process. Experimental results demonstrate the effectivity and efficiency of the proposed method both for synthetic and real hyperspectral images, and it can provide a useful tool for spatial correlation and comparation analysis between adjacent or similar areas.


Introduction
Spectral unmixing is a technique to extract a collection of pure spectra (called endmembers), by decomposing mixed pixels of a hyperspectral image and estimating the corresponding fractions (called as abundances) of these endmembers [1].Most of the existing spectral unmixing methods focus on the hyperspectral images themselves and no extra information is needed in the spectral unmixing procedure.Vertex Component Analysis (VCA) [2], Pixel Purity Index (PPI) [3] and N-FINDR [4] are widely-used typical methods, whose basic idea is to extract the endmembers by identifying the vertexes of the simplex formed by the endmembers.VCA and PPI project the hyperspectral data into a low-dimensional space based on the fact that vertexes of a simplex are still vertexes even if the simplex is projected into a low-dimensional space, while N-FINDR tries to find a fixed number of pixels to minimize the volume of the simplex formed by these pixels.Iterative Error Analysis (IEA) [5] updates every endmember by selecting the pixel with the maximum error in the reconstructed image using the latest endmember set in an iterative process.A new trend is to apply nonnegative matrix factorization to the spectral unmixing, since all of elements in the endmember matrix and the abundance matrix are nonnegative [6,7].These methods mentioned above are characterized by no requirement for extra information, and therefore only a few constraints need be considered when they are implemented.
But in the case of comprehensive analysis, we need also consider some other information besides the hyperspectral image itself, such as spectra from the spectral library and hyperspectral images observed at different time or in different area.If we could bring extra information into the current framework of spectral unmixing, the spatial correlation and comparation analysis between different images can be carried out.Plazza et al. put forward an Automatic Morphological Endmember Extraction method (AMEE) using the correlation of spatial distribution for different substances [8].Zortea et al. propose a pre-processing method on a pixel neighborhood scale to enhance the accuracy of spectral unmixing [9].Zhang focuses on spectral unmixing supported by digital geomorphology model and geographical information system [10].Dobigeon et al. present a semi-supervised spectral unmixing based on a hierarchical Bayesian model [11], where spectra from a spectral library are used in the spectral unmixing procedure.However, endmembers of hyperspectral images observed at different time or in different area may hold not only strong time or spatial correlation but also certain difference at the same time, the problem is how to use these kinds of correlations in the spectral unmixing procedure.
In this paper, we only concentrate on the spatial correlation between hyperspectral images observed in adjacent or similar areas.Our method is based on Bayesian nonnegative matrix factorization [12,13] and the spatial correlation between hyperspectral images observed in adjacent or similar areas is used in the form of a priori probability density function as a constraint to the spectral unmixing procedure.The remainder of this paper is organized as follows: Section 2 gives the detailed description of the proposed method; Section 3 discusses the experimental results both on synthetic and real hyperspectral data; Section 4 is the conclusions.

Area-correlated Spectral Unmixing
A hyperspectral image X can be written in this form: X = SA + E , where S = (s 1 ; s 2 ; ¢¢¢; s P ) 2 R + L £ P is the endmember matrix with the kth endmember represented by s k , A = (a 1 ; a 2 ; ¢¢¢; a P ) T 2 R + P £ N is the abundance matrix in which a k corresponds to the kth endmember and E = (E i ;j ) L £ N 2 R L £ N is the noise matrix with the number of endmembers P and the number of mixed pixels N. The noise E i ;j is assumed to an i.i.d.white Gaussian noise with zero mean and variance ¾ 2 .The objective of spectral unmixing is to find the proper S and A that minimize kX ¡ SA k 2 F with the Frobenius norm k¢k F (or other kinds of norm).
The basic idea of area-correlated spectral unmixing is to use the common endmembers in adjacent or similar areas as an aid in the spectral unmixing procedure.On one hand, they can be used as initial values after proper pre-processing with the current hyperspectral image; on the other hand, we bring them into the iterative spectral unmixing procedure as a priori knowledge.
The whole procedure of our area-correlated spectral unmixing method consists of three steps: pre-processing, Bayesian nonnegative matrix factorization and postprocessing.
In pre-processing, for each endmember extracted from the adjacent area, we search for pixels similar to it in the current hyperspectral image.If the number of these similar pixels is too small, then we abandon this endmember, otherwise, we will calculate the average spectrum of these pixels and put it into the initial endmember matrix.We fill the rest of the initial endmember matrix with random pixels selected from the current hyperspectral image, and the abundance matrix will be also initialized randomly.
The similarity of two spectra is measured by the Spectral Correlation-coefficient Distance (SCD).The SCD of spectrum x and y is defined by Equation (1).

SCD (x ; y )
where x = 1 L P j x j , y = 1 L P j y j and L is the number of bands.
Besides, we need also estimate the number of endmembers.If the estimated number is less than the real one, then it's evident that some endmembers cannot be extracted.To avoid this situation, our strategy is to increase the estimated number of endmembers.
After pre-processing, Bayesian nonnegative matrix factorization with the spatial correlation constraint, as described later in Section 2.2, will be applied to the current hyperspectral data to estimate the endmembers and their abundances.When the spectral unmixing procedure is finished, the extracted endmembers are clustered and the abundances are merged within every cluster in the post-processing stage.

Bayesian Nonnegative Matrix Factorization
Assuming that S , A and ¾ 2 are statistically independent, the Bayesian rule gives rise to One of the possible ways is to find the S and A by maximizing Equation (2).But, due to the complexity of Equation ( 2), the optimization process would be computatively expensive.Instead, in Bayesian nonnegative matrix factorization, Gibbs sampling [14], a sampling method based on Markov Chain Monte Carlo (MCMC) is applied to estimate S, A and ¾ 2 .
Gibbs sampling estimates the model parameters by sampling from the posterior probability density functions of these parameters.These samples will converge to the samples taken from the joint posterior probability density functions of these model parameters.Reference [14] gives the proof of the convergence of Gibbs sampling.The procedure of Gibbs sampling can be described as: in every round of sampling, we sample each parameter in turn; when we sample one of the parameters, we fix the rest and sample from the posterior probability density function of the current parameter; the sample is then used to update the posterior probability density function of the next parameter.Therefore, the posterior probability functions of S, A and ¾ 2 are required.
E i ;j is assumed to an i.i.d white Gaussian noise, that is, where and X = SA .An inverse Gamma distribution is chosen for the noise variance ¾ 2 and the posterior probability density function of the noise variance is given by where k = L N 2 + k 0 and µ = 1 2 P i ;j E 2 i ;j + µ 0 .According to the Bayesian rule, the posterior probability density function of where S k is the remaining matrix after deleting the kth column of the endmembers matrix S , N (x ; ¹ The posterior probability density functions of the endmembers are actually multiple-dimensional Gaussian distribution functions weighted by their priori probability density functions.The next step is to choose proper priori probability functions for the endmembers.We adopt the following distribution as the priori probability density function of the endmember where the vector v k is orthogonal to all the column vectors of S k , 0 and 1 represent all-zero and all-one L-dimensional vectors respectively and ¸ is the parameter for adjusting.One of the reasons we form the priori probability density function of the endmembers in this way is that given a linear combination c of the column vectors of S k , we have v T k c = 0, and therefore the corresponding density value will be zero, which reduces the possibility of exacting a spectrum mixed by the existing endmembers.Another reason is that when projected to , endmembers tend to have larger values of ¯v T k s k ¯ which leads to larger probability density values.By substituting the priori probability density function in Equation ( 7), the posterior probability density function of the endmember is written by As to the abundances, similarly, the posterior probability density function of the abundance a k is p ¡ a k ¯X ; S; A k ; ¾ 2 ¢ _ N (a k ; ¹ a k ; § a k ) p (a k ) (9)  where A k represent the remaining matrix after deleting the kth row of A with the estimation of ¹ a k and § a k i ;k j = 1; 2; 3; ¢¢¢; N : Since the kth endmember is assumed to distribute randomly in the observed area, the priori probability density function of the abundances is modeled as the uniform distribution.Thus, the posterior probability density function of the abundance a k becomes When estimating the model parameters mentioned above, the method of Slice Sampling [15] is used to sample from the posterior probability density functions of these parameters.

Experiments on Synthetic Data
Ten spectra are selected from the spectral library of the United States Geological Survey (USGS) for generating synthetic hyperspectral images.For each pixel in the synthetic hyperspectral image, we first pick several spectra from the ten spectra, and then generate a set of abundances which sum to one, and finally mix the spectra linearly by their corresponding abundances.
Two synthetic hyperspectral images are generated to simulate the hyperspectral images of two adjacent areas.Six spectra are used to generate the first image while the second image has eight, and four spectra are shared between them.The white Gaussian noise is added to each image with the SNR up to 30dB approximately.Firstly, the first image is pre-processed using the Vertex Component Analysis (VCA) method, to extract the endmembers from it as the initial.
Figure 1 shows the result, where the red curves represent the real spectra and the endmembers extracted by our method are represented by the blue curves.Figure 1(a) is the final result of our method, in which the first row corresponds to the four initial endmembers from pre-processing and the second row is the other extracted new endmembers obtained from the spectral unmixing procedure.Table 1 lists the Spectral Correlation-coefficient Distances and the Spectral Angle Distances (SAD) between the real spectra and the extracted endmembers.Figure 1(b) plots the error between the original image and the reconstructed image, where error = kX ¡ SA k 2 F with the Frobenius norm k¢k F .As we can see from Figure 1(a) and Table 1, the exacted endmembers fit the real spectra well.The error between the original image and the reconstructed image decreases sharply with the sampling process approaching, which indicates that our method is able to estimate the endmember matrix and the abundance matrix correctly.
Arngren et al. give another form of priori probability density function with a volume constraint [16].The volume of the simplex formed by the endmembers is used as a constraint in the priori probability density function of the endmember matrix.Figure 2(a) shows the results using the two different priori probability density functions, where the red curves are still the real spectra, the green curves are results under the volume constraint and the blue curves are results of our method.The results of our method fits the real spectra better.Figure 2(b) shows the error comparison between the two methods.At the beginning, the errors of both methods decrease almost at the same pace.With the sampling process going on, the error of our method is dropping faster than the method with the volume constraint.

Experiments on Real Data
Two adjacent square areas of size 100 × 100 are cropped from the AVIRIS data.The VD method [17] and the Hysime method [18] are used to estimate the number of endmembers of both images.Table 2 gives the estimation of both methods showing that the Hysime method gives a larger estimated number.According to [17], the VD method gives a relatively better estimation than other methods when applied to the AVIRIS data.To avoid missing some endmembers, we increase the estimated number of endmembers.But, there are two disadvantages of this strategy.One is that some endmembers may be extracted repeatedly and the other is that mixed spectrum may be extracted.The solution to the first problem is cluster.As to the second problem, one possible solution is considering the abundances while clustering.If a mixed spectrum was extracted, it is most likely that its abundance will be small, which can be used as a clue to discard the poorly extracted endmembers.
Furthermore, our method is based on the fact that there exist some common endmembers between adjacent areas, which means that it should be applied in some certain occasions.For instance, a large hyperspectral image cannot be processed immediately.The large image can be split into several smaller pieces and then our method is applied to the small hyperspectral images.Another application is for areas that are not adjacent geographically yet sharing some common ground surface types.

Conclusions
In this paper, we propose an area-correlated method of spectral unmixing based on Bayesian nonnegative matrix factorization.Common endmembers within adjacent areas are used to assist the spectral unmixing procedure.Experiment results show that these endmembers can serve as a kind of priori knowledge in the procedure to improve the results.It is thought that the proposed method can provide a useful tool for spatial correlation and comparation analysis between adjacent or similar areas.Future researches will focus on investigating other correlations between the hyperspectral images of adjacent areas, optimizing the sampling process and how to use the correlations more efficiently.

Figure 3 (
a) illustrates the extracted endmembers and the corresponding abundances.Five spectra remain after clustering, which consistent with the result of the VD method.

Figure 3 (
b) plots the error between the original image and the reconstructed image, showing that the error declines rapidly in the first few rounds of iteration and approaches zero in the end.

Figure 1 .
Figure 1.Experimental results on the synthetic data, (a)Comparison of the real spectra and the extracted endmembers, (b)Error between the original image and the reconstructed image.

Figure 2 .
Figure 2. Comparison results of two different priori probability density functions, (a)Extracted endmembers of the two methods, (b)Error of the two methods.

Figure 3 .
Figure 3. Experimental results on the real data, (a)Endmembers extracted from AVIRIS data with their abundances, (b)Error between the original image and the reconstructed image.
This work is supported by National Science Foundation (No.61171117) and National Science & Technology Pillar Program (No. 2012BAH31B01) of China.
Gaussian distribution function, and ¹ s k and § s k is the mean vector and the covariance matrix of s k and these parameters are estimated as i = 1; 2; 3; ¢¢¢; L : ,