Spectral Clustering with Eigenvalue Similarity Metric Method for POL-SAR Image Segmentation of Land Cover

A simple and fast approach based on eigenvalue similarity metric for Polarimetric SAR image segmentation of Land Cover is proposed in this paper. The approach uses eigenvalues of the coherency matrix as to construct similarity metric of clustering algorithm to segment SAR image. The Mahalanobis distance is used to metric pairwise similarity between pixels to avoid the manual scale parameter tuning in previous spectral clustering method. Furthermore, the spatial coherence constraints and spectral clustering ensemble are employed to stabilize and improve the segmentation performance. All experiments are carried out on three sets of Polarimetric SAR data. The experimental results show that the proposed method is superior to other comparison methods.


Introduction
The fully-polarimetric synthetic aperture radar (SAR) [1] [2] [3] has the ability to provide information in four channels HH, HV, VH and VV, and contains complete polarization information of electromagnetic waves' effect on surface.
We can obtain the information of land cover by analysis and interpretation of the polarization information.Segmentation for land cover is one of basic issues and also important applications.
The segmentation process is based on the choice of features and classifier.Selecting good features can obtain better segmentation result than only improving classifier.In the existing segmentation of polarimetric SAR, what is generally used as features is polarimetric information, and the texture or gray information of image [4] [5].The prior work has shown that the eigenvalues of coherency matrix include rich polarimetric information with good divisibility [6].At the same time, the eigenvalue, which is obeying Gaussian distribution approximately, is easy to be metric.
The existing classifier can be divided into the supervised and the unsupervised [7] [8].Compared with the unsupervised clustering, the supervised needs partially landcover labels, thus can often get better segmentation result [9] [10].However, due to the landcover label of polarimetric SAR image is difficult to obtain, more researchers focus on the unsupervised cluster methods [11]- [19].
Cloude et al. have previously used the threshold of scattering entropy, scattering angle and inverse entropy to classify [11] [14] [15].Freeman et al. have extracted three scattering powers and classified according to the proportion [17].Lee et al. have proposed the maximum likelihood classifier based on complex Wishart distribution [18].The above unsupervised clustering methods are faced up the choice of threshold that needs much artificial experience which is cost.So we choose the real unsupervised clustering method as classifier.
Clustering usually means to group in accordance with the similarity between objects.The distance is the most common similarity metric, which reflects the similarity between objects by measuring the difference of the objects.In practical applications, the choice of distance depends on the characteristics of the object, and it is generally applied to cluster as similarity metric.In the used POL-SAR segmentation, according to the coherency matrix which is obeying the Wishart distribution, Anfinsen [20] and Ersahin [21] have chosen the Wishart distance with Gaussian kernel as the similarity metric, and applied it to spectral clustering.This kind of special distribution of the T-matrix limits the construction of similarity metric.And there are two problems calculating the similarity with Gaussian kernel between pixels: 1) the scale parameter of the Gaussian kernel σ needs to be manually set precisely in accordance with the experience, and the single scale parameter σ cannot capture category distribution information of multiple scales data well; 2) huge exponentiation consumption.
In our method, firstly, we choose the eigenvalues of coherency matrix as the input features, which include the essential information of coherency matrix and represent the intensity of scattering.Secondly, we apply Mahalanobis distance as the similarity metric by studying the statistical properties of eigenvalues.And taking into account the neighborhood information of image, consistency constraints will be applied to the similarity metric.Thirdly, the above similarity metric is applied to spectral clustering algorithm to complete the segmentation.At last, in order to improve and stabilize the segmentation results, the strategy of cluster ensemble is used.

Polarmetric Features Analysis
The full-polarimetric SAR data can be expressed by complex scattering matrix S : where, h and v represent horizontal and vertical polarization modes, respec- tively.It is commonly assumed that natural targets exhibit reciprocity, hv vh S S = .
The above scattering matrix also can then be expressed as the scattering vector k .
[ ] In order to better explain the physical meaning of the scattering process, the coherency matrix T is used: The coherency matrix is a Hermit matrix, T T * = , whose size is 3 × 3.
In order to make better use of the polarization scattering matrix to reveal the physical mechanism, polarization data usually is broken down into different components by decomposition [14].In [11], Cloude et al. have put forward Cloude decomposition which has very important significance in POL SAR data processing.The two parameters, scattering entropy and the scattering angle, obtained from decomposition are widely used in image segmentation [18].
Cloude decomposition: where i . Each eigenvector represents a scattering mechanism, and the corresponding eigenvalue represents the intensity of the scattering mechanisms.
The eigenvalues and eigenvectors resulting from Cloude decomposition have been concerned and studied [22] [23] [24] [25].Carlos Lopez-Martinez has made an in-depth analysis of the probability density function of the sample eigenvalues of the covariance or coherency matrix and proposed that Gaussian scattering assumption is valid to the sample eigenvalues distribution of homogeneous distributed scatters [22] [24].In [22], the Gaussian Mixture Modes have been used in inhomogeneous areas.Based on different areas obeying Gaussian distribution with different parameter, expectation maximization algorithm has been applied to the segmentation of POL-SAR data.
So we conclude that the eigenvalues of coherency matrix include rich polarization information.And Gaussian distribution of eigenvalues makes it more convenient to measure than Wishart distribution of coherency matrix.

Similarity Metric Based on Eigenvalue Analysis
As described in the reference [22], we analyze the distribution characteristics of eigenvalue.Figure 1(a) marked seven areas A-F, comes from a part of the POL-SAR data set Flevoland.50 sample points are randomly selected from every area, respectively.Then, we make three eigenvalues of every point as coordinate and show them as is seen from Figure 1(b).We can see that the eigenvalues of different class have divisibility substantially.Then, we apply Gaussian Mixture Modes to simulate eigenvalues of every area, as is seen from Figure 1(c).We can conclude that eigenvalues are approximate Gaussian distribution with different parameter.
Euclidean distance is the most widely used similarity metric, whose characteristics are as follows: 1) the ranges of different features (Table 1) are ignored.As

( ) ( )
where, , x y are features of two points.C is covariance matrix, varying with input data.Thus, the similarity metric with Mahalanobis distance is adaptive.

Similarity Metric with Spatial Consistency Constraint
In the process of segmentation and classification, the probability of pixels in the image and its neighborhood having the same class attributes is large, called spatial coherence constraints.So, we choose the similarity metric with spatial coherence constraints [27].
In summary, the final used affinity matrix is: where, ( ) ( ) where, C is covariance matrix, , i j x x are the features of the ith, jth image pixels.k N are the pixels whose center is j x and neighborhood window is k k × .R N is the number of pixels included in window.
As described in [28], this method of parameter α is not sensitive.When 3 α ≥ , the result of algorithm is remained.And, when the size of the neighbor- hood window 15 k ≤ , the result is improved gradually.When 15 k > , the result is bad gradually.Specific to each image, the k is decided by the texture.
With above similarity metric, grouping data is to complete clustering.How-Journal of Geographic Information System ever, some of the commonly used statistical-based clustering algorithm such as EM, demands obedience distribution, and is sensitive to the initialization.At the same time, when Gaussian fitting, the statistical properties of mixed-pixel is unstable.So EM algorithm is not suitable for the eigenvalues, just like the results displayed in Figure 2. As result, we choose the spectral clustering whose distribution of data is not a requirement.

Spectral Clustering Spatial Consistency
Spectral clustering is a typical clustering algorithm based similarity metric.Spectral clustering algorithm is no longer required a convex structure of the data to ensure a good result, and which is also a discriminant method.Instead of making assumptions of the global structure of data, spectral clustering algorithm firstly collects local information to indicate the possibilities of whether two points belonging to the same class, then makes the global decision based on a clustering criterion to divide all the data points into irrelevant sets.
Spectral clustering has good clustering results, but for larger POLSAR data, the application of classic spectral clustering algorithm has been limited [29].Many fast spectral clustering algorithms are proposed [30] [31] [32] [33] [34].In the integral characteristic function.The method first randomly select a small part of sample from all of the samples as representative points to solve the characteristic problem, and then extend eigenvector to the similarity matrix for the entire sample set.
The main steps of Nyström algorithm are as follows: Step 1. Randomly selected m sample points as sample subset; Step 2. Form the affinity matrix of the subset Step 3. Eigendecompose W, obtain the eigenvalues and the corresponding eigenvectors of W, then extrapolate eigenvectors of the entire similarity matrix; Step 4. Clustering first n-dimensional eigenvectors into n clustering via k-means, as the final segmentation results.
Where, W means similarity matrix between sample points to be cluster, and it contains all the information required to cluster.In our method, W is con- ducted as S S α + .

Cluster Ensemble
The clustering ensemble is a final division of multiple clustering results of given task, and the division has better robustness, novelty and stability.The key issue is how to obtain better clustering results based on combinations of different cluster results membership, also means the construction and choice of consensus function.
Consensus function gives multiple clustering results a final division.In [29], the researcher has introduced three consensus functions: cluster-based similarity partitioning algorithm (CSPA), hyperGraph partitioning algorithm (HGPA), meta-Clustering algorithm (MCLA).All of them approach the problem by first transforming the set of clustering into a hypergraph representation, and obtaining hypergraph minimum cut.Among of them, the complexity of MCLA is ( ) O nk r , varying linearly with the number of samples.MCLA has superiority in complexity and the integrated quality, so we choose MCLA as consensus function.
Nyström algorithm can effectively reduce the computational complexity.However, clustering results are instable as a result of the random sampling.So we make use of cluster ensembles to keep stable segmentation results.Apply Nyström algorithm for k times, and every time random sampling the same amount of sample to obtain clustering labels { } .And map the labels into the final result with MCLA.

Algorithm Steps of Eigenvalue Similarity Metric Based Spectral Clustering Method
Our algorithm process is divided into three steps: pretreatment, similarity metric, spectral clustering ensemble.Pretreatment: refined Lee filter with window 7

Feature Analysis
For clustering, the divisibility between two categories depends on the distance between them.So we make a comparison of Euclidean distance and Mahalanobis distance of each of the two categories in Figure 4  the distance between two categories, but still not enough to separate every point.

Segmentation Result
In order to better demonstrate the effectiveness of our method, we choose the contrast algorithms: 1) H/a/A_Wishart [19]; 2) spectral clustering_Wishart proposed by Anfinsen et al. in [20]; 3) our method with Euclidean distance.

a) OVER ALL Flevoland Data Set
For image Flevoland, in our method,the number of random sample is 70, the size of neighborhood window 3 k = because this image doesn't has fine texture but has small blocks, 3 α = , the number of classes is 7.The number of classes of the algorithm (spectral clustering_Wishart) is also 7, and the algorithm (H/a/A_Wishart) is fixed 16 then mergered to be 7 manually.
The ground truth does not provide a label for each pixel of the entire image so the accuracy calculation is limited to only those pixels where the ground-truth provides a label.Partial ground-truth map is shown in Figure 5(a).The overall Journal of Geographic Information System segmentation accuracy of four methods is shown in Table 2, and segmentation maps are shown in Figures 5(b)-(e).Overall segmentation accuracy P is de- fined ( ) where Correct is the number of pixels emergimg both in the ground truth and classifying result for each category terrain object, N is the total number of pixels, and K is the category number of terrain, i, j is the pixel from a kind of category.
From Table 2 and Figure 5 it is seen that the performance of our method with Mahalanobis distance is much better than Euclidean distance.For the edge of region, our method is worse than others as a result of application of spatial information, and other methods make use of Wishart iteration or segmentation to keep clear edge.For the region consistence, our method is better as a result of  In order to improve the credibility of the contrast algorithm, and contrast with reference [21], we choose the same image (Figure 6(a)) which is retrieved from Figure 3(a).And GroundTruth also come from reference [20].As described in [20], the H/a/A_Wishart (0.68) and spectral clustering_Wishart (0.67) have the similar results, and its algorithm (0.75) outperform the H/a/A_Wishart by 7.1%.Though our method (0.745) outperform the H/a/A_Wishart by 6.5%, our method has less complex process and fewer parameter.c) San Francisco Data Set In order to demonstrate the robustness of our method, the San Francisco bay data is tested by four algorithms.The number of random sample is 70, 3 k = , 1 α = , the number of classes is 3.The contrast algorithms results are mergered to be 3 manually.As can be seen from the two marked details, our method is superior to other methods in the shape.At the same time, ocean, city and forest are classified clearly, and the forest at the upper left corner has good regional consistency.The overall segmentation accuracy of four methods is shown in Table 3, and segmentation maps are shown in Figures 7(a    Francisco bay and Xi'an city is small by random sampling, so the process of cluster ensemble can be bypassed.And the contrast algorithms need the process of Wishart iteration, which consumes time.In the term of validity, our method can guarantee the accuracy of the overall segmentation, at the same time keep the details on good results.
At the same time, we can see the method has good robustness from the above three images.This is because the eigenvalue expresses the main information of T matrix.

Conclusion
This paper introduces an approach for segmentation of the POLSAR data based on eigenvalue similarity metric.From the scientific and application point of view, it is a new approach of data processing.With analysis of the characteristic eigenvalue, we propose a new construction method of similarity metric.As a result, our method reduces the complexity of the spectral clustering for POL-SAR image segmentation, avoiding the choice of Gaussian kernel parameter and completing clustering effectively.From the experimental results, we can see that

Figure 1 .
Figure 1.Distribution of different areas eigenvalues.

1 )
Figure 3(a), the image is shown with color composed by Pauli matrix representation: red for HH VV − , green for HV VH + and blue for HH VV + .The map shows an agricultural area, covered with different crops and water.2) San Francisco Data Set：It is the fully polarimetric L-band airborne SAR data acquired with the AIRSAR sensor of the NASA/JPL at the test site of San Francisco bay, which has a mixed scene of urban, vegetation and ocean.The original data has the size of 1024 × 900 pixels, and experimental data is the size of 800 × 500.In Figure 3(b), the image is shown with color composed by Pauli matrix representation: red for HH VV − , green for HV VH + and blue for HH VV + .

Figure 4 .
Figure 4. A-G class were randomly taken 50 points, respectively calculate the distance between each class and all classes (including itself).Each class has its own color, which is consistent with the color of GroundTruth.(a) Euclidean distance.(b) Mahalanobis distance.(c) Pauli RGB composite and the areas marking.(d) Groundtruth.(e) The result of spectral clustering with Euclidean distance.(f) The result of spectral clustering with Mahalanobis distance.

Table 1 .
Comparison of range of three eigenvalues., the Euclidean distance between two points depend on the feature 3 λ largely.2) Without considering the correlation between features, Euclidean distance treats features equally and only integrates the difference of each feature between two points.3)Euclideandistance is applicable to the data obeying strictly Gaussian distribution.However, our data is not strictly Gaussian distribution, just like the area A shown in Figure1(c).
[26]ltdistribution of features in the entire space, therefore, to better describe the similarity between two points.3)Mahalanobisdistance is applicable to the data obeying Gaussian distribution approximately[26].

Table 2 .
The overall segmentation accuracy comparison of Flevoland area for method.
b) PARTIAL Flevoland Data Set

Table 3 .
The segmentation accuracy comparison of marked area of San Francisco for methods.