Advanced Classification of Lands at TM and Envisat Images of Mongolia

The aim of this study is to fuse high resolution optical and microwave images and classify urban land cover types using a refined Mahalanobis distance classifier. For the data fusion, the multiplicative method, Brovey transform, intensity-hue-saturation method and principal component analysis are used and the results are compared. The refined method uses spatial thresholds defined from local knowledge and the bands defined from multiple sources. The result of the refined Mahalanobis distance method is compared with the result of a standard technique and it demonstrates a higher accuracy. Overall, the research indicates that the combined use of optical and microwave images can notably improve the interpretation and classification of land cover types and that the refined Mahalanobis classification is a powerful tool to increase classification accuracy.


Introduction
Over the years, image data fusion has become a very valuable approach for the integration of multisource satellite data sets. It is well known that optical data contain information on the reflective and emissive characteristics of the Earth surface features, while synthetic aperture radar (SAR) data contain information on the surface roughness, texture and dielectric properties of natural and man-made objects. In the past years, the integrated features of these multisource data sets have been efficiently used for improved land cover analysis. It has been found that the images acquired at optical and microwave ranges of the electromagnetic spectrum provide unique information when they are integrated. Many authors have proposed and applied different techniques to combine passive sensor and microwave images in order to enhance various features, and they all judged that the results from the fused images were better than the results obtained from the individual images [1][2][3][4][5][6][7].
In general, high resolution optical remote sensing (RS) data sets taken from different Earth observation satellites such as Landsat and SPOT have been successfully used for land cover mapping since the operation of the first Landsat launched in 1972, whereas high resolution SAR images taken from space platforms have been widely used for different thematic applications since the launch of the ERS-1/2, JERS-1 and RADARSAT satellites [8]. As very high resolution TerraSAR data have been available since 2006, radar images can be efficiently used for very accurate mapping and analysis. It is clear that the combined application of optical and SAR data sets can provide unique information for different thematic studies, because passive sensor images represent spectral variations of various surface features, whereas microwave data with their penetrating capabilities can provide some additional information. For example, in an urban environment the optical images provide information about the spectral variations of the different urban features, whereas the radar images provide structural information about buildings and street alignment owing to the double bounce scattering [9].
Traditionally, multispectral RS data sets have been widely used for land cover mapping, and for the generation of land cover information, different supervised and unsupervised classification methods have been applied. However, the emergence of microwave images has given new opportunities for the users and researchers dealing with processing and analysis of remotely sensed data sets. Unlike single-source data, data sets from multiple sources have proved to offer better potential for discriminating between different land cover types. Many authors have assessed the potential of multisource images for the classification of different land cover classes [10][11][12][13][14][15]. In RS applications, the most widely used multisource classification techniques are statistical methods, Dempster-Shafer theory of evidence, neural networks, decision tree classifiers, and knowledge-based methods [10,16,17].
In recent years, mapping of urban areas, specifically at regional and global scales, has become an important task due to the increasing pressures from rapid urbanization and associated environmental and social problems [18]. However, in most cases urban areas are complex and diverse in nature, and many features have similar spectral characteristics, so it is not easy to separate them by the use of common feature combinations or by applying ordinary techniques. In order to successfully extract urban land cover classes, reliable features derived from multiple sources and an efficient classification technique should be used. The aim of this study is a) to investigate different data fusion techniques for the enhancement of spectral variations of urban features, later to be used for training sample selection, and b) to classify the features composed by the fusion techniques using a refined Mahalanobis distance classifier. Thus, a fusion of high resolution optical and microwave images will help in defining the sites with the most appropriate training samples, and the refined Mahalanobis distance classifier will be used for deriving an improved land cover map. For the final analysis, multisource data sets of an urban area in Mongolia have been used. The analysis was carried out using PC-based ERDAS Imagine 10.1 and ENVI 4.8.

Test Site and Data Sources
As a test site, Ulaanbaatar, the capital city of Mongolia, has been selected. Ulaanbaatar is situated in the central part of Mongolia, on the Tuul River, at an average height of 1350 m above sea level and currently has about 1.28 million inhabitants. Although the city extends about 30 km from the west to the east and about 20 km from the north to the south, the study area chosen for the present study covers an area 11.3 km long and 8.7 km wide. In the selected image frame, it is possible to define such classes as built-up area, ger area (Mongolian traditional dwelling), forest, grass, soil and water. The built-up area includes buildings of different sizes, while the ger area includes mainly gers surrounded by fences. Figure 1 shows a Landsat image of the test site and some examples of its land cover. In the present study, for the urban land cover studies, a Landsat TM image of 31 July 2010 and an Envisat C-band image of 25 May 2010 have been used. The Landsat TM data has seven multispectral bands (B1: 0.45-0.52 μm, B2: 0.52-0.60 μm, B3: 0.63-0.69 μm, B4: 0.76-0.90 μm, B5: 1.55-1.75 μm, B6: 10.40-12.50 μm and B7: 2.08-2.35 μm). The spatial resolution is 30 m for the reflective bands, while it is 120 m for the thermal band. In the current study, channels 2, 3, 4, 5 and 7 have been used. Envisat is a European Earth-observing satellite carrying a cloud-piercing, all-weather polarimetric radar which is designed to monitor the Earth from a distance of about 790 km. The characteristics of the Envisat data used in the current study are shown in Table 1.

Co-Registration of Optical and SAR Images
In order to perform accurate data fusion, good geometric correlation between the images is needed. As a first step, the Landsat image was georeferenced to a Gauss-Kruger map projection using 12 ground control points (GCPs) defined from a topographic map of the study area. The GCPs have been selected on clearly delineated crossings of roads, streets and city building corners. For the transformation, a second-order transformation and the nearest-neighbour resampling approach were applied and the related root mean square (RMS) error was 0.83 pixel. Then, the Envisat image was geometrically corrected and its coordinates were transformed to the coordinates of the georeferenced Landsat image. In order to correct the SAR image, 18 more regularly distributed GCPs were selected from different parts of the image. For the actual transformation, a second-order transformation was used. As a resampling technique, the nearest-neighbour resampling approach was applied and the related RMS error was 1.16 pixel.
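The second-order transformation and its RMS error can be illustrated with a small least-squares sketch. This is only an illustration under stated assumptions: the GCP coordinates are synthetic, the function names (`design_matrix`, `fit_second_order`, `rms_error`) are hypothetical, and the actual georeferencing in the study was done in commercial software rather than with this code.

```python
import numpy as np

def design_matrix(xy):
    """Six second-order polynomial terms per point: 1, x, y, x*y, x^2, y^2."""
    x, y = xy[:, 0], xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_second_order(src_xy, dst_xy):
    """Least-squares fit of a second-order polynomial mapping src -> dst."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(src_xy), dst_xy, rcond=None)
    return coeffs  # shape (6, 2): one column per output coordinate

def rms_error(coeffs, src_xy, dst_xy):
    """Root mean square of the GCP residual distances, in dst units."""
    residuals = design_matrix(src_xy) @ coeffs - dst_xy
    return float(np.sqrt(np.mean(np.sum(residuals**2, axis=1))))

# Synthetic example: 12 hypothetical GCPs with a small random measurement error.
rng = np.random.default_rng(0)
image_xy = rng.uniform(0, 100, size=(12, 2))
true_coeffs = np.array([[5.0, -3.0], [1.0, 0.02], [0.01, 1.0],
                        [1e-4, 0.0], [0.0, 1e-4], [2e-4, -1e-4]])
map_xy = design_matrix(image_xy) @ true_coeffs + rng.normal(0, 0.5, size=(12, 2))

coeffs = fit_second_order(image_xy, map_xy)
print("RMS error at the GCPs:", rms_error(coeffs, image_xy, map_xy))
```

A second-order fit has six coefficients per coordinate, so at least six GCPs are required; using 12 or 18 points, as in the study, leaves redundancy for a meaningful RMS estimate.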

Speckle Suppression of the Envisat Image
As microwave images have a granular appearance due to the speckle formed as a result of the coherent radiation used for radar systems, the reduction of the speckle is a very important step in further analysis. The analysis of the radar images must be based on techniques that remove the speckle effects while considering the intrinsic texture of the image frame [1,19]. In this study, four different speckle suppression techniques, namely local region, Lee-sigma, Frost and gamma MAP filters [11] of 3 × 3 and 5 × 5 sizes, were compared in terms of delineation of urban features and texture information. After visual inspection of each image, it was found that the 3 × 3 gamma MAP filter created the best image in terms of delineation of different features as well as preserving the content of texture information. In the output image, speckle noise was reduced with very low degradation of the textural information.
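To give a feel for how an adaptive speckle filter trades smoothing against texture preservation, the sketch below implements a basic Lee filter with a 3 × 3 window. It is not the gamma MAP filter selected in the study; it is a simpler stand-in from the same family, and the global noise-variance estimate used here is an assumption made for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lee_filter(img, size=3, noise_var=None):
    """Basic Lee filter: blend each pixel with its local mean.

    Homogeneous areas (low local variance) are pulled towards the local
    mean; textured areas (high local variance) keep more of the original.
    """
    img = np.asarray(img, dtype=float)
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size))
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    if noise_var is None:
        # Crude global estimate of the noise variance (assumption).
        noise_var = local_var.mean()
    weight = local_var / np.maximum(local_var + noise_var, 1e-12)
    return local_mean + weight * (img - local_mean)

# Synthetic homogeneous scene corrupted by multiplicative speckle.
rng = np.random.default_rng(1)
speckled = 100.0 * rng.gamma(4.0, 0.25, size=(64, 64))
filtered = lee_filter(speckled, size=3)
print("variance before:", speckled.var(), "after:", filtered.var())
```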

Image Fusion
Image fusion is a technique used to combine images of different spatial and spectral resolutions. Very often a high resolution panchromatic image is integrated with a low resolution multispectral image, thus improving interpretation and analysis of the natural and man-made objects. In other words, image fusion is the integration of different digital images in order to create a new image and obtain more information than can be separately derived from any of them [9,20,21].
In the present study, for the urban areas, the SAR image provides structural information about buildings and street alignment due to the double bounce effect, while the optical image provides information about the spectral variations of different urban features. Image fusion can be performed at pixel, feature and decision levels [22]. In this study, data fusion has been performed at a pixel level and the following techniques were applied: (a) multiplicative method, (b) Brovey transform, (c) intensity-hue-saturation (IHS) method, (d) principal component analysis (PCA). Of these methods, the first two are considered as the ordinary methods, while the last two are regarded as the complex techniques.
The fused images enhance the natural and man-made objects in different ways. Therefore, the complex techniques do not necessarily perform better than the ordinary methods. In most cases, the judgements are made on the basis of visual interpretation. Each of the selected techniques is briefly discussed below.
Multiplicative method: This is the simplest image fusion technique. It takes two digital images, for example, high resolution panchromatic and low resolution multispectral data, and multiplies them pixel by pixel to get a new image [23], so that each output pixel value is the product of the corresponding input pixel values. In the present study, the Envisat image was considered as a low resolution band, while bands 3, 4 and 5 of Landsat were considered as high resolution bands.
Brovey transform: This is a numerical method used to merge different digital data sets. The algorithm based on a Brovey transform uses a formula that normalises the multispectral bands used for a red, green, blue colour display and multiplies the result by high resolution data to add the intensity or brightness component of the image [24]. The formulae used for the Brovey transform can be described as follows:

$$R_{new} = \frac{R}{R+G+B}\times H,\qquad G_{new} = \frac{G}{R+G+B}\times H,\qquad B_{new} = \frac{B}{R+G+B}\times H$$

where R, G and B are the multispectral bands assigned to the red, green and blue display channels and H is the high resolution band. For the Brovey transform, bands 3, 4 and 5 of Landsat have been used and the SAR band was considered as a high resolution band.
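The two ordinary methods described above can be sketched in a few lines of array code. This is a minimal illustration assuming co-registered arrays of equal size; the function names and the tiny example values are hypothetical and do not come from the study.

```python
import numpy as np

def multiplicative_fusion(ms, other):
    """Multiply every multispectral band by the second image, pixel by pixel.

    ms: array of shape (bands, rows, cols); other: array of shape (rows, cols).
    """
    return ms.astype(float) * other.astype(float)[None, :, :]

def brovey_fusion(rgb, high, eps=1e-12):
    """Brovey transform: normalise each band by the band sum, then scale.

    rgb: array of shape (3, rows, cols); high: array of shape (rows, cols).
    eps guards against division by zero where all three bands are zero.
    """
    rgb = rgb.astype(float)
    total = rgb.sum(axis=0)
    return rgb / np.maximum(total, eps) * high.astype(float)

# One-pixel example with hypothetical values.
rgb = np.array([[[1.0]], [[2.0]], [[1.0]]])   # three bands, 1 x 1 pixel
sar = np.array([[8.0]])
print(multiplicative_fusion(rgb, sar).ravel())
print(brovey_fusion(rgb, sar).ravel())
```

Because the Brovey bands are normalised by their sum, the three fused bands always add up to the value of the high resolution band, which is why the result inherits its brightness.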
IHS method: The IHS method is the most widely used data fusion technique. The intensity is the overall brightness of the scene and it varies from 0 (black) to 1 (white). The hue is representative of the color or dominant wavelength of the pixel and varies from 0 at the red midpoint through green and blue back to the red midpoint at 360. The saturation represents the purity of color and varies linearly from 0 to 1 [25]. This method assumes that the H and S components contain the spectral information, while the I component represents the spatial information. The formulae used for the RGB to IHS transform can be described as follows:

[Equations (2a)-(2c): RGB to IHS transform]

For the IHS transformation, the RGB image created by bands 3 and 4 of Landsat as well as the Envisat band have been used. Here, the SAR band was considered as the I. When the IHS image was transformed back to the RGB colour space, contrast stretching was performed on the I channel.
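The substitution idea behind IHS fusion, in which the intensity component is replaced by the SAR band and the result is transformed back to RGB, can be sketched as follows. The sketch assumes that the hue-lightness-saturation model of Python's standard `colorsys` module is an acceptable stand-in for the IHS transform; it is not necessarily the exact transform used in the study, and hue is scaled to 0-1 here rather than 0-360. Both inputs are assumed to be already scaled to the range 0-1.

```python
import colorsys
import numpy as np

def ihs_fusion(rgb, sar):
    """Replace the intensity of an RGB composite by a SAR band.

    rgb: float array (rows, cols, 3) scaled to [0, 1].
    sar: float array (rows, cols) scaled to [0, 1] (e.g. after stretching).
    """
    rows, cols, _ = rgb.shape
    fused = np.empty((rows, cols, 3), dtype=float)
    for r in range(rows):
        for c in range(cols):
            # colorsys returns (hue, lightness, saturation)
            h, _, s = colorsys.rgb_to_hls(*rgb[r, c])
            # keep hue and saturation, substitute the SAR value as intensity
            fused[r, c] = colorsys.hls_to_rgb(h, sar[r, c], s)
    return fused

# One-pixel example with hypothetical values.
rgb = np.array([[[0.2, 0.6, 0.4]]])
sar = np.array([[0.7]])
print(ihs_fusion(rgb, sar))
```

The explicit pixel loop keeps the sketch readable; for full scenes a vectorised transform would be preferable.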

PCA: The most common understanding of the PCA is that it is a data compression technique used to reduce the dimensionality of multidimensional datasets or bands. The bands of the PCA data are non-correlated and are often more interpretable than the source data. In n dimensions, there are n principal components. Each successive principal component is the widest transect of the ellipse that is orthogonal to the previous components in the n-dimensional space, and accounts for a decreasing amount of the variation in the data which is not already accounted for by previous principal components. Although there are n output bands in a PCA, the first few bands account for a high proportion of the variance in the data. Sometimes, useful information can be gathered from the principal component bands with the least variances, and these bands can show subtle details in the image that were obscured by higher contrast in the original image [25].
To compute a principal components transformation, a linear transformation is performed on the data, meaning that the coordinates of each pixel in spectral space are recomputed using a linear equation. The result of the transformation is that the axes in n-dimensional spectral space are shifted and rotated to be relative to the axes of the ellipse. To perform the linear transformation, the eigenvectors and eigenvalues of the n principal components must be derived from the covariance matrix, as shown below:

$$E\,\mathrm{Cov}\,E^{T} = D$$

where E is the matrix of eigenvectors, Cov is the covariance matrix, T denotes transposition, and D is the diagonal matrix of eigenvalues in which all non-diagonal elements are zeros. D is computed so that its non-zero elements are ordered from greatest to least, so that $\lambda_1 > \lambda_2 > \dots > \lambda_n$. In the present study, the PCA has been performed using all available bands and the results are shown in Table 2.
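The eigen-decomposition of the covariance matrix described above can be sketched with NumPy. The data here are random numbers standing in for six co-registered bands, and the function name `principal_components` is hypothetical; the coefficients in Table 2 were not produced by this code.

```python
import numpy as np

def principal_components(bands):
    """PCA of an image stack via the covariance matrix.

    bands: array of shape (n_bands, n_pixels).
    Returns the PC bands, the eigenvector matrix E (one eigenvector per row),
    the eigenvalues in decreasing order and the percentage of variance each
    component explains.
    """
    centered = bands - bands.mean(axis=1, keepdims=True)
    cov = np.cov(centered)                      # (n_bands, n_bands)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending order
    order = np.argsort(eigvals)[::-1]           # reorder greatest to least
    eigvals = eigvals[order]
    E = eigvecs[:, order].T
    pcs = E @ centered
    explained = 100.0 * eigvals / eigvals.sum()
    return pcs, E, eigvals, explained

# Random stand-in for six bands (e.g. five optical bands plus one SAR band).
rng = np.random.default_rng(2)
data = rng.normal(size=(6, 500)) * np.array([[9.0], [3.0], [2.0], [1.0], [0.5], [0.2]])
pcs, E, eigvals, explained = principal_components(data)
print("percentage of variance per component:", np.round(explained, 2))
```

The rows of `E` play the role of the loadings discussed for each principal component: a large absolute entry means that the corresponding input band dominates that component.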
As can be seen from Table 2, PC1 is totally dominated by the variance of the HH polarisation of Envisat and other bands have almost no influence on it. Although it contained 84.69% of the overall variance, a visual inspection revealed that it contained less information related to the selected classes. The first middle infrared band of Landsat has a high negative loading in PC2. Here, the second middle infrared band of Landsat has the second highest negative loading. In PC3, the near infrared band has a high negative loading and the red band has a moderately high loading. Although PC3 contained only 2.07% of the overall variance, visual inspection showed that it contained some useful information related to the urban texture. PC4 is dominated by the variances of the red and near infrared bands. However, as it contained only 0.95% of the overall variance, visual inspection revealed that it had little significance. The inspection of PC5 and PC6 indicated that they mainly contained noise from the total data set.
In order to obtain a good colour image that can illustrate spectral and spatial variations of the classes, different band combinations have been used. As we wanted to define the best image for the selection of appropriate training sites, the judgements were made through visual interpretation. The preliminary visual inspection showed that the image created by the multiplicative method gave the worst result compared to all other results. On this image, spatial and spectral separations among various classes were not easy. The IHS transformed image gave a superior result in terms of the spectral separation between different objects and classes, because it could easily separate almost all available classes. In addition, one could observe that the built-up areas were texturally separable from the ger areas. The image created by the PCA method was good; at least it was better than the results obtained by the multiplicative method and Brovey transform. However, on this image, it was difficult to see the fuzzy boundaries between two urban classes, namely built-up area and ger area. Figure 2 shows the comparison of the images obtained by the fusion methods used. As seen from Figure 2, the performance of the IHS method was better than all other results, because on this image we can clearly see the separation of the available classes. Therefore, this result was used for further analysis.

Standard Mahalanobis Distance Classification
Initially, to define the sites for the training sample selection from the combined images, several areas of interest (AOI) representing the available six classes (built-up area, ger area, forest, grass, soil and water) have been selected through accurate analysis of the fused images. As the data sources included both optical and SAR features, the fused images were very useful for the determination of the homogeneous AOI. In particular, the image obtained by the IHS method was enormously helpful.
The separability of the training signatures was firstly checked in feature space and then evaluated using the Jeffries-Matusita distance (Table 3). The values of the Jeffries-Matusita distance range from 0 to 2.0 and indicate how well the selected pairs are statistically separate. Values greater than 1.9 indicate that the pairs have good separability [25,26]. After the investigation, the samples that demonstrated the greatest separability were chosen to form the final signatures.
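For two classes that are assumed to be normally distributed, the Jeffries-Matusita distance is commonly computed from the Bhattacharyya distance B as JM = 2(1 - exp(-B)), which gives the 0 to 2 range mentioned above. The sketch below follows that common definition; the training samples are synthetic and the function name is hypothetical, so it does not reproduce the values of Table 3.

```python
import numpy as np

def jeffries_matusita(x1, x2):
    """Jeffries-Matusita distance between two training signatures.

    x1, x2: arrays of shape (n_samples, n_bands), assumed Gaussian.
    """
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    c1 = np.cov(x1, rowvar=False)
    c2 = np.cov(x2, rowvar=False)
    c = (c1 + c2) / 2.0
    d = m1 - m2
    # Bhattacharyya distance: mean term plus covariance term
    mean_term = d @ np.linalg.solve(c, d) / 8.0
    _, logdet_c = np.linalg.slogdet(c)
    _, logdet_1 = np.linalg.slogdet(c1)
    _, logdet_2 = np.linalg.slogdet(c2)
    cov_term = 0.5 * (logdet_c - 0.5 * (logdet_1 + logdet_2))
    return 2.0 * (1.0 - np.exp(-(mean_term + cov_term)))

# Synthetic signatures in three bands: two well separated, two overlapping.
rng = np.random.default_rng(3)
water = rng.normal(0.0, 1.0, size=(200, 3))
soil = rng.normal(10.0, 1.0, size=(200, 3))
built_up = rng.normal(5.0, 1.0, size=(200, 3))
ger = rng.normal(5.5, 1.0, size=(200, 3))
print("water vs soil:", jeffries_matusita(water, soil))
print("built-up vs ger:", jeffries_matusita(built_up, ger))
```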
The final signatures included about 80 to 827 pixels. For the classification, the following feature combinations were used: 1) All original spectral bands of the Landsat TM image; 2) Red, near infrared and first middle infrared (i.e., 3, 4 and 5) bands of the Landsat TM image; 3) PC1, PC2, PC3 and PC4 of the PCA (the first four PCs contained 98.87% of the overall variance); 4) The Envisat and original spectral bands of the Landsat TM data.
For the actual classification, a Mahalanobis distance classifier has been used. The Mahalanobis distance classifier is a parametric method, in which the criterion to determine the class membership of a pixel is the minimum Mahalanobis distance between the pixel and the class centre. The Mahalanobis distance $MD_k$ is expressed as follows:

$$MD_k = (x_i - m_k)^{T}\, V_k^{-1}\, (x_i - m_k)$$

where $x_i$ is the vector representing the pixel, $m_k$ is the sample mean vector for class k, and $V_k$ is the sample variance-covariance matrix of the given class. The sample mean vectors and variance-covariance matrices for each class are estimated from the selected training signatures. Then, every pixel in the dataset is evaluated using the minimum Mahalanobis distance and the class label of the closest centroid is assigned to the pixel [27].
As seen from Figures 3(a)-(d), the classification result of all bands of Landsat TM gives the worst result, because there are high overlaps between two urban classes: built-up area and ger area. However, these overlaps decrease on the classified image of red and infrared bands. This can be explained by the fact that fewer bands with statistically separable features can produce a better result than many bands with high overlaps. The PC bands as well as the combined use of optical and microwave data sets produced better results than the Landsat TM bands, but they still contain many mixed areas of different classes. As can be seen, although multisource images give some improvement, it is still very difficult to obtain a reliable land cover map by the use of the standard technique, specifically on decision boundaries of the statistically overlapping classes.
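The minimum Mahalanobis distance rule can be sketched as follows. The sketch assumes the quadratic form without a square root, which gives the same class assignment as the rooted form; the training samples, class names and function names are hypothetical.

```python
import numpy as np

def train_signatures(samples_by_class):
    """Estimate the mean vector and inverse covariance matrix per class.

    samples_by_class: dict mapping a class label to an array of shape
    (n_samples, n_bands) taken from the training areas.
    """
    return {
        label: (x.mean(axis=0), np.linalg.inv(np.cov(x, rowvar=False)))
        for label, x in samples_by_class.items()
    }

def classify(pixels, signatures):
    """Assign each pixel to the class with the smallest Mahalanobis distance.

    pixels: array of shape (n_pixels, n_bands).
    """
    labels = list(signatures)
    dists = np.stack(
        [
            # (x - m)^T V^-1 (x - m), evaluated for every pixel at once
            np.einsum("ij,jk,ik->i", pixels - m, v_inv, pixels - m)
            for m, v_inv in signatures.values()
        ],
        axis=1,
    )
    return [labels[i] for i in dists.argmin(axis=1)], dists

# Synthetic training data for two classes in three bands.
rng = np.random.default_rng(4)
signatures = train_signatures({
    "water": rng.normal(0.0, 1.0, size=(50, 3)),
    "soil": rng.normal(10.0, 1.0, size=(50, 3)),
})
predicted, dists = classify(np.array([[0.2, -0.1, 0.3], [9.5, 10.2, 10.1]]), signatures)
print(predicted)
```

Because the rule relies only on class means and covariances, two classes with strongly overlapping signatures, such as built-up area and ger area in this study, cannot be separated reliably by the distance alone.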
For the accuracy assessment of the classification results, the overall performance has been used. This approach creates a confusion matrix in which reference pixels are compared with the classified pixels and as a result an accuracy report is generated indicating the percentages of the overall accuracy [26]. As ground truth information, different AOIs containing 1239 purest pixels have been selected. The AOIs were selected on the principle that more pixels should be selected for the evaluation of larger classes such as ger area and soil than for smaller classes such as water. The overall classification accuracies for the selected classes are shown in Table 4.
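The overall accuracy is the share of reference pixels on the diagonal of the confusion matrix. A minimal sketch with a handful of made-up labels (not the 1239 reference pixels of the study) is given below; the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(reference, predicted, n_classes):
    """Rows are reference classes, columns are classified classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(reference), np.asarray(predicted)), 1)
    return cm

def overall_accuracy(cm):
    """Percentage of correctly classified reference pixels."""
    return 100.0 * np.trace(cm) / cm.sum()

# Made-up labels for three classes (0, 1, 2).
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(reference, predicted, n_classes=3)
print(cm)
print("overall accuracy (%):", overall_accuracy(cm))
```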

The Refined Classification Method
For many years, single-source multispectral data sets have been efficiently used for land cover mapping. Since the appearance of the first single polarization microwave data sets, multisource images have proved to offer better potential for discriminating between different land cover types. In general, it is very important to design an appropriate image processing procedure in order to successfully classify any digital data into a number of class labels. The effective use of different features derived from different sources and the selection of a reliable classification technique can be of key significance for the improvement of classification accuracy [28]. In the present study, for the classification of urban land cover types, a refined Mahalanobis distance algorithm has been constructed. As the features, bands 3, 4, 5 and 7 of Landsat TM and Envisat HH polarization images have been used. The green band (i.e., band 2) of Landsat was excluded, because it had a high correlation with the red band (i.e., band 3).
Unlike the traditional Mahalanobis classification, the constructed classification algorithm uses spatial thresholds defined from the local knowledge. The local knowledge was defined on the basis of the spectral variations of the land surface features on the fused images. It is clear that a spectral classifier will be ineffective if applied to statistically overlapping classes such as built-up area and ger area, because they have very similar spectral characteristics. For such spectrally mixed classes, classification accuracies should be improved if the spatial properties of the classes of objects could be incorporated into the classification decision-making. The idea of the spatial threshold is that it uses a polygon boundary to separate the overlapping classes, and only the pixels falling within the threshold boundary are used for the classification. In that case, the likelihood of the pixels being correctly classified will significantly increase, because the pixels belonging to the class that overlaps with the class to be classified using the threshold boundary are temporarily excluded from the decision-making process. In such a way, the image can be classified several times using different threshold boundaries and the temporary results will be stored as ancillary classification results. After applying the last threshold, the ancillary results can be merged, thus producing a final improved land cover map [29].
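One possible reading of the spatial threshold procedure is sketched below. It assumes that each threshold polygon has already been rasterised into a boolean mask and that local knowledge supplies, for each mask, the set of classes allowed inside it; both the data layout and the function names are assumptions made for illustration, not the implementation used in the study.

```python
import numpy as np

def mahalanobis_sq(pixels, mean, cov_inv):
    """Squared Mahalanobis distance of every pixel to one class centre."""
    d = pixels - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

def refined_classify(image, signatures, thresholds, nodata=-1):
    """Classify inside each spatial threshold with a restricted class set.

    image: array (rows, cols, bands).
    signatures: dict class_id -> (mean vector, inverse covariance matrix).
    thresholds: list of (mask, allowed_class_ids); mask is a boolean array
        (rows, cols) rasterised from a threshold polygon (assumption).
    """
    rows, cols, bands = image.shape
    flat = image.reshape(-1, bands).astype(float)
    final = np.full(rows * cols, nodata, dtype=int)
    for mask, allowed in thresholds:
        inside = mask.ravel()
        ids = sorted(allowed)
        # classes outside `allowed` are temporarily excluded from the decision
        dists = np.stack(
            [mahalanobis_sq(flat[inside], *signatures[k]) for k in ids], axis=1
        )
        ancillary = np.array(ids)[dists.argmin(axis=1)]
        final[inside] = ancillary          # merge the ancillary result
    return final.reshape(rows, cols)

# Tiny one-band example: class 1 is excluded in the left column.
signatures = {0: (np.array([0.0]), np.array([[1.0]])),
              1: (np.array([1.0]), np.array([[1.0]]))}
image = np.array([[[0.9], [0.9]], [[0.1], [0.1]]])
left = np.array([[True, False], [True, False]])
result = refined_classify(image, signatures, [(left, {0}), (~left, {0, 1})])
print(result)
```

In the example, the upper-left pixel is spectrally closer to class 1 but is still assigned to class 0, because class 1 is not allowed inside the left-hand threshold; this is the mechanism by which spectrally similar classes are kept apart.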
In the present study, the spatial thresholds have been applied for differentiation of the spectrally similar classes such as built-up area and ger area as well as forest and grass. The result of the classification using the refined method is shown in Figure 4. For the accuracy assessment of the classification result, the overall performance has been used, taking the same number of sample points (i.e., 1239 purest pixels) as in the previous standard classifications. The confusion matrix produced for the refined classification method showed an overall accuracy of 91.58%. As can be seen from Figure 4, the result of the classification using the refined Mahalanobis classifier is much better than the results of the standard method. As the overall accuracy exceeds 90%, this kind of result can be directly used for spatial decision-making or to update a thematic layer within a spatial information system. A general diagram of the refined Mahalanobis classification is shown in Figure 5.

Conclusion
The main purpose of the research was to compare the performances of the selected data fusion techniques for the enhancement of different urban features and classify urban land cover types using a refined Mahalanobis distance classification. For the data fusion, the multiplicative method, Brovey transform, IHS method and PCA were used. When the results of the fusion techniques were compared, the IHS transformed image gave a superior image in terms of the spectral and spatial separations among different urban classes. To extract reliable urban land cover information from the selected optical and microwave data sets, a refined Mahalanobis classification algorithm that uses spatial thresholds defined from the local knowledge was constructed. Overall, the study demonstrated that multisource information can considerably improve the interpretation and classification of land cover types and that the refined Mahalanobis distance classifier is a powerful tool to produce a reliable land cover map.



Table 2. Principal component coefficients from Landsat TM and Envisat images.
The Brovey transformed image looked better than the image obtained by the multiplicative method. On this image, green areas were totally separable from other classes, but it also somehow reflected the characteristics of the Envisat image.