Integrated Use of Existing Global Land Cover Datasets for Producing a New Global Land Cover Dataset with a Higher Accuracy: A Case Study in Eurasia

It is commonly acknowledged that current global mapping projects face an accuracy challenge. A comparison of four existing global land cover datasets (MODIS LC, GLC2000, GLCNMO and GLOBCOVER) shows that low accuracy in certain areas drags down the overall accuracy of these datasets. In this paper, those areas are defined as the “unreliable area”. This study recollected training data from the “unreliable area” identified across the four datasets and reclassified that area using two supervised classification methods. The final result shows a relatively higher accuracy than any of the existing datasets.


Introduction
Global mapping plays an important role in areas such as monitoring major environmental phenomena, environmental protection and sustainable growth. An accurate global map could also contribute to the establishment of a global spatial data infrastructure for future research and many other scientific purposes.
To date, many global land cover projects have been carried out. For example, the IGBP DISCover dataset was based on Advanced Very High Resolution Radiometer (AVHRR) data from 1992 to 1993 [1], and the land cover product of the University of Maryland (UMD), based on the same AVHRR data, distinguished 14 classes [2]. In 2002, Boston University produced the MODIS land cover data using 1-km data from the MODIS sensor on board the Terra satellite [3]. The Global Land Cover 2000 (GLC2000) was based on SPOT-VEGETATION data from November 1999 to December 2000 [4,5]. Global Land Cover by National Mapping Organizations (GLCNMO), produced by the Center for Environmental Remote Sensing (CEReS, Chiba University), was based on 2003 MODIS data [6]. In 2009, in cooperation with an international network of partners (including EEA, FAO, GOFC-GOLD, IGBP, JRC and UNEP), the European Space Agency (ESA) produced GLOBCOVER. Unlike the other datasets, GLOBCOVER offers a higher resolution (300 m) than any previous global satellite-derived map [7].
Besides studies of single datasets, various studies have compared the existing global land cover datasets. In 2006, a spatial comparison of four satellite-derived 1 km global land cover datasets (IGBP, UMD, MODIS LC, GLC2000) was conducted using a generalized global land cover legend [8]. Another comparison of the existing 1 km datasets was conducted in 2008 [9]. The purpose of those comparisons was to develop the integrated use of different datasets. For example, in the 2005 study by Chandra Giri et al., areas of high agreement among the existing global datasets served as reference data for training area selection [10].
However, integrated uses so far have mostly focused on areas with high accuracy. Large areas with low accuracy seem to have been ignored. If the accuracy of these areas could be improved, a better global land cover dataset could in theory be expected, and potential uses could be found for the accuracy-improved areas. Therefore, the question of how to improve the accuracy of such areas arises, and answering it is the key objective of this paper.
This study used the four datasets to separate the high accuracy area from the low accuracy area. Next, the low accuracy area was checked carefully to collect training data for reclassification. Two classification methods (the maximum likelihood method and the decision tree method) were adopted, and their results were compared. Finally, the accuracy of the results was compared with that of the existing datasets.

Preprocessing
As mentioned above, MODIS LC (v004), GLC2000 (v1.1), GLCNMO (2003) and GLOBCOVER (2009) differ in resolution. Therefore, to make them comparable, the first step was to resample them all to the same 300 m resolution as GLOBCOVER (2009).
The next step was to reconcile the different legends of the four datasets (Table 1). Most classes (e.g. parts of the forest classes, urban, bare land and water bodies) translated well. However, the "mixed classes" were difficult to match with each other. In this study, the correspondences were mainly based on the GLCNMO classes [11][12][13][14][15][16][17].
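The paper does not state which resampling method was used. As a minimal sketch, assuming nearest-neighbour resampling (a common choice for categorical rasters, since it never invents new class codes) and assuming resolutions are given in metres on a regular grid, the step could look as follows. The function name `resample_nearest` and its parameters are illustrative, not taken from the study.

```python
import numpy as np

def resample_nearest(classes, src_res, dst_res):
    """Resample a 2-D class raster from src_res to dst_res (same units).

    Nearest-neighbour: each output pixel takes the class of the source
    pixel that contains its centre. This is an assumed method.
    """
    classes = np.asarray(classes)
    rows, cols = classes.shape
    out_rows = int(round(rows * src_res / dst_res))
    out_cols = int(round(cols * src_res / dst_res))
    # Centre of each output pixel, expressed in source-pixel units.
    r = ((np.arange(out_rows) + 0.5) * dst_res / src_res).astype(int)
    c = ((np.arange(out_cols) + 0.5) * dst_res / src_res).astype(int)
    # Guard against rounding past the last source row or column.
    r = np.minimum(r, rows - 1)
    c = np.minimum(c, cols - 1)
    return classes[np.ix_(r, c)]
```

For example, `resample_nearest(grid_1km, 1000, 300)` would bring a nominal 1 km grid to a nominal 300 m grid; real datasets would also need to be aligned to a common projection and extent, which this sketch does not cover.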
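Legend reconciliation can be viewed as a lookup from each dataset's own class codes to a common legend. The sketch below illustrates the idea only; the codes in `EXAMPLE_LUT` are hypothetical placeholders, and the actual correspondences are those of Table 1.

```python
import numpy as np

# Hypothetical mapping from one dataset's codes to common-legend codes.
# The real correspondences must be taken from Table 1.
EXAMPLE_LUT = {11: 1, 14: 1, 20: 2, 210: 3}

def harmonize(classes, lut, unmapped=0):
    """Translate source class codes to the common legend.

    Codes missing from the lookup table are set to `unmapped` so that
    untranslated classes stay visible instead of being silently kept.
    """
    classes = np.asarray(classes)
    out = np.full(classes.shape, unmapped, dtype=np.int32)
    for src, dst in lut.items():
        out[classes == src] = dst
    return out
```

Keeping an explicit `unmapped` value makes the hard-to-match "mixed classes" easy to locate after translation.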
Table 1 shows the pixel-by-pixel comparison of four maps.

Area Separation Based on the Accuracy Assessment
The information provided by the four global land cover datasets leads to four levels of synthesized agreement, listed below:
Zone 1: No agreement among the datasets.
Zone 2: Two datasets are in agreement and the other two are also in agreement with each other, or only two of the four datasets are in agreement while the other two are not.
Zone 3: Agreement among three datasets.
Zone 4: Agreement among all four datasets.
Accordingly, the regions of Zone 3 and Zone 4 are defined as the "reliable area" (Figure 2) in this study, and the regions of Zone 1 and Zone 2 (the blank part of Figure 2) are defined as the "unreliable area". But is the so-called "reliable area" truly reliable, that is, highly accurate? To confirm that the zones were defined correctly, an accuracy assessment was conducted. In Zone 3 and Zone 4, the class with the majority agreement was adopted directly. In Zone 1 and Zone 2, on the other hand, it was difficult to decide on a class based on agreement. Therefore, the blank parts were filled with the GLCNMO classes.
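Under the zone definitions above, the zone number of a pixel equals the largest number of datasets that share one class at that pixel. A minimal sketch of the pixel-by-pixel zoning and of the synthesized map follows, assuming the four maps are already resampled and translated to a common legend; the function names and the `fallback_index` parameter are illustrative.

```python
import numpy as np

def agreement_zones(maps):
    """maps: array-like of shape (4, H, W) holding harmonized class codes.

    Returns (zone, majority): zone is 1..4, and majority is the class
    shared by the largest group of datasets at each pixel.
    """
    stack = np.asarray(maps)
    # same[i] = number of datasets (including i) that agree with dataset i.
    same = (stack[:, None, :, :] == stack[None, :, :, :]).sum(axis=1)
    zone = same.max(axis=0)
    winner = same.argmax(axis=0)
    majority = np.take_along_axis(stack, winner[None], axis=0)[0]
    return zone, majority

def synthesize(maps, fallback_index):
    """Adopt the majority class in Zones 3 and 4; elsewhere fill with the
    dataset at fallback_index (GLCNMO in the study)."""
    zone, majority = agreement_zones(maps)
    reliable = zone >= 3
    filled = np.where(reliable, majority, np.asarray(maps)[fallback_index])
    return filled, reliable
```

The `reliable` mask corresponds to the "reliable area", and its complement to the "unreliable area".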
About 1800 validation points were taken randomly to cover all classes except Snow/Ice and Water Bodies. The land cover types of the validation points were identified using the following information:
1) Satellite images in Google Earth.
2) Ground photographs near the locations in Google Earth.
Of these, about 800 validation points were successfully identified, as shown in Figure 3.
The final validation result is shown in Table 2.
The result shows an average accuracy of approximately 76%, which is about the same as the overall accuracy of the existing global land cover datasets. For comparison, similar tests were conducted in which the blank parts were filled with the other global land cover datasets (MODIS LC, GLC2000, GLOBCOVER), and similar results were obtained.
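For reference, an accuracy figure of this kind is typically derived from a confusion matrix of reference classes against mapped classes at the validation points. The sketch below shows the overall accuracy only; it assumes classes are coded 0..n_classes-1, and it is not necessarily the exact averaging used for Table 2.

```python
import numpy as np

def confusion_matrix(reference, mapped, n_classes):
    """Rows: reference class; columns: mapped class."""
    reference = np.asarray(reference)
    mapped = np.asarray(mapped)
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # add.at accumulates correctly when the same cell occurs repeatedly.
    np.add.at(cm, (reference, mapped), 1)
    return cm

def overall_accuracy(cm):
    """Share of validation points whose mapped class matches the reference."""
    return np.trace(cm) / cm.sum()
```

The same two functions can be applied to each existing dataset and to each reclassification result so that all accuracies are computed on identical validation points.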
The validation result shows that simply overlaying the existing global land cover datasets cannot improve the overall accuracy. It also reveals that the "unreliable area" drags down the overall accuracy of these global land cover datasets. Many factors can lead to the appearance of the "unreliable area", for example the complexity of geographic systems, the different resolutions and sources of satellite data, and the different class definitions.
Another critical factor behind the "unreliable area" is the different classification methods adopted for the different land cover datasets. Among all methods, supervised classification is the most commonly adopted. In supervised classification, the quality of the training data plays an essential role. This study therefore assumed that a lack of quality training data caused the "unreliable area" (Zone 1 and Zone 2). To test this assumption, the training data of GLCNMO were double-checked. The result shows that most GLCNMO training data were generated from the "reliable area", which supports the assumption.

Recollection of Training Data and Reclassification
For the reclassification, this paper used MODIS 2008 16-day composite imagery (http://glcf.umd.edu/data/modis/). Figure 4 shows the case study area. The eleven land cover classes listed in Table 3 were classified by supervised classification. In contrast, the other six land cover classes were difficult to determine by the supervised method according to the GLCNMO experience.
The training data were collected from the "unreliable area" and checked carefully to ensure their quality. First, for the classification program to run properly, each sub-class should contain no fewer than 72 pixels. Where the "unreliable area" did not provide sufficient training data, training data with the same characteristics from the "reliable area" were adopted. All training data (every pixel) in this study were added, deleted or modified according to the seasonal NDVI patterns (23 periods) of the MODIS 2008 data.
As a result, the eleven land cover classes were divided into 81 sub-classes (Table 4).
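The minimum-size rule for sub-classes is simple to check automatically. The sketch below assumes the training pixels are available as a flat list of sub-class labels, one label per pixel; the label names used in the test are hypothetical.

```python
from collections import Counter

MIN_PIXELS = 72  # minimum number of training pixels per sub-class (from the text)

def undersized_subclasses(labels, min_pixels=MIN_PIXELS):
    """Return the sub-classes that have fewer than min_pixels training pixels."""
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n < min_pixels)
```

Sub-classes returned by this check are the ones that would need additional training pixels, for instance from the "reliable area".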

Reclassification Result
The maximum likelihood method (MLC), as implemented in the ENVI software, was adopted, as in the previous GLCNMO project. The decision tree method (DCT), using the See5 and CART software, was applied as well.
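To illustrate the principle behind maximum likelihood classification, the following is a generic Gaussian sketch in NumPy. It is not the ENVI implementation; it assumes equal prior probabilities for all classes and adds a small regularization term (`ridge`, an illustrative parameter) to keep covariance matrices invertible.

```python
import numpy as np

def fit_mlc(X, y, ridge=1e-6):
    """Estimate a mean vector and covariance matrix for each class.

    X: (n_samples, n_features) training features; y: (n_samples,) labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    params = {}
    for label in np.unique(y):
        samples = X[y == label]
        mean = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False) + ridge * np.eye(X.shape[1])
        log_det = np.linalg.slogdet(cov)[1]
        params[label] = (mean, np.linalg.inv(cov), log_det)
    return params

def predict_mlc(params, X):
    """Assign each sample to the class with the highest Gaussian log-likelihood."""
    X = np.asarray(X, dtype=float)
    labels = list(params)
    scores = []
    for label in labels:
        mean, inv_cov, log_det = params[label]
        diff = X - mean
        # Squared Mahalanobis distance of every sample to the class mean.
        maha = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores.append(-0.5 * (log_det + maha))
    return np.array(labels)[np.argmax(scores, axis=0)]
```

In the setting of this study, each row of `X` could hold the multi-temporal features of one training pixel and `y` its sub-class, with predicted sub-classes merged back into the eleven land cover classes afterwards.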
The classification result is shown in Figure 5 and

Accuracy Comparison
In order to validate the results, another 800 validation points were taken randomly for the 11 classes and identified by the same method as mentioned above. They were used to assess the two classification results and the four existing datasets. The average accuracy of the existing global land cover datasets is approximately 56%. By comparison, the average accuracy of result 1 is 70.13% and that of result 2 is 65.93%.
Using existing land cover data products does make the preparation of training data more efficient. However, while such methods tend to extract the training data mostly from the "reliable area", this study shows that training data collected from the "unreliable area" are very important as well. This paper shows that the accuracy of the "unreliable area" can be improved.

Figure 4. Study area (Eurasia).
Table 3. Land cover classes that were classified by supervised classification.

Figure 7. The accuracy comparison.