Production of Multi-Features Driven Nationwide Vegetation Physiognomic Map and Comparison to MODIS Land Cover Type Product

Despite many attempts at land use/cover mapping at local, regional, and global scales, mapping of vegetation physiognomic types remains limited and challenging. The main objective of this research is to produce an accurate nationwide vegetation physiognomic map using an automated machine learning approach supported by reference data. Time-series multi-spectral and multi-index data derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) were exploited along with land-surface slope data. Reliable reference data on the vegetation physiognomic types were prepared by refining the existing vegetation survey data available in the country. The Random Forests based mapping framework adopted in the research showed high performance (Overall accuracy = 0.82, Kappa coefficient = 0.79) using an optimum set of 148 features out of the 231 features prepared. A nationwide vegetation physiognomic map of year 2013 was produced. The resulting map was compared to the existing MODIS Land Cover Type (MCD12Q1) product of year 2013, and a huge difference was found between the two maps. Validation with the reference data showed that the MCD12Q1 product did not perform satisfactorily in Japan. The outcome of the research highlights the possibility of improving the accuracy of the MCD12Q1 product with a special focus on reference data.

How to cite this paper: Sharma, R.C., Hara, K., Hirayama, H., Harada, I., Hasegawa, D., Tomita, M., Park, J.G., Asanuma, I., Short, K.M., Hara, M., Hirabuki, Y., Fujihara, M. and Tateishi, R. (2017) Production of Multi-Features Driven Nationwide Vegetation Physiognomic Map and Comparison to MODIS Land Cover Type Product. Advances in Remote Sensing, 6, 54-65. https://doi.org/10.4236/ars.2017.61004
Received: November 24, 2016. Accepted: February 3, 2017. Published: February 6, 2017.
Copyright © 2017 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/ Open Access


Introduction
Vegetation regulates biogeochemical cycles, energy balance, and climate; ameliorates soils; and provides oxygen, energy, and habitat to animals. Vegetation classification based on physiognomy (structural characteristics such as tree, shrub, or herbaceous; or leaf characteristics such as evergreen or deciduous, needle-leaved or broad-leaved [1]) is relevant to the characterization and monitoring of vegetation dynamics.
Despite numerous land use/cover mapping efforts at local, regional, or global scales, vegetation physiognomic mapping is limited. The Moderate Resolution Imaging Spectroradiometer (MODIS) based Land Cover Type product (MCD12Q1; [2]) is one of the most recently available global land cover products from which vegetation physiognomic information can be obtained. The MCD12Q1 product classifies land use/cover types using an ensemble-based supervised classification algorithm (decision trees) complemented by training data from 1860 sites distributed across the Earth's land areas [2]. However, to the best of our knowledge, the accuracy and applicability of the MCD12Q1 product in terms of physiognomy-based vegetation types have not so far been assessed in Japan.
Studies of the composition of forests and vegetation in Japan were initiated in the early Meiji Era, and the first vegetation map was prepared in 1900 based on a field survey of dominant species [3]. Kira et al. [4] [5] developed the Warmth Index (WI), defined as the annual sum of monthly mean temperatures above 5°C, and described the forest zones of Japan in relation to temperature.
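As a minimal sketch of the WI calculation, the snippet below assumes the usual reading of this definition, in which the 5°C threshold is subtracted from each monthly mean that exceeds it before summing. The function name and the monthly temperatures are hypothetical and serve only to illustrate the arithmetic.

```python
def warmth_index(monthly_mean_temps_c, threshold_c=5.0):
    """Sum the excess over the threshold for every month warmer than it."""
    if len(monthly_mean_temps_c) != 12:
        raise ValueError("expected 12 monthly mean temperatures")
    return sum(t - threshold_c for t in monthly_mean_temps_c if t > threshold_c)


# Hypothetical monthly means in degrees Celsius (January to December),
# invented for illustration; these are not observations from the paper.
example_temps = [-2.0, -1.0, 3.0, 9.0, 14.0, 18.0, 22.0, 24.0, 20.0, 14.0, 8.0, 2.0]
example_wi = warmth_index(example_temps)
print(example_wi)
```

Months at or below the threshold contribute nothing, so colder sites accumulate a smaller index.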
Based on the WI values, Japan is divided into five climatic zones: warm-temperate, semi-temperate, cool-temperate, subalpine, and alpine [6]. The vegetation maps were further elaborated by many researchers, with emphasis on physiognomy, species composition, plant sociology, and succession [7] [8] [9]. Ohsawa [10] [11] [12] analyzed the two-dimensional spatial distribution of Japanese vegetation using latitude and altitude data. A brief description of the natural and semi-natural vegetation of Japan is provided by Numata et al. [3]. Following the deformation or destruction of the original or natural vegetation by human interference during rapid industrialization and urbanization, wise land use and re-vegetation planning began, and national parks and nature reserves were established in recognition of the need to conserve the remaining fragments of climax forests [13] [14]. Himiyama et al. [15] produced the original Land Use Information System (LUIS) maps with 24 land use/cover classes. Himiyama [16] analyzed land use/cover changes in Japan over the last hundred years, and projected that paddy field would be the largest land use type to suffer from urban expansion by the 2020s in central Japan. Hara et al. [17] developed a model, called the Landscape Transformation Sere, for interpreting changes in land use/cover patterns and vegetation caused by increasing levels of human activity through a series of stages. Harada et al. [18] digitized the LUIS maps available for 1900, 1950, and 1985, and analyzed the land use/cover changes between 1900 and 1985. Further land use/cover mapping for year 2001 using MODIS data showed that overall forest cover increased slightly from 72.1% in 1900 to 76.9% in 2001; however, in many areas the climax vegetation was replaced by timber plantations. In their study, the vegetation types were labeled and mapped using clusters obtained from an unsupervised classification method based on the Iterative Self-Organizing Data Analysis Technique. However, labeling the resulting clusters into vegetation physiognomic types through visual interpretation of very-high-resolution images and/or expert knowledge is a difficult and time-consuming task.
The availability of time-series surface reflectance data from MODIS onboard the Terra and Aqua satellites provides a unique opportunity for monitoring the phenology of vegetation, and thereby for mapping vegetation physiognomic types. This study used the MODIS data of year 2013 to produce a vegetation physiognomic map of Japan. The objective of the research was to produce an accurate vegetation physiognomic map using an automated machine learning approach supported by reference data. The resulting vegetation physiognomic map was compared to the MCD12Q1 product over Japan, and the accuracies of the newly produced map and the MCD12Q1 product are discussed.

Study Area
This research covers the entire national land area of Japan. Vegetation is an integral part of the Japanese landscape; more than two-thirds of the national land is covered by forests. Japan has high species diversity; approximately 7000 floral species have been recorded, and around 2900 floral species are endemic to Japan. The flora of Japan is also characterized by a richness of endemic families and genera. The Sino-Japanese region, which covers almost all of the Japanese archipelago, contains 15 endemic families and more than 300 endemic genera, whereas no other floristic region in the world contains more than five endemic families [19] [20].
The climate of Japan is mostly temperate; arctic and subtropical climates are seasonally found in northern and southwestern Japan, respectively. Japan is under the influence of a monsoon climate; the summer monsoon brings a large amount of rain to the southeast side, whereas the winter monsoon brings a large amount of snowfall to the northwest side and Hokkaido. The annual mean temperature ranges from 0°C (central Hokkaido) to 18°C (southern Kyushu), and annual precipitation ranges from 600 mm to 4000 mm. Topographically, 77% of the land lies between 0 and 700 m elevation, whereas 5% is highland above 1300 m [3]. A great variety of vegetation has flourished under these diverse ranges of climate, temperature, and precipitation, and the wide topographic variation.
Vegetation in Japan has been subject to severe disturbance due to rapid industrialization and urbanization over the past couple of centuries. Irrigated rice farming began about 3000 years ago, and since then many old-growth forests, especially in the warm-temperate and mid-temperate zones, have been converted into secondary woodlands and forestry plantations [21] [22]. Japanese vegetation is also prone to damage from earthquakes, tsunamis, and volcanoes. Mapping and long-term monitoring of vegetation are necessary to promote the conservation of biological diversity and ecosystem services.

Preparation of Reference Data
Based on the physiognomic characteristics, vegetation is classified into eight classes in the research: Evergreen Coniferous Forest, Evergreen Broadleaf Forest, Deciduous Coniferous Forest, Deciduous Broadleaf Forest, Shrubland, Arable Land, Herbaceous, and Non-vegetation. The classification scheme used in the research is described in Table 1.
In Japan, terrestrial vegetation has been surveyed continuously since 1973. The vegetation survey procedure involves field inspection of the unique vegetation types and recording of the plant community types along with their geo-location points. The plant communities are classified by experts according to the vegetation association, that is, the occurrence of diagnostic/dominant species in the uppermost (and understory) stratum. A lookup table was prepared for reclassifying the plant community types into physiognomy-based types by studying the physiognomic characteristics of all plant communities. The geo-location points were further verified to represent large homogeneous areas (at least the size of a single MODIS pixel) using Google Earth based time-lapse images. Finally, for each physiognomic class, 300 geo-location points distributed across Japan were confirmed and used as the reference data.
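The lookup-table step can be sketched as follows. This is an illustration under stated assumptions, not the authors' workflow: the community names and their assignments are hypothetical entries (the actual lookup table is not reproduced in the text), the function names are invented, and the random draw merely stands in for the manual homogeneity check that was done with time-lapse imagery.

```python
import random
from collections import defaultdict

# Illustrative lookup entries only; not the authors' actual table.
COMMUNITY_TO_PHYSIOGNOMY = {
    "Cryptomeria japonica plantation": "Evergreen Coniferous Forest",
    "Castanopsis sieboldii community": "Evergreen Broadleaf Forest",
    "Larix kaempferi plantation": "Deciduous Coniferous Forest",
    "Fagus crenata community": "Deciduous Broadleaf Forest",
    "Paddy field": "Arable Land",
}


def reclassify(survey_points):
    """Group (lon, lat) points by physiognomic class; skip unmapped communities."""
    by_class = defaultdict(list)
    for lon, lat, community in survey_points:
        physiognomy = COMMUNITY_TO_PHYSIOGNOMY.get(community)
        if physiognomy is not None:
            by_class[physiognomy].append((lon, lat))
    return dict(by_class)


def sample_per_class(by_class, n_per_class=300, seed=0):
    """Keep at most n_per_class points per class, reproducibly."""
    rng = random.Random(seed)
    return {
        name: rng.sample(points, min(n_per_class, len(points)))
        for name, points in by_class.items()
    }


# Hypothetical survey records: (longitude, latitude, community name).
example_points = [
    (139.0, 35.0, "Paddy field"),
    (140.0, 36.0, "Fagus crenata community"),
    (141.0, 37.0, "Unlisted community"),
]
example_grouped = reclassify(example_points)
example_reference = sample_per_class(example_grouped)
```

With eight classes and 300 points each, a selection of this kind yields the 2400 reference points used in the research.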

Processing of Satellite Data
Terra/Aqua satellite based MODIS Surface Reflectance 8-Day Level 3 Global 500 m data (MOD09A1 and MYD09A1) over Japan for year 2013, available from the United States Geological Survey (USGS), were processed and used in the research. The MOD09A1 and MYD09A1 products provide an estimate of the surface spectral reflectance of bands 1-7 (Red, Near Infrared, Blue, Green, Mid Infrared, Shortwave Infrared 1, and Shortwave Infrared 2) as it would be measured at ground level in the absence of atmospheric scattering or absorption. Only the highest quality surface reflectance data were used: pixels affected by clouds, cloud shadows, cirrus, and large solar zenith angles were masked out using the separate quality band descriptions available in the dataset. Three spectral indices, the Normalized Difference Vegetation Index [23], Superfine Water Index [24], and Urban Built-up Index [25], were also calculated for each scene.

Table 1. Vegetation physiognomic classification scheme used in the research.

Vegetation types | Description
Evergreen Coniferous Forest | Forests dominated by conifer trees that retain leaves throughout the year.
Evergreen Broadleaf Forest | Forests dominated by broadleaf trees that retain leaves throughout the year.
Deciduous Coniferous Forest | Forests dominated by conifer trees that shed leaves seasonally.
Deciduous Broadleaf Forest | Forests dominated by broadleaf trees that shed leaves seasonally.
Shrubland | Woody vegetation, either evergreen or deciduous, less than 3 meters tall and with more than 10% cover.
Arable Land | Land used for cultivated crops and/or grasses for food and/or animal feed.
Herbaceous | Land covered by natural grasses or herbs with cover over 10%.
Non-vegetation | Water bodies: surface water bodies such as rivers, lakes, dams, and ponds. Urban built-up areas: land modified by human activities, including all kinds of impervious surface. Barren lands: lands that never have more than 10% vegetation cover, including permanent snow, sandy fields, and bare rocks.
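As a small illustration of the masking and index steps, the sketch below computes the Normalized Difference Vegetation Index, (NIR - Red) / (NIR + Red), for pixels that pass a quality mask. The function names, the boolean mask representation, and the reflectance values are assumptions made for this example; decoding the MODIS quality bits is not shown, and the Superfine Water Index and Urban Built-up Index are omitted because their formulas are given only in the cited references.

```python
def mask_low_quality(values, is_good):
    """Replace values flagged as low quality with None."""
    return [v if good else None for v, good in zip(values, is_good)]


def ndvi(red, nir):
    """NDVI for one pixel, or None when the pixel is masked or undefined."""
    if red is None or nir is None:
        return None  # rejected by the quality mask
    total = nir + red
    if total == 0:
        return None  # index undefined when both reflectances are zero
    return (nir - red) / total


# Hypothetical reflectances (band 1 = Red, band 2 = Near Infrared) for three
# pixels; the second pixel is flagged as cloudy and is therefore masked.
example_red = mask_low_quality([0.05, 0.10, 0.30], [True, False, True])
example_nir = mask_low_quality([0.45, 0.30, 0.30], [True, False, True])
example_ndvi = [ndvi(r, n) for r, n in zip(example_red, example_nir)]
```

Masked pixels stay as None so that later compositing can ignore them instead of treating them as zero reflectance.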

Mapping and Comparative Analysis
The mapping framework developed in a previous study [26], suitable for large-scale land use/cover mapping, was adopted to produce a 500 m resolution vegetation physiognomic map of Japan for year 2013. This framework automatically selects the optimum features, defined as the smallest set of input features yielding the highest kappa coefficient for the given reference data, and uses a Random Forests [27] based supervised classification model to produce the land use/cover map. Retrieving the optimum features not only selects the best features for discriminating the classes, but also reduces computation time and effort [26]. Because of its computational speed, the Random Forests based supervised classification method was adopted for nationwide mapping, which involves a large volume of data. Research using the Random Forests classifier for remote sensing applications has been growing rapidly in recent years [28].
The MCD12Q1 product is provided in Hierarchical Data Format (HDF) with a sinusoidal projection. For comparison, it was re-projected into the Geographical Coordinate System (GCS) and remapped according to the classification scheme described in Table 1.
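The selection rule for the optimum features (the smallest feature set that reaches the highest kappa coefficient) can be sketched as below. How the framework in [26] ranks features and generates candidate subsets is not described here, so the snippet assumes that a kappa value has already been obtained for each candidate feature count; the function name and the numbers are hypothetical.

```python
def select_optimum(kappa_by_feature_count):
    """Return the smallest feature count that reaches the highest kappa."""
    best_kappa = max(kappa_by_feature_count.values())
    return min(n for n, k in kappa_by_feature_count.items() if k == best_kappa)


# Toy values for illustration only: feature count -> kappa coefficient.
toy_scores = {2: 0.60, 4: 0.75, 6: 0.75, 8: 0.74}
toy_optimum = select_optimum(toy_scores)
print(toy_optimum)
```

Breaking ties in favour of fewer features is what reduces the computation in the later nationwide classification.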

Production of JpVP-500 Map
The 500 m resolution vegetation physiognomic map of year 2013 produced in the research is displayed in Figure 1. The map was produced by training the Random Forests model on 75% of the reference point data prepared in the research, using the 148 optimum features. The map produced by this automated machine learning approach did not require any post-editing.

Comparison to MCD12Q1
The resulting map was compared to the MCD12Q1 product of year 2013. Direct comparison between the resulting map and the MCD12Q1 product is not possible because they use different legends. Therefore, the MCD12Q1 product was remapped according to the definitions used in our map as far as possible. The remapping procedure of the MCD12Q1 product is explained in Table 3.
The remapped vegetation physiognomic information extracted from the MCD12Q1 product is plotted in Figure 2.

Validation Results
The performance of the resulting vegetation physiognomic map was assessed by computing two accuracy metrics: overall accuracy and kappa coefficient. The accuracy of the MCD12Q1 product was also assessed based on the reference point data prepared in the research. For this purpose, all 2400 reference points were used. Our map separates grasslands into herbaceous (natural grasslands) and arable (cultivated pastures), but puts cultivated pastures and croplands into a single arable class. On the other hand, the MCD12Q1 product does not separate permanent wetlands into mangrove trees and herbaceous marshlands, whereas our map separates wetlands into herbaceous (marshlands) and mangroves (Evergreen Broadleaf Forest). Therefore, quantitative validation of the MCD12Q1 product was done by excluding the unmatched classes (grasslands, croplands, and wetlands). The performance of the two maps is summarized in Table 4.
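Both accuracy metrics can be computed from a confusion matrix of reference classes against mapped classes. The sketch below uses the standard definitions: overall accuracy is the fraction of correctly classified points, and Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function names and the two-class matrix are illustrative and are not taken from Table 4.

```python
def overall_accuracy(confusion):
    """Fraction of points on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def kappa_coefficient(confusion):
    """Cohen's kappa: agreement beyond what the class totals give by chance."""
    total = sum(sum(row) for row in confusion)
    observed = overall_accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1 - expected)


# Toy matrix: rows are reference classes, columns are mapped classes.
toy_confusion = [
    [45, 5],
    [10, 40],
]
toy_accuracy = overall_accuracy(toy_confusion)
toy_kappa = kappa_coefficient(toy_confusion)
```

Kappa is lower than overall accuracy because it discounts the agreement that would occur by chance given the class totals.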
The overall accuracy (kappa coefficient) calculated for the MCD12Q1 product and for our map is 0.32 (0.24) and 0.82 (0.79), respectively. The validation results showed that our map performed far better than the MCD12Q1 product.

Conclusion
The 8-day data containing surface reflectance (7 bands) and three spectral indices were composited using monthly and percentile based techniques. Multiple percentiles (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) and monthly median composites (January to December) were calculated pixel by pixel for each dataset. In addition, land surface slope data were prepared from the 30 m resolution Shuttle Radar Topography Mission based Digital Terrain Elevation Data available from the USGS, and resampled to the MODIS pixel size (500 m). Altogether, 231 features (input layers) were prepared and used for machine learning and mapping of the vegetation physiognomic types. The input features are described in Table 2.
The MCD12Q1 product provides land cover classification schemes annually from 2001 to 2013. In this research, the International Geosphere-Biosphere Programme (IGBP) layer of the MCD12Q1 Version 5.1 product was used.
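The percentile and monthly median compositing can be sketched for a single pixel and a single variable as follows. Several details are assumptions made for illustration: the percentile interpolation rule (linear here; the text does not state which rule was used), the function names, and the sample series. The closing arithmetic shows how the stated total of 231 features follows from the counts given in the text.

```python
import statistics
from collections import defaultdict
from datetime import date, timedelta

PERCENTILES = list(range(0, 101, 10))  # 0, 10, ..., 100 (11 composites)


def percentile(sorted_values, p):
    """Linearly interpolated percentile of a sorted, non-empty list."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def composite_pixel(series, year=2013):
    """series: (day_of_year, value or None) pairs for one pixel and variable.

    Returns 11 percentile composites followed by 12 monthly medians
    (None for months without valid data), or None if nothing is valid.
    """
    valid = [(doy, v) for doy, v in series if v is not None]
    if not valid:
        return None
    values = sorted(v for _, v in valid)
    percentile_part = [percentile(values, p) for p in PERCENTILES]
    by_month = defaultdict(list)
    for doy, v in valid:
        month = (date(year, 1, 1) + timedelta(days=doy - 1)).month
        by_month[month].append(v)
    monthly_part = [
        statistics.median(by_month[m]) if by_month[m] else None
        for m in range(1, 13)
    ]
    return percentile_part + monthly_part


# Hypothetical 8-day values; the second observation is masked.
example_series = [(1, 0.2), (9, None), (17, 0.4), (33, 0.6)]
example_composites = composite_pixel(example_series)

# (7 bands + 3 indices) x (11 percentiles + 12 monthly medians) + 1 slope layer.
N_FEATURES = (7 + 3) * (len(PERCENTILES) + 12) + 1
```

Each variable therefore contributes 23 layers, and the slope layer brings the total to 231.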

Figure 1. Nationwide vegetation physiognomic map of year 2013 produced through the research: (a) Display over the national territory, (b) Zoomed in over the black polygon region in (a). The national boundary is based on the Global Administrative Areas database (GADM) version 2.8, Nov. 2015.

A comparison of Figure 1 with Figure 2 clearly shows a huge difference. Almost all vegetation types in Japan are simply classified as mixed forests by the MCD12Q1 product. Not only forests but also large areas of shrublands are misclassified as mixed forests by the MCD12Q1 product.

Figure 2. Vegetation physiognomic map extracted from the MCD12Q1 product of year 2013: (a) Display over the national territory, (b) Zoomed in over the black polygon region in (a).
The MCD12Q1 product is basically a land use/cover map that was not designed solely for vegetation physiognomic mapping. The reference point data, prepared through a careful study of the physiognomic characteristics of the plant communities in Japan, resulted in an accurate vegetation physiognomic map in this research. The 17-class MCD12Q1 product is based on 1860 globally distributed training points, whereas the 2400 training points prepared for Japan alone for the 8-class vegetation physiognomic map represent a far denser sample. The physiognomy of vegetation is so diverse across the world that it is hard to classify it using merely 1860 points.
In this research, feature-rich data exploited with the Random Forests based mapping framework provided a reliable classification (Overall accuracy = 0.82, Kappa coefficient = 0.79) of the vegetation physiognomic types in Japan. The comparison of the resulting vegetation physiognomic map with the MODIS Land Cover Type product (MCD12Q1), based on the reference data prepared in the research, showed a huge difference. Most of the vegetation types in Japan are simply classified as mixed forests by the MCD12Q1 product. The validation results call into question the applicability of the MCD12Q1 product in terms of vegetation physiognomy in Japan, and highlight the possibility of improving its accuracy with a special focus on reference data. Although this research provided a far better classification of the nationwide vegetation physiognomic types than the MCD12Q1 product, the classification accuracy may not be sufficient for tracking vegetation physiognomic changes over the years. Therefore, further research, especially on the inter-class discrimination of the vegetation physiognomic types, is recommended to increase the classification accuracy.

Table 2. Description of the features used in the research.

Table 3. Remapping of the MCD12Q1 product.

Table 4. Performance of different vegetation physiognomic maps.