Grassland Height Assessment by Satellite Images

Images collected by optical and radar satellite sensors represent a most viable solution for the extraction of biophysical parameters of the earth surface. The mid-resolution dataset acquired by Landsat and Sentinel satellites have recently become available free of charge for all users. At the same time, some software for image processing and GIS, like QGIS, R, and ImageJ, have reached a high level of maturity and a large community of users, thanks to their open source license. In this project, free satellite images and open source software have been used for the assessment of the grassland biomass. The overall goal is the enhancement of the statistics of grassland production and dried fodder for the animal breeding. Currently, the National Institute of Statistics collects this kind of dataset at the province level. The project consists in some “in situ” surveys in a specific site in central Italy and in the building of a regression model between the grassland heights and the corresponding radiometric values of the most relevant image bands.


Introduction
The project has been financed in 2014 by Eurostat under the theme 8.4-"Provide quality agriculture, fisheries and forestry statistics" with the title "Pilot studies to develop methodological improvements to agri-environmental statistics and statistics on grassland production".The main goal has been the assessment of grassland biomass by means of free satellite images.The proposed method has built upon the correlation existing between the image values, in different spectral bands, and the grassland height, strictly depending on the biomass of the relative vegetation classes [1]- [6].
The regression model needs to be calibrated by ground truth samples that should be collected almost simultaneously with the satellite acquisition date.So, the need of fast and accurate measurements campaigns is crucial for getting a high correlation between real grassland height and image values.

Study Area
During the planning phase, three areas have been identified for the collection of aly.It is a quite suitable site for the grassland production because of the lack of trees and shrubs over the flat plan, located at 1300 meters above sea level.The plain is the bottom of an ancient mountain lake, now dried up, and has a rectangular shape with an area of around 15 Km 2 .
Regarding vegetation variations, it is possible to consider the existence of 4 types of homogeneous areas (Figure 2 The pasture reaches its highest growth at the end of June, depending on the temperatures, after a suggestive blossom, and produces high quality bales of hay. The best period for the field surveys is between the end of June and the beginning of July, just before the cut of the grassland.Two missions have been carried out in the past two years (2015-16) during the same period of the year.

Sampling Methodology
The need to collect many samples for the grassland height in a limited time and with the highest possible accuracy, taking also into consideration the pixel dimension of the satellite images used for the regression model (see paragraph 5), has suggested to the use of the camera and to postprocess the images in order to derive the height with some image processing tools.
The assessment of the height has been executed with a white cardboard of 300 × 70 cm placed on the grass and used as a background for the photograph in order to derive just a linear section of the grass height (Figure 3).Every picture has been linked to the GPS position of the same sample point.
The high contrast of the green vegetation over the white background has helped in the post-processing of the photographs that has consisted in the extraction, rescaling to a standard width and measure of just the white area (c section in the Figure 4) in terms of number of pixels.The division of that area by the fixed width produces its mean height and, by difference with the total height of the cardboard, the mean height of the grass.[8].The software is written in Java and is available for Windows, Linux and Mac OS and OS X (https://imagej.nih.gov/ij/index.html).Some of the software tools used in the software for the process are the Colour Threshold, for the selection of the white pixels, and the Colour Pixel Counter, for the determination of their number.
In the 2015 survey a simpler brown cardboard has been used, but sometimes the colour separation and the extraction of grass has been difficult, above all in case of dry hay.Another lesson learnt during the first year of the project is to avoid measures in the second half of July, when the grass is usually cut and compacted in bales.
This method has allowed a very fast survey and a high number of samples collected during each day of measurements.88 samples have been acquired in a two-day campaign on June 2016 (Figure 5), while just 20 were the points surveyed in 2015 because of the partly mown pasture.

Satellite Images
After Currently two mid-resolution satellites acquire images from the earth surface with optical sensors and deliver them free of charge to all users: the Landsat 8 (Nasa) and the Sentinel-2 (Esa).The OLI sensor of Landsat 8 has replaced the ETM+ one mounted on Landsat 7 but maintains the same bands and the same ground resolution (30 meters) of the previous satellites, allowing in this way the time series analysis [9].
Sentinel-2A is one of the satellites of the Copernicus Esa observation programme and delivers images at a spatial resolution of 10 meters in visible-near infrared spectra.By March 2017 the new Sentinel-2B will join the Esa constellation, increasing in this way the revisiting frequency on the same area [10].
Radar images have also been considered for the correlation model: the Sentinel-1 satellites acquire polarized images in C-band with a spatial resolution of 20 meters (Interferometric Wide mode) [11].From the first results of the model it seems S-1 images are not so useful for the determination of grass height (see S1_VH and S1_VV in Table 1).Nevertheless, it has been planned to test the model with the high-resolution radar images of the Cosmo-Skymed Italian mission.
The values for every image and band have been extracted with the QGIS Zonal Stats tool (http://www.qgis.org/it/site)with a 20-meters buffer centered on each samples and by extracting basic statistics, like mean and standard deviation, from the overlapping pixel values.

Statistical Analysis
Descriptive statistics are used to describe the basic features of the data by means of some numeric indexes: mean, standard deviation (SD), standard error of the mean (SEM), coefficient of variation (CV), skewness and kurtosis.The univariate correlation has been considered too, with the evaluation of the Pearson correlation coefficients and the relative correlation matrix.Furthermore, a multiple linear regression model has been implemented, based on ordinary least squares method, that is one of the most frequently used statistical approaches to model the correspondence between spectral and field data [12].
The regression model consists of a dependent variable, the vegetation height (Hv) measured in the in-situ survey, and the independent variables, represented by the spectral radiance levels of the bands from Landsat 8 and Sentinel 1-2 satellites.
Twenty of the acquired 88 samples have not been considered in the model because of the non-homogeneity of the pixels values in the 20-meters buffer area around each sample.A resulting number of 68 points has been used to train the model.
The adjusted coefficient of determination (Adjusted R 2 ) and root mean square error (RMSE) were used to quantify the amount of variation explained by the developed relationships and the accuracy of the relationships [13].Higher Adjusted R-squared values correspond to lower RMSE values and a higher precision and accuracy of a model for predicting the response variable.
A backward elimination approaches based on Akaike's Information Criteria (AIC) was used to identify the model that provides the best description of the data using the smallest number of parameters [14].

Sentinel
Descriptive statistics and correlation for Sentinel image are shown in Table 1, Table 2 and Figure 6.A positive correlation exists between the S2_NIR and S1_VV while remains negative for all the others bands.
Table 3 shows the T-test between the most significant predictors for the model (Red, Green, NIR) and how they differ from each other.
The regression model with the least AIC is: 64.416 0.168 S2_ G 0.084 S2_R 0.010 S2_NIR The overall significance of the regression model has been estimated with the F-test (F-statistic: 39.9 on 3 and 64 degree of freedom, p-value: 1.165e−14) and the model shows an Adjusted R-square of 0.63.
The linearity of the function between the dependent and the independent variables has been verified with the graphic of residuals (y-axis) towards the foreseen values (x-axis), shown in Figure 7.The hypothesis of linearity seems reasonable because of the symmetric distribution of the points around the horizon-      The normality distribution of residuals has been verified with the QQ-plot of standardized residuals, where it's possible to see the points laying near the QQ line.
The multicollinearity between regressors has also been assessed by means of the Variance Inflation Factor (VIF), visible in Table 3.Its value seems exclude the redundancy between the explanatory variables.

Landsat 8
Descriptive statistics and the correlation matrix for Landsat 8 image have been evaluated as well (see Table 4, Table 5 and Figure 8).The correlation is similar to Sentinel: positive between the grass height and the NIR band while remains negative for all the other bands.Table 6 shows the coefficients obtained with the T-test.Figure 9 shows the residuals against fitted plot and the normal Q-Q plot of the standardized residuals for verifying the assumptions of a linear model in the same way as done in the previous paragraph.

Validation
The performances of the models are assessed using repeated k-fold cross-validation in order to obtain results that are independent of a particular partitioning.
In this approach, k randomly subsamples (k = 10), or folds, are derived.The model was fitted k times on the combined data of k-1 folds and tested on the data of the remaining fold by applying the fitted model to the test fold and calculating the RMSE [15].
To perform the validation the cv.lm function in the DAAG package was used (https://cran.r-project.org/web/packages/DAAG/index.html).

Comparison between Models
The adjusted R 2 coefficient obtained in the models is quite similar: 0.63 for the Sentinel-2 and 0.56 for the Landsat 8 (Table 7).Their difference seems depending above all on their spatial resolution and on the wider area covered by the

Result
The project results show a poor correlation (Adjusted R 2 ~ 0.6) between the VNIR bands and the grass height.Radar Sentinel-1 images, even polarized, seems not influenced by different vegetation biomasses of upland pastures, perhaps due to the limited grassland height.From the statistical results seems the higher grass has a bigger component of red while green and NIR decrease.
However, the sampling method permitted fast operational activities and has given good results.The model seems to work better for mid-height grassland (10 -20 cm), perhaps because in case of very low grass the images receive signals from other visible elements, like small stones, dry grass, and the terrain itself, while the higher grass doesn't change the same radiometric values acquired for mid-height samples.
The relation between the grassland height and the relative biomass depends on the vegetation type.In the study area, excluding the wetland present in the southern part of the plain, the vegetation is mainly composed by Caricetum and Festucetum.
The grass development in height and width is quite related with the biomass increase [16] [17] [18].In particular there is a linear relationship between biomass and grass layer height [19].Specific analysis have assessed a relation between biomass of vegetation and the height of pasture: 10 -12 cm of grass height correspond to 1200 to 1500 kg•ha −1 , considering dry grass.

Conclusions
The adjusted R 2 derived for Sentinel-2 and Landsat images is not so good as expected, but is quite near to the corresponding values obtained in similar studies.
The Sentinel-2 image has also been corrected for atmospheric effects by means of the sen2cor tool implemented in the ESA SNAP software, but returned similar results.
Possible improvements could be seen in the use of higher resolution optical Even if the first attempts to use the radar images have been unfruitful, the plan is the future use of the complex data of the radar hi-res images acquired by the Italian satellite mission Cosmo-Skymed.Hopefully, in this way it will be possible to measure the soil roughness caused by the grassland (height) and the number of bales of hay obtained after the mown (biomass).
): • areas with agricultural crops (grass meadows, barley and lentils) located in the nearby of Castelluccio village and below the slopes of Vettore Mt., corresponding to the class 2.1.1 (Non-irrigated arable land) of CORINE Land Cover 2012 map (http://www.eea.europa.eu/publications/COR0-landcover);• areas with natural grassland, equivalent to the class 3.2.1;• areas with pastures, that occupies the central portion of the plain, corresponding to the class 2.3.1 (Pastures); • bare rocks (3.3.2) or sparsely vegetated areas (3.3.3),located in the nearby of Vettore Mt.

Figure 3 .
Figure 3. Survey of a height of a section of grass.

Figure 4 .
Figure 4. Postprocessing of the pictures.

Figure 5 .
Figure 5. Location of the samples acquired on June 2016 (basemap: false color image from Sentinel-2).

Figure 6 .
Figure 6.Scatter plot matrix of the variables considered in the model (Sentinel).
The F-test reports similar results of the previous case (F-statistic: 29.63 on 3 and 64 degree of freedom, p-value: 3.94e−12) and the model shows an Adjusted R-square of 0.56.

Figure 8 .
Figure 8. Scatter plot matrix of the variables considered in the model (Landsat 8).
and radar images, for the check of the homogeneity of vegetation cover and the extraction of texture indexes.The analysis of the polarization of the Sentinel-1 image has not given the fruitful results, perhaps due to the wavelength of the C-SAR instrument (~5.5 cm), quite close to the height of grass.

Table 1 .
Descriptive statistics of Sentinel bands and vegetation height variable.

Table 2 .
Correlation matrix of Sentinel bands and vegetation height variable.

Table 3 .
Correlation matrix of Sentinel bands and vegetation height variable.

Table 4 .
Descriptive statistics of Landsat 8 bands and vegetation height variable.

Table 5 .
Correlation matrix of Landsat 8 bands and vegetation height variable.

Table 6 .
Correlation matrix of Landsat 8 bands and vegetation height variable.

Table 7 .
Comparison between models.
Landsat pixel.The sat acquisition dates (S2: 15/06/2016, L8: 26/06/2016) are anyway close to the 2016 survey and the grass height is almost the same.Estimated height depends in any case on three image bands: green, red and near infrared.The proportionality is inverse with the green band while remains direct with the red and NIR.