Uncertainty Analysis of Spatial Autocorrelation of Land-Use and LandCover Data within Pipestem Creek in North Dakota

A major threat to biodiversity in North Dakota is the conversion of forested land to cultivable land, especially those that act as riparian buffers. To reverse this trend of transformation, a validation and prediction model is necessary to assess the change. Spatial prediction within a Geographic Information System (GIS) using Kriging is a popular stochastic method. The objective of this study was to predict spatial and temporal transformation of a small agricultural watershed—Pipestem Creek in North Dakota; USA using satellite imagery from 1976 to 2015. To enhance the difference between forested land and non-forested land, a spectral transformation method—Tasseled-Cap’s Greenness Index (TCGI) was used. To study the spatial structure present in the imagery within the study period, semivariograms were generated. The Kriging prediction maps were post-classified using Remote Sensing techniques of change detection to obtain the direction and intensity of forest to non-forest change. TCGI generated higher values from 1976 to 2000 and it gradually reduced from 2000 to 2011 indicating loss of forested land.


Introduction
Sustainable use of riverine systems and riparian habitats are directly affected by changing land use patterns [1].Modeling land use patterns is an important technique for the projection of alternative pathways into the future [2] [3] [4].Geo-graphic Information Systems (GIS) combined with satellite remote sensing has varied application and has been recognized as a powerful tool in detecting land use and land cover change (LULC) [5] [6] [7].Satellite data is cost effective and the information obtained from them can be used as inputs to build land use and land cover datasets [8].Spatial representation of the LULC change is essential for regional planners and management [9].To elucidate the optimal use of land and to provide input data for watershed models, it is necessary to have information on existing LULC change patterns [10].
Geostatistics deals with problems pertaining to spatial serial data, mapping and interpolation of the data on a statistical platform that are related to a time analysis [11].It has an ability of distinguishing the continuous nature of LULC and is able to detect random variations during modeling, dependent on the spatial correlation within the ecosystem [12].Prediction using sample points is carried out by the spatial behavior and spatial distribution of parameters to minimize the error while doing any type of image classification [13].Inverse Distance Weighting and Splines are deterministic interpolation methods to analyze change in land use patterns but these methods tend to oversimplify the results, as the spatial autocorrelation of the data is not considered [14].A geostatistical method is usually preferred where sample data points can be transformed into continuous surfaces to understand the spatial autocorrelation within the data [15].The parameters used for any analysis can be aggregated from pixels to object class representation using image segmentation [16].
Kriging Interpolation is a very popular geostatistical method [17] [18].Ordinary Kriging estimates the mean as a constant in the searching neighborhood [19].The Kriging technique has recently become very common for analyzing spaceborne data [20].The values of unsampled locations are estimated by Kriging models using weighted averaging of the known sampled locations, which provide a correlation among the neighboring values that can be modelled as a function of the geographical distances between each location across the study area within the variogram [21].Global and local information in predictions can be obtained from Kriging, but the ability of the variogram in describing spatial dependence is a function of the quality and quantity of the data samples [21].According to a study by [22], exponential models are often best-fitted semivariogram models as they use the weighted least-squares method.[23] [24] and [25] introduced the semivariogram to remote sensing and discovered that the parameters of the variogram can be directly related to a feature in an image.The primary assumption of a geostatistical analysis when assuming spatial continuity is that samples that are located close to each other are similar than samples that are far apart [25].This variation in geographic data or the spatial relation can be analyzed from a semivariogram model.An ideal semivariogram has associated features such as the lag, nugget, range and sill.The direction and distance are commonly referred to as the lag, the nugget is variability at zero distance and represents sampling and analytical errors, the range of influence in a semivariogram designates the extent beyond which autocorrelation between sampling sites Journal of Geoscience and Environment Protection is very less or zero and the sill represents the variability of spatially independent samples [26].
An effective way of mapping vegetation and analyzing LULC change is using the Tasseled-Cap transformation [27]- [36].The Tasseled-Cap transformation is a conversion of the original bands of an image into a new set of bands with defined interpretations that are useful for vegetation mapping [37].The term Tasseled-Cap comes from the shape of the plot of the data that resembles a cap.A Tasseled-Cap transform is performed by taking "linear combinations" of the original image bands-similar in concept to principal components analysis [36] [37].Tasseled-Cap reduces the volume of the data without any loss of information and its spectral features are directly related to land features [38] [39] [40].This transformation in remote sensing is the conversion of the readings in a set of data into composite values that is the weighted sums of separate data readings [41].One of these weighted sums measures roughly the brightness or greenness of each pixel in the scene [41].[42] reported that the composite values are linear combinations of the values of the separate data readings, but some of the weights are negative and others are positive.The composite values represent the degree of greenness of the pixels or the degree of yellowness of vegetation or perhaps the wetness of the soil [42] [43].Usually there are just three composite variables listed within a remote sensing interface.The Tasseled-Cap transformation of Landsat thematic mapper (TM) consists of six multispectral features, all of which could be potentially differentiated in terms of stability and change in a multitemporal dataset [41] [42] [43].The first three features, which are brightness index, greenness index, and wetness index, respectively usually account for the most variation in a single-date image [44].[45] analyzed Landsat data for environmental studies and found the Tasseled Cap transformation to be a consistent indicator of assessing forest change as it captures Shortwave Infrared (SWIR).Tasseled-Cap's Greenness Index (TCGI) would ideally identify forest cover but it is less sensitive to any topographic effect [45].[46] led a study to distinguish old growth and mature forests in the Pacific Northwest using Landsat datasets.In their study, the Tasseled-Cap brightness index did not separate old growth and mature forests due to their sensitivity to topography but the greenness and the wetness index were able to identify forest disturbances.
The primary objectives of this study were: 1) to apply Ordinary Kriging Interpolation technique to smooth TCGI values, extracted from 30 m to 60 m spatial resolution Landsat images in order to analyze spatio-temporal transformations; 2) to apply change detection techniques to the interpolated prediction maps to yield the intensity of the LULC change.

Site Description
The Pipestem Creek watershed, 8-Digit Hydrologic Unit Code (HUC) (10160002) sub-basin is approximately 257,178 hectares covering parts of 4 counties (Foster, Kidder, Stutsman, and Wells) in the Missouri Region-James Sub-Region of North Dakota.Of the 257,178 hectares, Stutsman County contains 65%, Wells 22%, Foster 8%, and Kidder 5% [47] [48].North Dakota's land base is mostly a prairie land i.e. it mostly consists of grasslands [49].Most of the forests in this region are found scattered along the river bank or in urban patches [48] [49].
Pipestem Creek starts from the Pipestem Dam downstream to its confluence with the James River, which is about 9.01 kilometers [50].The mean annual precipitation is from 330 to 559 millimeters and mean elevation ranging from 390 to 780 meters.The type of soil found at this location is Williams-Bowbells loams which is a well-drained, non-saline clay loam with calcium carbonate of about 20% [51].Figure 1 shows the study area map and its location in North Dakota.

Image Processing
Image classification and processing was done on a remote sensing platform-ENVI®4.5.Six Landsat images (Table 1) covering the study site were downloaded from the Global Land Cover Facility [52].The images were acquired by

Geostatistical Analysis
To better understand the spatial structure of the imagery on a given date and location, an empirical semivariogram model such as Ordinary Kriging was applied to the sampled data.The six transformed datasets corresponding to the six years were imported to ArcMap® 10.4.1 where they were clipped to the watershed boundary shapefile.Geostatistical Analyst was used to perform Ordinary Krigingon each dataset for each model type: Gaussian, Exponential, J-Bessel, K-Bessel, Circular, and Spherical.To consider the model that best fitted the study, certain parameters were considered-1) Cross-validation scatter plot where the measured and predicted values were compared by using the difference between them, 2) Mean estimation error where the difference between the estimated and the known point values were considered, and 3) Mean standardized squared estimation error.These parameters were generated in ArcMap® 10.4.1 using Geostatistical Analysis.
Based on these parameters, among other Kriging models, the Exponential model was found to be the best fit for this study.A smoothing factor of 0.2 was applied to the search neighborhood type for all the datasets.The scatterplots derived from the Exponential model for each datasets is shown in Figure 2. The

Semivariogram Analysis
All six TCGI images were used for the geostatistical analysis.First, empirical semivariograms for the six periods were established.The rationale for using a semivariogram model was the similarity in spatial structure of most of its variables, gradually increasing or decreasing as a function of the increasing distance from the river until the boundary of the watershed, and the typical shape of the variogram.In the current case, the presence or absence of sill may be an indicator of presence or absence of forested areas.The level of sill may be related to the level of spatial correlation within the watershed.Therefore, if semi-variance reaches its maximum point (sill), beyond that, the data may not be correlated.The range may be the defining boundary of the watershed since it incorporates all the pixel values within the image that are correlated.
Results of the cross-validation analyses of the semivariogram models are presented in Tables 2-6.For an ideal model, the Mean prediction error should be near 0 (this investigates bias), Root Mean Square (RMS) prediction error should be small, average standard error should be close to RMS error, Mean-standardized prediction error should be near 0 and RMS standardized prediction error should be near 1, indicating that the estimated prediction uncertainty is consistent [59].In Table 2, the RMS ranged from 0.043 to 0.088, average standard error ranged from 0.036 to 0.084, which is not very close to the RMS values.
Mean-standardized prediction error values ranged from 0.041 to 0.057.RMS standardized prediction error values were greater than 1 for years 1976 to 1991, ranging from 1.690 to 1.721.This model did not prove the ability to reproduce the observed values accurately.In Table 3, the RMS ranged from 0.008 to 0.031,   3.
All semivariograms were processed with 20 lags of 1000 m each.Because of the irregular distribution of forests and non-forests, data values exactly separated by 1000 m could not be expected, thus the range of 1000 m to 20,000 m was selected.Lag values were determined by trial and error process to optimize the abovementioned criteria.Ordinary Kriging interpolation maps were produced based on the exponential models with the parameters presented in Table 7. Figure 2 shows the scatterplots generated for cross validation of the Exponential model.
These graphs show the spatial correlation of the data.Most of the data values lie along the line in the scatterplots indicating how closely related the data is.During ground truthing and field observations, no evidence was found to support an anisotropic pattern, as in [62] study, that may explain the direction of forest reduction or the direction in which the agricultural lands are increasing.So, isotropic distribution was assumed in all cases.

Conclusion
TCGI was selected to describe the spatial surface patterns since it produced the best contrast in terms of separability among the spectral indices.It produced the best contrast in terms of separability among all examined spectral indices.TCGI was able to compare between the different sensors with different spectral bands, as it reduced their different spectral bands to one normalized layer of TCGI values.The semivariance analysis was found to be a suitable method for gaining insight to the spatial structure present in the imagery for a given date and location.
The similarity between the shape of the semivariogram and the directional change of the environmental variables is a logical reason for using this method.
The Kriging interpolation technique using the Exponential Geostatistical Simulation model was used as a smoothing filter in which each pixel was being replaced with the solution for the semivariogram equation (exponential model in the current case) calculated from all other pixels in the image.As a result, it reduced spatial errors and fine scale variability and helped to better identify the transition from forest to non-forested areas.

Figure 1 .
Figure 1.Map of study area-Pipestem Creek watershed in North Dakota.
late several vegetation indices for each image subset.These include the Normalized Difference Vegetation Index (NDVI) and Tasseled-Cap indices.NDVI was calculated in ENVI®4.5 using the equation NDVI = (NIR − Red)/(NIR + Red) [54].Band 3 was used as red and band 4 was used as near IR to generated NDVI and the output datasets were saved as floating point data type.Tasseled-Cap was calculated in ENVI®4.5 using the Transform tool where the reflectance images of years 1976 to 2015 were used as inputs.The output image generated four bands-Brightness index, Greenness index (TCGI), Wetness index and a null or Nonindex.These individual bands were displayed and linked to acquire regions of interest (ROI) representing forested areas.30 training sites were acquired.Spectral separability analysis was performed using ROI Separability tool on NDVI and TCGI, incorporating mean and standard deviation values of extreme classesin each scene, to analyze the most suitable index for differentiating between forested areas and non-forested areas using methodologies of[55] and[56].The ROI statistical results displayed univariate statistics such as minimum value, maximum value, mean, standard deviation among other values.Since the resolution of MSS and TM/ETM+ are different, for the MSS image, a 3 × 3 pixels window was selected and for the TM/ETM+ images, 6 × 6 pixels window was selected.The window values were averaged to generate new pixel values using raster calculator in ArcMap® 10.4.1.Thus, the resolution of the images was reduced by factors of 3 and 6, to specifically fit the MSS and TM/ETM+ images respectively.

Figure 3
depicts the final results for the distribution of the TCGI values for the six periods.The dark-red areas in the images are related to forested areas.The surrounding light red and yellow belts represent a mixed zone where forested areas and non-forested areas overlay each other or create a stable spectral balance.The zone colored by blue tones is considered to be non-forested areas that include mostly agricultural land.Forested areas are more concentrated in the 1976 image and is gradually seen to reduce for the rest of the images through 2015.

Table 1 .
Landsat time series scenes used in the study.Path/row of the MSS image is listed according to Worldwide Reference System-1 (WRS-1) while those of TM and ETM+ are according to WRS-2.different sensors (MSS, TM, and ETM+) and were from six different time periods, as listed in Table1.The Landsat images were processed by applying a [53]k object subtraction and then converting the image digital numbers to reflectance values.Dark object subtraction was applied to remove shadows, scattering and electrical gains within the datasets, e.g.[53].This was done to obtain a sound quantitative analysis of the images.Reflectance values were used to calcu-

Table 4 .
Semivariogram parameters for Circular Geostatistical Simulation models fitting the Tasseled Cap Greenness Index (TCGI) products for the pipestem creek watershed (Nugget = 0; Lag = 1000 m).In Table5, the RMS ranged from 0.001 to 0.013, av-erage standard error ranged from 0.012 to 0.073, which is not very close to the RMS values.Mean-standardized prediction error values ranged from 0.001 to 0.013.RMS standardized prediction error values were greater than 1 for some datasets, ranging from 0.772 to 1.613.This model did not prove the ability to re-
651 Journal of Geoscience and Environment Protection