A Large Scale GIS Geodatabase of Soil Parameters Supporting the Modeling of Conservation Practice Alternatives in the United States

Water quality modeling requires support for combined digital soil elements and simulation parameters across scales. This paper presents the unprecedented development of a large spatial scale (1:250,000) ArcGIS geodatabase coverage designed as a functional repository of soil parameters for modeling and comparing water quality outcomes in the United States. The target models are SWAT (Soil and Water Assessment Tool), APEX (Agricultural Policy Environmental Extender), and ALMANAC (Agricultural Land Management Alternatives with Numerical Assessment Criteria). This development relies on the Digital General Soil Map (DGSM) as the source of soil information. It leverages the architectural design and associated tools created for a higher resolution companion product, from which a procedure for filling a large number of missing derived parameters was also extended. Organized by regional watershed layouts and supported by GIS land use layers, the core product is built on the File Geodatabase (FGDB) data structure, which, through customized Python-based tools, brings the data directly into geoprocessing workflows. The FGDB implementation efficiently stores spatial soil features, tabular model elements, and their linked relationships, while seamlessly providing the environment for extracting, analyzing, and mapping the models' parameters. As an alternative, the constituent spatial elements (polygons and multi-resolution rasters) and the models' elements are offered as a file-folder system of data in entirely open-source formats. Finally, this geographic database coverage supports the traditional large-scale, harmonized application of the models and offers an alternative to the higher resolution companion for areas where that information is still under development.

How to cite this paper: Di Luzio, M., White, M.J., Arnold, J.G., Williams, J.R. and Kiniry, J.R. (2017) A Large Scale GIS Geodatabase of Soil Parameters Supporting the Modeling of Conservation Practice Alternatives in the United States. Journal of Geographic Information System, 9, 267-278. https://doi.org/10.4236/jgis.2017.93016
Received: April 17, 2017 Accepted: June 3, 2017 Published: June 6, 2017
Copyright © 2017 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/


Introduction
Modern hydrology-based simulation models require representative key landscape parameters stored in adequate Geographic Information System (GIS) databases. Soil-related model parameters are traditionally derived from digital records of field surveys.
In the United States, the most detailed source of such information over extensive areas of the country is the Soil Survey Geographic Database (SSURGO) [1]. SSURGO is a Taxonomy-based, nationwide digital spatial database developed by the United States Department of Agriculture-Natural Resources Conservation Service (USDA-NRCS) at scales ranging between 1:12,000 and 1:24,000. Parameters derived from it have been used extensively as inputs to various hydrologic models [2] [3], including agricultural hydrology simulation models [4]. The spatially seamless application of SSURGO-based data is currently hindered by gaps in its coverage. The collection and seamless completion of soil survey data is intrinsically lengthy and complex, and the process may have been further delayed because USDA-NRCS collects, stores, maintains, and distributes soil survey information primarily for privately owned lands. Nevertheless, SSURGO coverage continues to grow, and its publication status is updated and shared online [5]. A basic remedy for the lack of information in incomplete areas is to use large-scale sources of information. When this approach is applied to agricultural hydrology models over watersheds and large geographic domains, its simulation results compare inconsistently with those obtained from higher resolution information [4] [6] [7] [8] [9] [10]. Large-scale soil attributes, however, have been applied successfully in a large number of hydrology studies, and large scale soil data are still considered valuable and relevant [11]. It is important to note that most of these applications relied on dated data sources, such as the State Soil Geographic (STATSGO) database [12], and on dated methods to derive soil parameters for hydrology applications. More generally, there is a lack of up-to-date, documented, and functional GIS-based repositories of large scale modeling parameters for agricultural hydrology models.
In this paper we introduce the development and maintenance of a geodatabase coverage built to address this need by providing a repository of large scale spatial features and soil parameters for a set of agricultural hydrology models (SWAT, APEX, and ALMANAC). The core geodatabase is named US-ModSoilParms-TEMPLE250000.
The approach applies a GIS-based data processing workflow to a selected collection of source spatial information. The overall procedure resembles and extends the development accomplished at higher resolution [13]. Fundamental differences from that development include the source input data (Section 2.1.1) and the adapted methodology for filling gaps in the source data (Section 2.3.2). The overall framework is outlined in Figure 1 and in the following sections.
In the first section we present the characteristics of the source data, models, GIS features, and code used. In the following section we present the results, and in the final section we discuss the highlights.

Digital General Soil Map
The USDA-NRCS National Cooperative Soil Survey (NCSS) developed the Digital General Soil Map (DGSM), or STATSGO2 [14], as a Soil Taxonomy-indexed representation of soil patterns in the landscape. DGSM is mapped at 1:250,000 scale in the continental U.S. (CONUS), Hawaii, Puerto Rico, and the Virgin Islands, and at 1:1,000,000 in Alaska. DGSM supersedes the State Soil Geographic (STATSGO) dataset, which included a limited number of soil attributes and outdated spatial features. DGSM provides a broad-based inventory of soils and non-soil areas designed for general planning and management uses covering state, regional, and multi-state areas. Data are distributed with the same packaging format and attributes as the current SSURGO data, which include both spatial and tabular data. Spatial data are delivered in ESRI shapefile format in the World Geodetic System 1984 (WGS84) geographic coordinate system. Tabular data are delivered as ASCII text files with pipe-delimited fields. Spatial features outline general soil association units, or Map Units (MUs), each composed of non-georeferenced sub-units (soil components, COMPs) whose extent is expressed as a percentage of the area of the respective MU. Tabular data are logically linked to the spatial features and report physical and chemical soil properties as ranges and representative values. Information from seven attribute tables was used in this work, namely: 1) Legend; 2) Mapunit; 3) Component; 4) Chorizon; 5) Chfrags; 6) Chtexturegrp; and 7) Muaggatt. DGSM was downloaded as a single seamless national spatial and tabular dataset from http://websoilsurvey.sc.egov.usda.gov.
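Because each map unit is a mixture of components whose shares are given as percentages of the MU area, component properties are often summarized per MU by weighting them with those percentages. The Python sketch below only illustrates that idea and is not the procedure used in this work: it parses a small, hypothetical pipe-delimited excerpt (the column order mukey|cokey|comppct_r|value is an assumption; the real DGSM text files have no header row and many more columns) and computes a component-percentage-weighted mean per map unit.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a component-level table. Real DGSM text files are
# headerless and have many more columns; this column order is an assumption:
# mukey | cokey | comppct_r (component % of MU area) | property value
SAMPLE = '"100"|"100:1"|60|1.40\n"100"|"100:2"|40|1.60\n"200"|"200:1"|100|1.25\n'


def read_components(text):
    """Parse pipe-delimited rows into (mukey, cokey, percent, value) tuples."""
    rows = []
    for mukey, cokey, pct, value in csv.reader(io.StringIO(text), delimiter="|"):
        rows.append((mukey, cokey, float(pct), float(value)))
    return rows


def weighted_by_mapunit(rows):
    """Component-percentage-weighted mean of a property for each map unit."""
    weighted_sum = defaultdict(float)
    pct_sum = defaultdict(float)
    for mukey, _cokey, pct, value in rows:
        weighted_sum[mukey] += pct * value
        pct_sum[mukey] += pct
    return {mu: weighted_sum[mu] / pct_sum[mu] for mu in weighted_sum if pct_sum[mu] > 0}


result = weighted_by_mapunit(read_components(SAMPLE))
print(result)
```

For map unit 100 the weighted value is (60 × 1.40 + 40 × 1.60) / 100 = 1.48. Dividing by the summed percentages, rather than by a fixed 100, keeps the result meaningful when the listed components of a map unit do not add up to 100%.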

Supporting Spatial Layers
The National Watershed Boundary Dataset (WBD) [15] was used to define the topographically based hydrologic unit boundaries that delimit the domains of surface water flow. The WBD features used in this work include: a) the boundaries of 21 Regions (identified by 2-digit codes), where Regions 01-18 make up the CONUS, followed by Alaska (19), Hawaii (20), and the Caribbean (21), whereas the South Pacific Islands (Region 22) are not covered by the DGSM layer; and b) a total of 2297 sub-basins identified by 8-digit codes. The entire WBD GIS dataset was obtained from http://www.nrcs.usda.gov/wps/portal/nrcs/main/national/water/watersheds/dataset. Land Use Land Cover (LULC) spatial layers were used to build local spatial statistics (at the MU level) and to adjust originally surveyed parameters that have most likely changed since the original collection date (e.g., organic matter). The Cropland Data Layer (CDL) is a 30 m resolution raster land cover product spanning the CONUS with 133 classes, depicting agricultural cover types in fine detail and the remaining classes in less detail [16] [17]. These data sets were obtained from the NASS (National Agricultural Statistics Service) data server at http://nassgeodata.gmu.edu/CropScape, along with the Cultivated Layer (CL), which explicitly distinguishes cultivated from non-cultivated land. The National Land Cover Data Set (NLCD) for the year 2001 [18], a 16-class land cover classification (four additional classes are used only in Alaska) at a spatial resolution of 30 m, was obtained from the Multi-Resolution Land Characteristics Consortium (MRLC) at www.mrlc.gov to characterize land use and land cover outside the CONUS, namely Regions 19-21.
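The regional split described above can be expressed compactly, because the first two digits of an 8-digit WBD code identify its Region. The following Python sketch, written under stated assumptions rather than taken from the authors' tools, picks the land cover source by Region and computes one simple MU-level statistic, the share of cultivated cells per map unit, from two aligned grids. The 0/1 cultivated encoding and the use of nested lists in place of real rasters are hypothetical simplifications.

```python
from collections import Counter


def region_of(huc8):
    """Return the 2-digit WBD Region number encoded in an 8-digit HUC string."""
    if len(huc8) != 8 or not huc8.isdigit():
        raise ValueError("expected an 8-digit hydrologic unit code: %r" % (huc8,))
    return int(huc8[:2])


def land_cover_source(region):
    """Land cover layer used for a WBD Region, following the split in the text."""
    if 1 <= region <= 18:
        return "CDL"   # CONUS: Cropland Data Layer (with the Cultivated Layer)
    if 19 <= region <= 21:
        return "NLCD"  # Alaska, Hawaii, Caribbean: NLCD 2001
    return None        # e.g. Region 22, which DGSM does not cover


def cultivated_fraction(mu_grid, cultivated_grid):
    """Share of cultivated cells in each map unit, from two aligned grids.

    Both grids are equally sized 2-D lists; cultivated_grid uses a hypothetical
    encoding of 1 for cultivated and 0 for non-cultivated cells.
    """
    total = Counter()
    cultivated = Counter()
    for mu_row, cl_row in zip(mu_grid, cultivated_grid):
        for mu, cl in zip(mu_row, cl_row):
            total[mu] += 1
            cultivated[mu] += cl
    return {mu: cultivated[mu] / float(total[mu]) for mu in total}


mu = [[1, 1, 2],
      [1, 2, 2]]
cl = [[1, 0, 0],
      [1, 1, 0]]
shares = cultivated_fraction(mu, cl)  # MU 1: 2 of 3 cells, MU 2: 1 of 3 cells
```

In practice the two rasters would first have to share the same extent, cell size, and projection; that alignment step is omitted here.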

Models
The set of agricultural hydrology simulation models includes: 1) SWAT (Soil and Water Assessment Tool) [19], designed for river basin and watershed simulation of water, sediment, nutrient, pesticide, and fecal bacteria yields in agriculture-dominated landscapes and their draining channels; 2) APEX (Agricultural Policy Environmental EXtender) [20], designed for field- and farm-scale simulation of all the basic hydrological and chemical processes of farming systems and their interactions; and 3) ALMANAC (Agricultural Land Management Alternatives with Numerical Assessment Criteria) [21], designed for field-scale simulation of crop growth for a wide range of plant species and their competition. These models commonly require two types of soil input parameters: component-level parameters, which represent the soil as a whole, and layer-level parameters, which describe the soil along its vertical profile.
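The two parameter levels map naturally onto a nested data structure: one component record owning an ordered list of layer records. The dataclasses below are a hypothetical illustration that borrows a handful of real SWAT soil input names (SNAM, HYDGRP, SOL_Z, SOL_BD, CLAY) but does not reproduce the full parameter set of any of the three models; the depth-weighted average is included only to show how layer-level values can be aggregated to the component level.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """Layer-level parameters (an illustrative subset of SWAT soil inputs)."""
    sol_z: float   # depth from the soil surface to the bottom of the layer (mm)
    sol_bd: float  # moist bulk density (g/cm3)
    clay: float    # clay content (% of soil weight)


@dataclass
class Component:
    """Component-level parameters plus the ordered list of layers beneath them."""
    snam: str      # soil name
    hydgrp: str    # hydrologic soil group (A, B, C or D)
    layers: List[Layer] = field(default_factory=list)

    @property
    def nlayers(self) -> int:
        return len(self.layers)

    def check_depths(self) -> None:
        """Raise if layer bottoms are not positive and strictly increasing."""
        depths = [layer.sol_z for layer in self.layers]
        if not depths or depths[0] <= 0 or any(b <= a for a, b in zip(depths, depths[1:])):
            raise ValueError("layer depths must be positive and strictly increasing")

    def depth_weighted(self, attr: str) -> float:
        """Thickness-weighted mean of a layer attribute over the whole profile."""
        self.check_depths()
        top = 0.0
        total = 0.0
        for layer in self.layers:
            total += (layer.sol_z - top) * getattr(layer, attr)
            top = layer.sol_z
        return total / top
```

For a component with layers ending at 200 mm (bulk density 1.30) and 600 mm (bulk density 1.50), depth_weighted("sol_bd") returns (200 × 1.30 + 400 × 1.50) / 600 ≈ 1.43.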

Geodatabase and Python
The ESRI ArcGIS File Geodatabase (FGDB) [22], version 10.1, provided the capability to handle the hosted datasets with optimized performance, while reducing feature geometry and raster storage compared with traditional shapefiles and personal geodatabases. Python version 2.7 [23] and the ArcPy site package provided by ArcGIS were used to access and run the built-in geoprocessing routines of ArcGIS 10.1 and the tools offered by its Spatial Analyst extension [24]. Building the tools on version 10.1 preserved compatibility with all later versions.
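As an illustration of how ArcPy exposes FGDB content to Python, the sketch below reads selected parameter rows with arcpy.da.SearchCursor, which is available from ArcGIS 10.1 onward. The table path and the MUKEY key field are hypothetical and do not describe the actual schema of US-ModSoilParms-TEMPLE250000; arcpy is imported inside the function so that the where-clause helper can be used and tested without ArcGIS.

```python
def mukey_where_clause(mukeys, field="MUKEY"):
    """Build an SQL IN clause for text-valued map unit keys (quotes escaped)."""
    quoted = ", ".join("'%s'" % str(key).replace("'", "''") for key in mukeys)
    return "%s IN (%s)" % (field, quoted)


def read_parameters(table, fields, mukeys):
    """Yield rows of `fields` from an FGDB `table` for the given map units.

    Requires an ArcGIS 10.1+ Python environment; `table`, `fields` and the
    MUKEY key field are hypothetical and must match the actual schema.
    """
    import arcpy  # imported here so the helper above works without ArcGIS
    where = mukey_where_clause(mukeys)
    with arcpy.da.SearchCursor(table, fields, where) as cursor:
        for row in cursor:
            yield row
```

Inside an ArcGIS Python environment, a call such as read_parameters(r"C:\data\soils.gdb\swat_params", ["MUKEY", "SOL_BD1"], ["100", "200"]) (hypothetical path, field, and key values) would yield one tuple per matching row. Quoting the keys assumes a text-typed key field.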

Gap Filling
The higher resolution companion development identified a relatively large number of voids in the source data, which resulted in a large number of gaps in the compiled database records [13]. Its gap-filling procedure provided an indexed set of scored replacement parameters for the three models (SWAT, APEX, and ALMANAC) at both the component and the layer level.
At the component level, this was accomplished with a hierarchical methodology based on Soil Taxonomy information and the geographic location of the gaps. At the layer level, replacements were provided by texture-based records. In addition, appropriate default parameter records were consolidated for components referring to non-soil categories (e.g., badland, gullied land, lava flow, pits, and water). Together, these replacements formed a database of High Resolution Representative Values indexed by Soil Taxonomy and soil texture. In this work, that database was used to fill the gaps in the models' parameters that originated from gaps in the source DGSM information.
The representative (highest-scored) value of each missing model parameter was retrieved by matching: a) for component-level parameters, the available DGSM Soil Taxonomy attribute, through a bottom-up search of the Soil Taxonomy-organized database; and b) for layer-level parameters, the available texture attribute.
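The lookup order described above can be sketched as follows. This is a simplified, hypothetical rendering rather than the published procedure: the taxonomy field names mirror DGSM component columns, the layout of the replacement databases is an assumption, and the geographic component of the original method is left out. Non-soil components are served from default records first; otherwise the search moves up the taxonomy from subgroup to order and returns the highest-scored candidate at the first level that matches.

```python
# Taxonomic levels searched bottom-up, from the most specific to the most general.
TAXONOMY_LEVELS = ("taxsubgrp", "taxgrtgroup", "taxsuborder", "taxorder")


def fill_component_param(component, param, taxonomy_db, nonsoil_defaults):
    """Return (value, matched_level) for a missing component-level parameter.

    taxonomy_db maps (level, taxon, param) -> list of (score, value) candidates;
    nonsoil_defaults maps a lower-case non-soil component name -> {param: value}.
    Both layouts are hypothetical.
    """
    name = component.get("compname", "").lower()
    if name in nonsoil_defaults:
        return nonsoil_defaults[name].get(param), "non-soil default"
    for level in TAXONOMY_LEVELS:
        taxon = component.get(level)
        candidates = taxonomy_db.get((level, taxon, param)) if taxon else None
        if candidates:
            best = max(candidates, key=lambda c: c[0])  # highest-scored candidate
            return best[1], level
    return None, None


def fill_layer_param(layer, param, texture_db):
    """Return a texture-indexed replacement for a missing layer-level parameter."""
    return texture_db.get((layer.get("texture"), param))
```

Returning the matched level alongside the value makes it possible to record how specific each replacement was.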

Results
Applying the procedure outlined in Section 2.3.2 filled all of the parameter voids shown for each model in Table 1.
This step produced a spatially and tabularly seamless outcome, which is provided in three forms. The model attributes were stored using dBASE tables. The complete system occupies 24 GB of storage.

Discussion
In this work, a geoprocessing workflow, previously developed using soil survey [...]
[Figure: Spatial layers, models' parameters, and GeoTEMPLE tools at the user's fingertips within the US-ModSoilParms-TEMPLE250000 geodatabase.]
[...] regional-tiled geodatabases. This structure is advantageous for quick interactive applications and/or analysis over the geographic extent of the entire CONUS.
A simple example is shown in Figure 7 for the SWAT topsoil bulk density parameter, but any model parameter, at either the component or the layer level, can be conveniently mapped and its distribution immediately evaluated and/or exported for further analysis and/or geoprocessing.
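Mapping a parameter such as topsoil bulk density amounts to selecting the top-layer value for each map unit and assigning it to a class for symbology. The sketch below assumes one representative record per map unit (for example, its dominant component) and uses equal-interval classes; both the record layout and the classification scheme are assumptions for illustration, and ArcGIS symbology provides its own classification methods.

```python
from bisect import bisect_right


def topsoil_values(layer_rows):
    """Keep the layer-1 value per map unit from (mukey, layer_number, value) rows."""
    return {mukey: value for mukey, layer, value in layer_rows if layer == 1}


def equal_interval_classes(values_by_key, n_classes=4):
    """Assign each key a class index 0..n_classes-1 using equal-interval breaks.

    Returns (classes, breaks), where breaks holds the n_classes - 1 inner limits.
    """
    values = list(values_by_key.values())
    lo, hi = min(values), max(values)
    step = (hi - lo) / float(n_classes)
    if step == 0:
        return {key: 0 for key in values_by_key}, []
    breaks = [lo + step * i for i in range(1, n_classes)]
    classes = {key: bisect_right(breaks, value) for key, value in values_by_key.items()}
    return classes, breaks
```

The resulting dictionary can be joined back to the map unit polygons by key, or exported for analysis outside the GIS.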
The third option makes the content of US-ModSoilParms-TEMPLE250000 accessible to the Open Source software community. ESRI software does allow programming languages such as Python and R (http://www.r-project.org) to access and edit FGDBs through ArcGIS site packages (e.g., ArcPy and the R-ArcGIS Bridge), but these packages depend on an ArcGIS installation. The companion folder-based database framework, built with shapefiles, GeoTIFF rasters, and dBASE tables, instead provides comparable content, albeit with a larger storage footprint, and offers direct access to the core content of this development.
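To show that the folder-based option needs no proprietary software, the following sketch reads a dBASE table using only Python's standard struct module. It follows the dBASE III layout (a 32-byte file header, 32-byte field descriptors terminated by 0x0D, and fixed-width records that start with a deletion flag), converts only numeric (N, F) fields, and returns everything else as stripped text; memo fields, code pages, and later dBASE variants are not handled. Established libraries such as dbfread or pyshp are more complete choices for production use.

```python
import struct


def parse_dbf(data):
    """Parse dBASE III table bytes into a list of dicts, skipping deleted records."""
    # Header: record count (uint32), header length (uint16), record length (uint16).
    n_records, header_len, record_len = struct.unpack("<IHH", data[4:12])
    fields = []
    pos = 32
    while data[pos] != 0x0D:  # 0x0D terminates the field descriptor array
        name = data[pos:pos + 11].split(b"\x00")[0].decode("ascii")
        ftype = chr(data[pos + 11])
        length = data[pos + 16]
        fields.append((name, ftype, length))
        pos += 32
    rows = []
    for i in range(n_records):
        start = header_len + i * record_len
        record = data[start:start + record_len]
        if record[:1] == b"*":  # deletion flag set
            continue
        offset = 1  # skip the deletion flag byte
        row = {}
        for name, ftype, length in fields:
            raw = record[offset:offset + length].decode("latin-1").strip()
            offset += length
            if ftype in ("N", "F"):
                row[name] = float(raw) if raw else None
            else:
                row[name] = raw
        rows.append(row)
    return rows
```

Typical use would be to open one of the folder-based dBASE tables in binary mode and pass its contents, for example rows = parse_dbf(f.read()) inside a with open(path, "rb") as f: block.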

Conclusion
Our work provides an unprecedented, large spatial scale, seamless, and functional geographic database repository of soil parameters for three widely used agricultural hydrology simulation models in the United States. The data, assembled in three different forms, together with customized tools, a User Guide, and details of this development, are planned to be made available and continuously updated at http://soilandwaterhub.org/GeoTEMPLE.