Machine Learning Mapping of Soil Apparent Electrical Conductivity on a Research Farm in Mississippi
1. Introduction
In agricultural fields, spatial variability of soil chemical and physical properties affects crop growth and productivity [1] [2]. Understanding soil spatial patterns enhances the producer’s ability to improve crop production. Various tools have been developed for growers to use for sensing the spatial variability of soils, including electromagnetic, electrochemical, mechanical, optical and radiometric, airflow, and acoustic and pneumatic sensors. On-the-go sensors that measure apparent electrical conductivity (electromagnetic) have gained popularity for measuring soil spatial patterns because they can cover large areas quickly. Researchers have used information from on-the-go apparent electrical conductivity (ECa) sensors as proxies for estimating other soil parameters. Electromagnetic induction and electrical resistivity are standard approaches for measuring ECa [3].
Researchers have tested various computer algorithms to map soil spatial patterns [4]. Standard algorithms include random forest [5] [6], support vector machine [5] [7], Cubist [5] [8], kriging [4], k-nearest neighbors [4], and artificial neural networks [5]. There has been no one-size-fits-all approach for using computer algorithms for digital soil mapping.
Over the years, geostatistical techniques have been popular for developing digital soil maps. They provide a statistically sound model for soil spatial variation, minimize sampling bias, measure spatial autocorrelation, and provide an error layer. Nevertheless, several disadvantages exist in using geostatistical methods for soil mapping [9] [10]: the overall concept can be challenging to understand and implement for users who do not have a background in geostatistics; large datasets can be computationally intensive to analyze; and the residuals must be normally distributed, stationary, and not affected by any change in direction. Geostatistical models can also be affected by outliers, spatial data clustering, and data collection errors.
Computer algorithms based on the machine learning concept are classified as a method of data analysis and artificial intelligence. They are becoming a popular alternative to geostatistical methods because machine learning techniques make no assumptions about the data distribution, can process large datasets containing cross-correlated covariates as predictors, and can function with little intervention [10] [11]. Machine learning successes for digital soil mapping purposes include soil organic carbon concentration [8] [12] [13] and associated stocks [13] [14], soil texture [15] [16], pH [17], cation exchange capacity [18], nitrogen [18] [19], phosphorus [19] [20], potassium, calcium, and magnesium [20], bulk density [21], and soil pollutants [22]. However, machine learning models can also be challenging to interpret and visualize.
Commercial and open-source tools allow users to process data collected by on-the-go sensor systems. Open-source or freeware technology provides excellent opportunities for users to evaluate data at minimal or no software cost. Open-source technology is growing and is supported by numerous communities worldwide. More research is needed on using open-source tools and machine learning technologies at the farm level for making management decisions. Growers in Mississippi, a state with vast, diverse agricultural landscapes, may benefit from these technologies. In Mississippi, Fletcher [23] has demonstrated, using unsupervised machine learning in cluster analysis, that ECa spatial patterns are similar for at least five years. That study focused on keeping the data in its native point form and not deriving interpolated maps. The current study was conducted to build on the previous research initiative of using open-source software and machine learning tools for digital mapping of ECa.
The objectives of this study were to evaluate machine learning as a tool for mapping ECa of soils located on a research farm in Mississippi, to determine if measurements collected at different depths affected the accuracy of the algorithm, and to determine if measurements collected from a field containing multiple soil types impacted the results. The study focused on using open-source software available to the public (Smart-Map) to derive interpolated maps.
2. Materials and Methods
2.1. Study Site
The experiments were conducted at the United States Department of Agriculture, Agricultural Research Service Farm (longitude −90.872157, latitude 33.446486; elevation 38 m above sea level), near Stoneville, Mississippi, USA. The average precipitation and temperature were approximately 133 cm and 17.5 °C, respectively [24]. Two study sites were evaluated and are referred to as MF2 and MF9. MF2 was 4 ha and contained the following soil mapping units: Commerce very fine sandy loam, 0% to 2% slopes; Newellton silty clay, 0% to 2% slopes, occasionally flooded; Commerce silty clay loam, 0% to 2% slopes; and Tunica clay, 0% to 2% slopes [25]. MF9 was 2.0 ha and consisted of the following soil mapping units: Sharkey clay, 0% to 2% slopes, and Tunica clay, 0% to 2% slopes [25]. Both sites were in a continuous soybean (Glycine max L.) and corn (Zea mays L.) rotation. The farm manager used standard agricultural practices of the area for irrigation, weed treatment, and fertilization.
2.2. Data Collection
ECa readings were collected from MF2 and MF9 with the Veris MSP 3 (Veris Technologies, Salina, KS, USA) on-the-go sensor system. The device used six coulters to collect shallow (0 - 30 cm) and deep (0 - 90 cm) ECa measurements as the tractor pulled it through the fields. Coulters two and five injected an electrical current into the soil; coulters three and four obtained the shallow ECa readings; coulters one and six recorded the deep readings [26]. Data output was in millisiemens per meter (mS/m). The location of each measurement was recorded in latitude and longitude coordinates (WGS84) with a Garmin global positioning system, which recorded location information while receiving differential global positioning data. A laptop (HP Pavilion TouchSmart Notebook, Windows 8.1) inside the tractor’s cab was used to log each measurement. The data were collected from MF2 on May 1, 2015, and from MF9 on April 28, 2022, before the growing season.
2.3. Data Preprocessing
Each measurement was assigned an identification number, converted from longitude and latitude coordinates to the UTM coordinate system (UTM 15N, WGS84), and cleaned (i.e., removal of negative values, duplicated x-y coordinates, and outliers). The data cleaning step resulted in 1573 and 741 sampling points for further processing at MF2 and MF9, respectively. For comparison purposes, the mean, median, standard deviation, and coefficient of variation were calculated for each variable. The preprocessing and summary statistics were completed with the QGIS software (3.22.8-Białowieża) [27].
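The cleaning and summary steps in this study were carried out in QGIS. As a rough illustration only, the same logic can be sketched in plain Python. The record layout, the sample values, the choice to keep the first of two duplicated coordinates, and the z-score outlier rule are all assumptions made for this sketch; the outlier criterion actually used is not stated above.

```python
from statistics import mean, median, stdev


def clean(records, z_max=3.0):
    """Drop negative readings, duplicated x-y coordinates, and outliers.

    records: list of (x, y, eca) tuples, coordinates in m and ECa in mS/m.
    z_max:   ASSUMED outlier rule; readings with |z-score| > z_max are dropped.
    """
    seen = set()
    kept = []
    for x, y, eca in records:
        if eca < 0 or (x, y) in seen:
            continue  # negative value or repeated coordinate (first one is kept)
        seen.add((x, y))
        kept.append((x, y, eca))
    values = [r[2] for r in kept]
    mu, sd = mean(values), stdev(values)
    return [r for r in kept if sd == 0 or abs(r[2] - mu) / sd <= z_max]


def summarize(values):
    """Mean, median, sample standard deviation, and coefficient of variation (%)."""
    mu, sd = mean(values), stdev(values)
    return {"mean": mu, "median": median(values), "sd": sd,
            "cv_percent": 100.0 * sd / mu}


# Hypothetical readings (x, y, ECa); not data from the study.
raw = [
    (0.0, 0.0, 20.0),
    (8.0, 0.0, 22.0),
    (8.0, 0.0, 22.5),    # duplicated coordinate
    (16.0, 0.0, -1.0),   # negative reading
    (24.0, 0.0, 21.0),
    (32.0, 0.0, 19.0),
    (40.0, 0.0, 23.0),
    (48.0, 0.0, 250.0),  # outlier
    (56.0, 0.0, 20.0),
    (64.0, 0.0, 21.0),
]
cleaned = clean(raw, z_max=2.0)  # a small sample needs a tighter threshold
stats = summarize([r[2] for r in cleaned])
print(len(cleaned), stats)
```

A fixed z-score threshold is only one possible rule; with few points a single extreme value inflates the standard deviation, which is why the example lowers `z_max`.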
2.4. Data Analysis
The shallow (ECas) and deep (ECad) ECa readings were processed with Smart-Map [7], an open-source QGIS plugin created to complete digital soil mapping. Its machine learning tools were used to evaluate spatial patterns of the ECas and ECad. The data were processed based on the protocol described in [7]: 1) loading the data into the plugin, 2) selecting a target variable to interpolate, 3) setting the grid size for the map, 4) choosing the machine learning interpolation method, 5) evaluating the model using cross-validation, and 6) creating the map based on model development. Support vector machine is the machine learning method offered by Smart-Map. The software automatically fits the hyperparameters required by support vector machines. It uses the radial basis function kernel because it is non-linear and can be fitted to most data. The grid size used for the map interpolation was 8 m × 8 m.
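Two ingredients of this protocol are easy to make concrete: the output grid and the radial basis function (RBF) kernel. The sketch below lays out 8 m cell centers over the bounding box of a set of points and evaluates an RBF kernel between two locations. How Smart-Map builds its grid and which hyperparameter values it fits are not described here, so the function names, the bounding-box approach, and the `gamma` values are assumptions for illustration.

```python
import math


def grid_centers(points, cell=8.0):
    """Centers of square cells (side `cell`, in m) covering the points' bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    ncols = max(1, math.ceil((max(xs) - x0) / cell))
    nrows = max(1, math.ceil((max(ys) - y0) / cell))
    return [
        (x0 + (c + 0.5) * cell, y0 + (r + 0.5) * cell)
        for r in range(nrows)
        for c in range(ncols)
    ]


def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between a and b)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * d2)


# Hypothetical 200 m x 100 m extent in projected coordinates (m).
corners = [(700000.0, 3700000.0), (700200.0, 3700100.0)]
centers = grid_centers(corners, cell=8.0)
print(len(centers))  # 25 columns x 13 rows = 325 cells

# Similarity decays smoothly with distance; gamma here is arbitrary.
print(rbf_kernel((0.0, 0.0), (3.0, 4.0), gamma=0.01))
```

With x and y as the only covariables, the kernel makes nearby locations similar and distant ones dissimilar, which is what lets a support vector model act as a spatial interpolator.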
The software automatically selected the x and y coordinates as covariables. The user has the option to add other covariables if needed. The only covariables used were the default x and y coordinates. Model accuracy was determined by leave-one-out cross-validation, with root mean squared error (RMSE) and the coefficient of determination (R2) as the accuracy measures. Additionally, the software offers Moran’s I [28] statistic as a means to determine the autocorrelation of a specific variable. The value ranges from −1 to +1. The closer the value is to +1, the more clustered the data values; values approaching −1 indicate dispersion, and values close to zero indicate a random pattern. Moran’s I was used to compare the autocorrelation of the ECas and ECad readings. Maps of the final predicted measurements were created to evaluate within-field spatial patterns of ECas and ECad. The maps were created with the QGIS software.
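The accuracy measures and Moran's I described above can be sketched in a few lines of Python. This is not Smart-Map's implementation: inverse distance weighting stands in for the support vector machine so the example needs no external library, R2 is computed as 1 − SS_res/SS_tot, and Moran's I uses binary distance-band weights. All three choices, along with the synthetic field, are assumptions.

```python
import math


def idw_predict(train, x, y, power=2.0):
    """Inverse distance weighting; a stand-in predictor, NOT the SVM in Smart-Map."""
    num = den = 0.0
    for tx, ty, tv in train:
        d = math.hypot(tx - x, ty - y)
        if d == 0.0:
            return tv
        w = 1.0 / d ** power
        num += w * tv
        den += w
    return num / den


def loocv(records, predict):
    """Leave-one-out cross-validation: predict each point from all the others."""
    obs, pred = [], []
    for i, (x, y, v) in enumerate(records):
        train = records[:i] + records[i + 1:]
        obs.append(v)
        pred.append(predict(train, x, y))
    n = len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    mean_obs = sum(obs) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot  # assumed definition of R2
    return rmse, r2


def morans_i(records, max_dist):
    """Global Moran's I with binary weights (1 if two points are within max_dist)."""
    n = len(records)
    mean_v = sum(r[2] for r in records) / n
    dev = [r[2] - mean_v for r in records]
    num = w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(records[i][0] - records[j][0],
                           records[i][1] - records[j][1])
            if d <= max_dist:
                num += dev[i] * dev[j]
                w_sum += 1.0
    return (n / w_sum) * num / sum(e * e for e in dev)


# Synthetic 5 x 5 field on an 8 m lattice with a west-to-east gradient.
field = [(i * 8.0, j * 8.0, 10.0 + 2.0 * i) for i in range(5) for j in range(5)]
rmse, r2 = loocv(field, idw_predict)
moran = morans_i(field, max_dist=9.0)  # 9 m links only edge-sharing neighbors
print(rmse, r2, moran)  # Moran's I is 0.75 for this gradient
```

A smooth gradient gives a strongly positive Moran's I, whereas a checkerboard of alternating values on the same lattice gives −1, matching the interpretation of the statistic's range given above.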
3. Results and Discussion
Summary statistics of the study sites are presented in Table 1. For MF2 and MF9, the ECas mean, median, minimum, and maximum readings were less than the corresponding ECad values, indicating a more conductive soil component in the lower soil depth. Similar trends in ECas and ECad summary statistics have been observed at this farm [23]. The ECas coefficient of variation of MF2 was greater than the ECad coefficient of variation. The opposite was observed at MF9.
Moran’s I values for the MF2 ECas and ECad readings were greater than 0.90 (Table 2), indicating a statistically significant autocorrelation for ECas and ECad readings. The ECas readings at MF9 also had statistically significant autocorrelation but to a lesser extent when compared with MF2 results (Table 2).
Figure 1 shows ECas and ECad maps of MF2. The spatial patterns were similar between the ECas and ECad readings, and the transitioning of the soil to higher ECa values horizontally and vertically was evident in the map comparisons. For MF2, the lowest ECas values were observed in the southwest section of the field, whereas the higher ECas and ECad values occurred in the northern portion of the plot. For this dataset, moderate values were observed in the middle of the field. The leave-one-out cross-validation accuracy for model selection was higher for the ECas readings than for the ECad readings (Table 3).
Table 1. Summary statistics for study sites MF2 and MF9.
CV—Coefficient of variation.
Table 2. Moran’s I measurement of autocorrelation.
ECas—shallow apparent electrical conductivity readings; ECad—deep apparent electrical conductivity readings.
Table 3. Leave-one-out cross-validation results of model used to derive maps of the study sites.
RMSE—root mean square error, R2—coefficient of determination.
Figure 1. (a) Study site MF2, sampling points, and soil mapping units, (b) apparent electrical conductivity shallow (ECS) readings, and (c) apparent electrical conductivity deep (ECD) readings. Ch—Commerce silty clay loam, 0% to 2% slopes; Cn—Commerce very fine sandy loam, 0% to 2% slopes; Ng—Newellton silty clay, 0% to 2% slopes, occasionally flooded; and Ta—Tunica clay, 0% to 2% slopes.
Figure 2 illustrates the apparent electrical conductivity readings at MF9. The ECas readings were more variable and showed weaker patterns than the ECad readings. A distinct pattern was observed in the ECad readings, with the highest readings occurring in the middle and eastern sections of the field. Pereira et al. [7] indicated that the plugin was not a one-size-fits-all soil mapping software. Khaledian and Miller’s [5] review of machine learning tools for digital soil mapping also stressed that there is no ideal protocol for developing models for digital soil mapping. The ECas results at MF9 support that concept; however, future research must be conducted to determine why higher accuracies could not be achieved for model development at MF9.
Furthermore, another type of predictor, such as kriging, may have been better for interpolating the ECa layers. For example, Veronesi and Schillaci [4] showed in a study comparing kriging with machine learning algorithms for predicting topsoil organic carbon that ordinary and universal kriging were the best predictors, followed by random forest. According to the review in [10], random forest is the most popular machine learning tool used for regression purposes related to digital soil mapping. The Smart-Map tool does offer an option to complete kriging; thus, it will be explored in future research studies. The software uses support vector machine for the machine learning approach, which has been used less than other machine learning tools for digital soil mapping studies [10] [29].
Figure 2. (a) Study site MF9, sampling points, and soil mapping units, (b) apparent electrical conductivity shallow (ECS) readings, and (c) apparent electrical conductivity deep (ECD) readings. Sb—Sharkey clay, 0% to 2% slopes; and Ta—Tunica clay, 0% to 2% slopes.
Stronger spatial contrasts were evident at MF2 than at MF9. The results also indicated that the strength of the patterns was depth dependent in the fields studied. Generally, the lower ECa values at MF2 occurred in areas consisting of silty clay loam, very fine sandy loam, and silty clay soils based on soil survey results (Figure 1). Additionally, this field was irrigated from south to north, which could have contributed to the higher ECa values observed in the northern section of the field. Fletcher [23] has also observed similar results based on cluster analysis of a field at the same farm.
This study was conducted on a research farm with 5 - 10 ha plots. MF2 was approximately double the size of MF9, which probably led to the better spatial patterns observed in the former compared with the latter. To improve the mapping of the ECas layer for MF9, the distance between the mapping transects may need to be decreased from 8 m to possibly 4 m. Also, the transects were collected along the rows for each field. Collecting the transects both along and across rows should also be explored, although that change would increase the time needed to collect and analyze the data. Finally, it is essential to determine what sampling design is optimal for machine learning tools used in digital soil mapping [9] [10]. Model choice and sample design can influence final outputs.
4. Conclusion
The research results indicated that machine learning was valuable for deriving ECas maps in two Mississippi fields located on a research farm. Open-source software and machine learning based on support vector machine were used to derive the maps. Autocorrelation of ECas and ECad measurements was site-specific. Location and depth played a role in the machine learner’s ability to derive the maps. Overall spatial patterns in ECa were evident in both fields; these maps can aid in developing strategies to collect soil and plant samples. Future research will focus on using the other tools provided by the software to establish management zones for the fields located at the research facility, evaluate the effect of sample design on machine learning tools, and compare different algorithms at the field scale.
Acknowledgements
The author thanks Milton Gaston, Jr., for his assistance in collecting the apparent electrical conductivity data. This research was partly supported by the United States Department of Agriculture, Agricultural Research Service. The findings and conclusions in this publication are those of the author and should not be construed to represent any official United States Department of Agriculture or United States Government policy.