
Gully erosion can account for significant volumes of sediment exiting agricultural landscapes, but its evolution is difficult to monitor and quantify with traditional surveying technology. Scientific investigations of gullies depend on accurate and detailed topographic information to understand and evaluate the complex interactions between field topography and gully evolution. Detailed terrain representations can be produced by new technologies such as terrestrial LiDAR systems. These systems are capable of collecting information with a wide range of ground point sampling densities as a result of operator-controlled factors. Increasing point density results in richer datasets at the cost of increased time needed to complete field surveys. In large research watersheds, with hundreds of sites being monitored, data collection can become costly and time-consuming. In this study, the effect of point sampling density on the capability to collect topographic information was investigated at the individual gully scale. Semivariograms were used to derive general guiding principles for multi-temporal gully surveys based on various levels of laser sampling points and relief variation (low, moderate, and high). Results indicated the existence of a point sampling density threshold above which little or no additional topographic information is gained. A reduced dataset was created using the density thresholds and compared to the original dataset, with no major discrepancy. Although variations in relief and soil roughness can lead to different point sampling density requirements, the outcome of this study serves as practical guidance for future field surveys of gully evolution and erosion.

In agricultural fields, gully erosion is significant, and the volume of soil it removes is often similar to or greater than that removed by sheet and rill erosion. A large number of modeling tools have been developed over the years to estimate sediment transport from agricultural fields to streams and lakes [1-3]. These tools play an important role in assessing existing and planned conservation practices and are accepted by various regulatory and management agencies such as the United States Environmental Protection Agency (EPA) and the United States Natural Resources Conservation Service (NRCS). However, at the current stage of development, sediment loss estimation from gullies is either limited or neglected. Unlike sheet and rill erosion, which occurs as a result of the impact of raindrops and water flowing on the soil surface, gully erosion in agricultural fields occurs as a result of concentrated flow of surface runoff along a defined channel, and also by subsurface flow by seepage and flow through preferential pathways [

Ephemeral gullies are defined as small channels located in agricultural fields eroded primarily from concentrated overland flow that can be easily filled by normal tillage, only to reform again in the same location by additional runoff events [

This dynamic behavior poses a challenge in the understanding and estimation of the soil erosion of ephemeral gullies in agricultural watersheds [

Recent developments in laser scanner technology provide new opportunities for scientific investigation of ephemeral and classical gullies. Although laser scanners have been previously used in similar investigations, such as large-scale classical gullies in different locations such as mountain-side sites [

Ground-based LiDAR systems provide the tools for detailed multi-temporal analysis of micro-topography of ephemeral gullies. These systems are capable of generating terrain representations with sub-centimeter vertical accuracy (

LiDAR technology measures the laser pulse travel time from the transmitter to the target and back to the receiver [

These systems are capable of collecting information with a wide range of ground point sampling densities as a result of operator-controlled factors such as the scan angles (area covered by individual scans), average point density of individual scans, and degree of overlap between scans (

Modeling tools designed to define and to understand the pathways of surface water movement across fields and even across watersheds rely on raster-based digital elevation models to derive the required topographic attributes [

Among all the available interpolation techniques, kriging is often used because of the ability to provide unbiased estimates with minimum and known variance [

As an illustration of the variogram computation concept, consider a one-dimensional set of laser points (

The semivariogram equation represents the average semi-variance for a lag distance h between points and a total number of pairs N_{h}. In this equation Z_{i} and Z_{(i + h)} represent the elevation value of points at location i and i plus the separation distance h (i + h), respectively. The smaller the γ value the more related the points are. In other words, the semivariogram represents the average squared difference of any pair of points located h distance from each other [
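The computation described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the standard semivariogram form γ(h) = 1/(2N_h) Σ (Z_i − Z_{i+h})² and a one-dimensional set of regularly spaced points, so that the lag h is expressed as a number of index steps. The function name `semivariance` and the elevation values are hypothetical and do not come from the survey.

```python
import numpy as np


def semivariance(z, lag):
    """Average semi-variance for all pairs separated by `lag` index steps.

    Assumed form: gamma(h) = 1 / (2 * N_h) * sum((Z_i - Z_{i+h}) ** 2)
    """
    z = np.asarray(z, dtype=float)
    if lag < 1 or lag >= z.size:
        raise ValueError("lag must be between 1 and len(z) - 1")
    diffs = z[:-lag] - z[lag:]   # Z_i - Z_{i+h} for every available pair
    n_pairs = diffs.size         # N_h
    return float(np.sum(diffs ** 2) / (2.0 * n_pairs))


# Hypothetical elevations (m) of five equally spaced laser points.
elevations = [10.0, 10.2, 10.1, 10.4, 10.3]
gamma_by_lag = {h: semivariance(elevations, h) for h in range(1, 4)}
```

Evaluating the function over a range of lags gives the values that would be plotted as an experimental semivariogram. Irregularly spaced points, as in a real laser point cloud, would require binning pairs by separation distance instead of by index.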

Plotting semi-variance using an increasing range of lag distances generates a semivariogram graphic (

The study site selected for this investigation is located within the Cheney Lake Reservoir watershed near the town of Hutchinson in south-central Kansas. The predominant land use is agriculture (>73%) in the form of cropland and rangeland. The gully within the study site was 96 meters long, oriented north-south, approximately 1.3 meters wide, and 10 to 50 cm deep. The channel is free of vegetation and crop residues, while the surroundings are covered by crop residues resulting from the no-till management used in winter wheat followed by sorghum (milo) in the 2010 crop rotation. Historical cultivation practices indicate that this ephemeral gully initially did not disrupt farming operations; however, after no-tillage practices were adopted in 2005, the channel grew wider and deeper to the point that farming equipment could no longer travel across the gully and the ensuing cropping activity was performed around the main channel (

Two locations with known geographic coordinates within the study site were used to provide reference geographical coordinates. This is an important step to translate the equipment local coordinates into geographic coordinates, thus providing a means to compare surveys performed at different times. The equipment used was a TOPCON GSL-1000 series and its general specifications are listed on

During data collection in the field it is possible to survey the same geographical location from different scans. This practice is often used to increase sampling density and to avoid problems such as shadowing and/or limited coverage due to vegetation. The overlap in LiDAR point sampling can also be used to evaluate survey accuracy. Provided that the overlapping scans were collected with enough point density, it is possible to identify neighboring points, collected from different scans, and compute elevation differences between these points (

Cross-validation was performed considering two distance threshold values of 1 and 5 millimeters. Histograms of the elevation difference of the selected point pairs are shown in
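The pairing of neighboring points from overlapping scans can be sketched as follows. This is an illustrative sketch, not the procedure actually used in the study: it assumes that the distance threshold applies to horizontal (x, y) separation, uses a brute-force search that is only practical for small samples, and all names and coordinates are hypothetical.

```python
import numpy as np


def overlap_elevation_differences(scan_a, scan_b, max_dist):
    """Elevation differences between neighboring points of two scans.

    Each row is (x, y, z) in meters. For every point in scan_a the
    horizontally nearest point in scan_b is found; the pair is kept only
    if that horizontal distance is <= max_dist (assumed interpretation).
    """
    a = np.asarray(scan_a, dtype=float)
    b = np.asarray(scan_b, dtype=float)
    # Brute-force pairwise horizontal distances, shape (len(a), len(b)).
    dxy = np.linalg.norm(a[:, None, :2] - b[None, :, :2], axis=2)
    nearest = np.argmin(dxy, axis=1)
    nearest_dist = dxy[np.arange(a.shape[0]), nearest]
    keep = nearest_dist <= max_dist
    return a[keep, 2] - b[nearest[keep], 2]


# Hypothetical points from two overlapping scans.
scan_a = [[0.0, 0.0, 1.000], [1.0, 0.0, 1.100], [2.0, 0.0, 1.200]]
scan_b = [[0.0005, 0.0, 1.002], [1.003, 0.0, 1.101], [2.5, 0.0, 1.300]]

dz_1mm = overlap_elevation_differences(scan_a, scan_b, 0.001)
dz_5mm = overlap_elevation_differences(scan_a, scan_b, 0.005)
```

A histogram of the returned differences for each threshold summarizes the agreement between scans. For full point clouds a spatial index (for example a k-d tree) would replace the brute-force distance matrix.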

Studies have been performed to identify the optimum balance between point density at small gully scales and volume of data with the goal of optimizing data collection and cost. Guo [

Point sampling density was investigated by tiling the entire LiDAR point cloud into one-meter square grids. Sampling density was computed by counting the number of points in each tile. This information can be utilized when verifying spatial coverage of sampling points to identify gaps or under-sampled regions (
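The tiling step can be illustrated with a short sketch. It assumes points stored as (x, y, z) rows in meters and a grid anchored at the minimum x and y of the cloud; the function name `tile_point_counts` and the sample points are hypothetical.

```python
import numpy as np


def tile_point_counts(points, tile_size=1.0):
    """Count laser points per square tile.

    Returns a 2-D array indexed as counts[row, col], where col follows x
    and row follows y, with the grid origin at the minimum x and y.
    """
    pts = np.asarray(points, dtype=float)
    origin = pts[:, :2].min(axis=0)
    idx = np.floor((pts[:, :2] - origin) / tile_size).astype(int)
    n_cols, n_rows = idx.max(axis=0) + 1
    counts = np.zeros((n_rows, n_cols), dtype=int)
    # np.add.at accumulates correctly when several points share a tile.
    np.add.at(counts, (idx[:, 1], idx[:, 0]), 1)
    return counts


# Hypothetical point cloud with four points.
cloud = [[0.1, 0.1, 5.0], [0.9, 0.2, 5.1], [1.5, 0.5, 5.0], [2.2, 1.7, 5.2]]
counts = tile_point_counts(cloud)
empty_tiles = int(np.sum(counts == 0))
```

Tiles with a count of zero correspond to gaps in coverage, and dividing each count by the tile area gives the sampling density in points per square meter.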

A total of 5,032 tiles were generated (many of them containing no points) (σ_{elev} = 0.13), 2177 (40,923 points with σ_{elev} = 0.06), and 2304 (17,144 points with σ_{elev} = 0.01). The same variation in elevation represented by standard deviation values can be observed on histograms (

Experimental semivariograms for each of the three tiles selected were computed using the algorithm gamv available in the Geostatistical Software LIBrary (GSLIB) due to the irregularly spaced nature of the laser points [

The experimental semivariograms were computed using all the available laser points in each tile with the gamv algorithm (black dots in

Variations of the standard Gaussian model were used to generate theoretical semivariogram curves for the remaining two tiles, 2304 (Equation (3)) and 2177 (Equation (4)). The three curves of the theoretical semivariograms are plotted in

The influence of sampling density on the topographic information was quantified by randomly selecting a subset of points for each tile and then evaluating its variogram. A large number of repetitions of the random subset creation was adopted to minimize the odds, and possible influence, of one “bad” selection of points. A Monte Carlo type investigation was performed by creating a series of independent simulations of reduced datasets containing fewer laser points than the original number in each tile. Each reduced dataset was generated by randomly selecting laser points based on a pre-defined percentage. A percentage of 100% represents all the laser points available in the tile, while a reduced set using a percentage of 50% would yield half of the available points in the tile. For each pre-defined percentage, a total of 100 independent realizations were performed (100 independent randomly selected reduced sets). Each reduced set was used in the computation of experimental and theoretical semivariogram curves (
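The generation of reduced sets can be sketched as below. This is a minimal illustration assuming simple random sampling without replacement and a fixed seed for reproducibility; the generator name `random_subsets` and the synthetic tile are hypothetical.

```python
import numpy as np


def random_subsets(points, percentage, n_realizations=100, seed=0):
    """Yield independent random subsets holding `percentage` % of the points.

    Sampling is without replacement within each realization (assumed).
    """
    pts = np.asarray(points)
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(pts.shape[0] * percentage / 100.0)))
    for _ in range(n_realizations):
        chosen = rng.choice(pts.shape[0], size=n_keep, replace=False)
        yield pts[chosen]


# Hypothetical tile with 200 points; each row stands for one (x, y, z) point.
tile = np.arange(600).reshape(200, 3)
subsets = list(random_subsets(tile, percentage=50))
```

Each yielded subset would then be passed to the semivariogram computation, giving 100 curves per percentage level.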

The theoretical semivariogram curves generated with reduced data points were quantitatively evaluated by comparing each one to the theoretical semivariogram curve obtained using all collected laser points, through the calculation of the root mean squared deviation (RMSD) as shown in Equation (5).

In this equation, V_{100} represents the theoretical semivariogram curve developed using all available laser points, n is the total number of points in the curve (the total number of lag intervals considered), and V_{P} represents a theoretical semivariogram curve generated using a reduced dataset with percentage P. A total of 100 RMSD values for each percentage threshold were calculated and averaged. The resulting set of averaged RMSD values is graphically displayed in
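The RMSD comparison can be sketched as follows, assuming the conventional definition RMSD = sqrt((1/n) Σ (V_{100,i} − V_{P,i})²) evaluated at the same n lag intervals for both curves. The function names and the curve values are hypothetical.

```python
import numpy as np


def rmsd(v_full, v_reduced):
    """Root mean squared deviation between two semivariogram curves."""
    v_full = np.asarray(v_full, dtype=float)
    v_reduced = np.asarray(v_reduced, dtype=float)
    if v_full.shape != v_reduced.shape:
        raise ValueError("curves must be sampled at the same lag intervals")
    return float(np.sqrt(np.mean((v_full - v_reduced) ** 2)))


def mean_rmsd(v_full, realizations):
    """Average RMSD over all realizations of one percentage level."""
    return float(np.mean([rmsd(v_full, v) for v in realizations]))


# Hypothetical curves evaluated at four lag intervals.
v_100 = [0.0, 1.0, 2.0, 3.0]
v_p_realizations = [[0.0, 1.0, 2.0, 5.0], [0.0, 1.0, 2.0, 3.0]]
single = rmsd(v_100, v_p_realizations[0])
averaged = mean_rmsd(v_100, v_p_realizations)
```

Repeating `mean_rmsd` for each percentage level yields one averaged value per level, which is the quantity plotted against the number of points.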

The three curves display a similar shape, with the largest discontinuities found in the plot for tile 2304. Points representing the percentages of 10% and 8% yielded higher averaged RMSD values than the point with the lowest number of points (7%). This can be partially explained by the procedure used to create the reduced sets. A standard random sampling technique was used; therefore, it is possible that the selected points were not uniformly distributed throughout the tile (forming clusters) and, as a result, the theoretical variogram curve differs from the reference, yielding large RMSD values. Just a few realizations of clustered points could significantly increase the average value. Nonetheless, despite these two discontinuities it is possible to identify a general trend. The curves start with a gentle slope and, as the number of points becomes smaller, tend to increase rapidly. In other words, results indicate that, at the scale considered, there is an upper threshold of point density beyond which the topographic information provided by the LiDAR point cloud does not increase (or increases very little) despite the increased point sampling density. Additionally, a positive relationship can be observed between this minimum number of points and the tile standard deviation of elevation: as expected, higher sampling densities are needed to topographically describe locations with higher relief.

To further evaluate the effect of point sampling on topographic information, these curves were used to select three threshold values to reduce the remaining tiles in the survey: 7500, 4000, and 3500 laser points per square meter from tiles 2175, 2177, and 2304, respectively. A histogram of the standard deviation of elevation values was used to identify the quartile threshold values. Using these values, the number of laser points in a tile was reduced to the threshold of 7500 laser points per square meter if the standard deviation of elevation was ≥ 0.03617, to 4000 laser points per square meter if the standard deviation of elevation was < 0.03617 and ≥ 0.0106, and to 3500 laser points per square meter if the standard deviation of elevation was < 0.0106. A total of 25 tiles were reduced.
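The threshold rule can be expressed as a small sketch. The class boundaries and target densities are taken from the text; the use of random selection to thin a tile, the assignment of a value exactly equal to 0.0106 to the middle class, and the names `target_density` and `reduce_tile` are assumptions made for illustration.

```python
import numpy as np


def target_density(sigma_elev):
    """Target points per square meter for a tile, from its elevation std."""
    if sigma_elev >= 0.03617:
        return 7500
    if sigma_elev >= 0.0106:
        return 4000
    return 3500


def reduce_tile(points, sigma_elev, seed=0):
    """Randomly thin a one-square-meter tile to its target density.

    Tiles already at or below the target are returned unchanged.
    """
    pts = np.asarray(points)
    limit = target_density(sigma_elev)
    if pts.shape[0] <= limit:
        return pts
    rng = np.random.default_rng(seed)
    keep = rng.choice(pts.shape[0], size=limit, replace=False)
    return pts[keep]


# Hypothetical tiles: one over-sampled, one sparse.
dense_tile = np.zeros((8000, 3))
sparse_tile = np.zeros((100, 3))
reduced_dense = reduce_tile(dense_tile, sigma_elev=0.01)
reduced_sparse = reduce_tile(sparse_tile, sigma_elev=0.13)
```

Only tiles holding more points than their target are altered, which is consistent with the small number of tiles reduced in the survey.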

The two point clouds, original and reduced, were converted to Triangular Irregular Network (TIN) format to facilitate volume computations. TIN format was chosen over the conversion of the point cloud into a raster grid to minimize uncertainties caused by interpolation methods. A third TIN, with an artificially filled channel, was created by manually digitizing the edges of the gully channel to form a polygon and then removing all the laser points within the channel polygon. Using a differencing technique, the original and reduced TINs were subtracted from the artificially filled channel TIN, yielding volume estimates of 18.154 m^{3} and 18.146 m^{3}, respectively. The two estimates differ by 0.008 m^{3}, or approximately 0.04%. Additionally, visual comparison of the thalweg profiles for both datasets confirms the agreement between the original and reduced datasets (

This study used the semivariogram concept to quantitatively investigate the relationship between LiDAR point sampling density and the topographic modeling needed to evaluate ephemeral and classical gullies in agricultural fields. The impact of gullies in agricultural fields can be studied at different scales, such as watershed, field, and individual gully scales. In this study, we addressed the effects of point sampling density on the topographic information at the individual gully scale.

The gully investigated was partitioned into one-square-meter tiles and the sampling density of each tile was computed by counting the number of laser points in each tile. This experiment revealed a large variation in LiDAR point sampling density throughout the gully. Tiles were ranked by standard deviation of elevation values and partitioned into three groups based on quartiles of the histogram of these values (representing three different topographic characteristics). The tile with the highest number of points in each group was selected for the sensitivity analysis. Multiple realizations of subsets of randomly selected points at pre-defined percentages were used to identify the minimum point sampling density at which the dataset retains the original spatial characteristics. Using the minimum number of points per square meter thresholds, a reduced point cloud dataset was developed and compared to the original dataset, yielding no significant discrepancy. This indicates that data could be collected with a smaller sampling density while retaining the original spatial characteristics.

At the fine sampling density required to properly characterize ephemeral gully evolution in agricultural fields, results indicate that well-planned surveys could be designed to collect between 3500 and 7500 points per square meter based on the local topographic variability. Such surveys could significantly expedite data collection without loss of topographic information. It is also important to note that, although results indicated that the reduced dataset did not significantly differ from the original dataset in terms of topographic information, and thus these tiles could be considered over-sampled, the reduced tiles represent only a small percentage of the entire dataset. Out of 2085 tiles containing laser points, only 25 were reduced because they originally had more laser points than the defined thresholds. Of these 25 reduced tiles, only 14 were located in and around the gully channel. Despite the oversampling of these 14 tiles, there are still 175 tiles (out of 189) located in and around the gully channel that contained fewer points per square meter than the threshold values obtained as a result of this study. This is an inherent consequence of the large variation in sampling density.

Although the ideal situation would be to survey gullies with the highest possible sampling density, this is often not practical because sampling density varies with factors such as resolution of the instrument, vertical scan angle, number of overlapping scans, and land coverage. Furthermore, scientific investigation to quantify and to understand the development of ephemeral and classic gullies in agricultural fields over time often requires multitemporal surveys of multiple locations throughout the watershed.

Based on the findings of this study, future field campaigns can be designed to generate consistent datasets with a minimum point sampling density that accounts for the different topographic characteristics (3500, 4000, and 7500 laser points per square meter). During field collection the laser scanner is mounted on a tripod that can be elevated, allowing data to be collected far from the nadir position (large vertical angles). Collecting data at such large vertical angles leads to lower sampling densities and shadowing when investigating gullies with deep channels. One possible alternative would be to survey the same location using multiple overlapping scans, each with a lower point density. Although the instrument would be set to collect data at a lower point density, the combined set of scans would yield a higher point density. Additionally, the overlapping dataset could be used to evaluate the point cloud by identifying pairs of points with a high elevation difference, which could be a cue for removing anomalies from the point cloud.

The use of ground-based LiDAR for ephemeral and classical gully investigations in agricultural fields is relatively new and research in this field is expected to continue to grow as technology becomes less expensive and new applications are developed. The use of such technology can help in collecting detailed micro-topography information that can be used in many different research areas such as ephemeral and classical gully modeling, soil water depressional storage capacity, terrain roughness measurements, and many others. Continuation of this work will investigate the influence of vegetation canopy and standing crop residue on the laser point sampling density and the derived topographic information.

The authors would like to acknowledge Don Seale for his indispensable assistance during data collection. Thanks are also due to Howard Miller and Lisa French, at Cheney Lake Watershed, Inc., for contacting and coordinating with local landowners and providing logistic support.