A Yield Mapping Procedure Based on Robust Fitting Paraboloid Cones on Moving Elliptical Neighborhoods and the Determination of Their Size Using a Robust Variogram

The yield map is generated by fitting the yield surface shape of yield monitor data mainly using paraboloid cones on floating neighborhoods. Each yield map value is determined by the fit of such a cone on an elliptical neighborhood that is wider across the harvest tracks than it is along them. The coefficients of regression for modeling the paraboloid cones and the scale parameter are estimated using robust weighted M-estimators where the weights decrease quadratically from 1 in the middle to zero at the border of the selected neighborhood. The robust way of estimating the model parameters supersedes a procedure for detecting outliers. For a given neighborhood shape, this yield mapping method is implemented by the Fortran program paraboloidmapping.exe, which can be downloaded from the web. The size of the selected neighborhood is considered appropriate if the variance of the yield map values equals the variance of the true yields, which is the difference between the variance of the raw yield data and the error variance of the yield monitor. It is estimated using a robust variogram on data that have not had the trend removed.


Introduction
The yield mapping method I extensively describe here follows Bachmaier [1] in many parts.It does not use filtering techniques to remove outlying yield measurements that are caused mainly by the monitoring process.According to Simbahan et al. [2], such errors include grain flow and other sensor errors (moisture, speed, swath width), errors due to geo-referencing and combine movement, operator errors and data processing errors (Shearer et al. [3], Blackmore and Moore [4]; Arslan and Colvin [5]).Steinmayr [6] gives a concise review of possible sources of error, their cause and impact and the corresponding filtering techniques (e.g.Thylén et al. [7,8], Noack et al. [9]).
The yield map values determined by my method does not depend on the removal of outliers, but from fitting the yield surface of the raw data using paraboloid cones on floating neighborhoods.By choosing these neighborhoods across the tracks wider than along them, the yield map can be adapted better to changes in yield along the tracks than across them.This is an advantage over other methods that often do not smooth sufficiently across the tracks.The parameters for modeling a paraboloid cone are estimated by robust weighted M-estimators (cf.Hampel et al. [10], Section 6.3) so that the influence of outliers is automatically restricted or completely annihilated.Therefore, discarding or downweighting values that deviate too much from a paraboloid yield surface around a neighborhood, as done by Bachmaier and Auernhammer [11], is not necessary.However, measurements that are wrong for technical reasons, such as values where the combine enters the harvest tracks, are assigned zero weight, which corresponds to removing them.The use of weights has the additional advantage that, contrary to all other filtering techniques in the literature mentioned above, there is no need to decide whether a measurement should be removed (weight 0) or not (weight 1).A measurement can be assigned any weight between 0 and 1, so the weights indicate how likely it is that the measurement is correct.In particular, there is no need to define limits for speed, moisture or swath width outside which the corresponding yield measurements should be discarded.The influence of a measurement on the yield map is greater, the greater its weight is.Only a measurement with weight zero has no influence at all, which corresponds to its being canceled.
The example in this paper uses a data set where the yields have been converted to dry matter yields.Time and header status were not measured.Therefore for every measurement the distance to preceding harvest paths was calculated.A distance that is considerably smaller than the cutting width of the combine indicates that the measurement cannot have come from a full swath, so it might be incorrect.The distance to preceding paths also downweights measurements because of end-path delays if the combine driver did not lift the header after harvesting a path.The GPS-points of the following low yield measurements are usually close to other harvest paths around the boundary of the field.I applied a smooth method of weighting to reduce the effect of such dubious measurements.

Modeling Planes and Paraboloid Cones
A paraboloid cone (Figure 1) is modeled for determining a yield map value at 0 0 ( , ) x y on a neighborhood around 0 0 ( , ) x y where only a measurement on 0 0 ( , ) x y can have full weight.The weights of other points within the selected neighborhood decrease smoothly to zero at the border of the selected neighborhood.These weights are called local weights, whereas those for downweighting dubious values are global weights.Fitting paraboloid cones is only meaningful if one assumes that the true unknown yield surface is smooth.From a mathematical point of view, it should be a twice continuously differentiable function of the Gauss-Krüger coordinates.Fields where any rectangles were cut out, as often occurs in agricultural trials, are to be excluded from consideration.
But what is the true yield at a point of the field?Imagine a point as a 1 cm  1 cm square.One could say that if there is a wheat ear at this point, the yield (in Mg ha -1 ) is tremendously large (as the area of 1 cm 2 is so small), whereas the yield is zero if there is no ear.Such a yield map would have a huge microscale variation, and that is not desirable.It is sensible that the true yield denotes the true average yield of a rectangle around this point, the rectangle to be harvested for measuring its yield with the monitoring process.Thus, the true yield surface is continuous.It has no microscale variation because points close to each other refer to almost the same area.
Every continuously differentiable function defined on a two-dimensional area can be approximated by a skewed plane if it is confined to a small environment.A paraboloid cone, however, approximates a twice continuously differentiable function better as it can also take account of hilltops or depressions.This allows a neighborhood that is considerably larger.Paraboloid cones arise from approximating smooth functions by a two-dimensional Taylor series that includes only linear and quadratic terms.Figure 1 shows some examples of what a paraboloid cone can look like.
The two-dimensional quadratic model for fitting paraboloid cones is where z i is the measured yield at the i th site ( , ) i i x y .For reasons of numeric stability, the coordinates x i and y i should be the mean adjusted Gauss-Krüger coordinates where GK x and GK y are the means of the Gauss-Krüger coordinates ,GK i x and ,GK i y over all points used for fitting the cone.The random variables i e in (1) denote the errors, which should be around zero.
In extrapolation or where there are few valid measurements around a point 0 0 ( , ) x y , the yield value should be obtained using a skewed plane rather than a paraboloid cone because the latter can lead to exaggerated values.A skewed plane is modeled by

Estimating the Coefficients of Regression by Redescending M-estimates
The regression coefficients are not estimated by classical least-squares because this method is not robust against outliers, which often occur in raw yield data.Instead, M-estimates are used.

The Concept of an M-estimate
To understand the concept of an M-estimate, consider first the simplest case of estimating a location parameter.
A property of the mean z is that the sum of deviations from it is zero: . This is demonstrated in Figure 2, where the bisecting line was moved back and forth until its deviations from the data points summed up to zero.Its intersection with the abscissa is the mean.Figure 2 shows that the outlier to the right causes the mean to move to the right.
This effect can be avoided by bounding the bisecting line, so that the influence of any outlier is also bounded.This is illustrated in Figure 3, where the bisector is replaced by a bounded function, ψ.
The influence of the outlier is reduced when solving  .The solution of this equation,  , which is the intersection of the ψ-function and the abszissa, is called an M-estimate. Figure 3 shows that if the largest value were shifted further to the right, it would have no influence, whereas the smallest value, which is not an outlier, retains its influence on the estimate.The latter would not apply if one computed a trimmed mean.An M-estimate whose ψ-function redescends to zero is called a redescending M-estimate, which removes the effect of large outliers completely.
Since the dispersion of the data is usually unknown, an M-estimate should be scale-invariant, which is so when a scale estimate is incorporated into the equation to be solved.The M-estimate for the location parameter is then defined by . The scale estimate  can also be determined as an M-estimate for scale if one simultaneously solves an additional equation for  .This will be done in the following, where M-estimates are expanded to a regression model.

Weighted M-estimates in the Regression Model
The objective is to estimate simultaneously the regression coefficients j  and the scale parameter  in (1) and (3) by robust weighted M-estimators.Thus, 1 p  equations have to be solved, where p is the number of regression coefficients ( = 3 p if a skewed plane is mod-eled, = 6 p if a paraboloid cone is modeled).The solutions ˆj  , = 1, , j p  , and  are the M-estimates for the unknown parameters j  and  .Following Huber [12] with slight changes for the scale estimate, the following system of equations is solved: where the three equations of the right column in (4) are omitted for skewed planes ( = 3 p ), and the residuals ˆi e result as: the weights i w will be defined in (23).The number (Satterthwaite [13]) in the last line of (4) refers to them.I call it the effective number of data points.The idea behind this can simply be explained for the mean as a special case of linear regression.The variance of the mean of independent random variables i Z with identical variance 2   is given by 2 / n  , but the variance of a weighted mean, * results in 2 / n   .Thus, n data weighted according to i w lead to the same variance reduction as n  unweighted data.If all weights were equal, = n n  .The effective number n  is greater the more the weights i w equal each other.It is never greater than n .Data points with a low weight, i w , such as those close to the border of the selected neighborhood, barely enlarge n  , unless there are no data points close to the point to be mapped.
The weighted M-estimates used for the regression and  coefficients, j  , are based on the  -function in (8) and the  -function in (9).Illustrated are these functions in Figure 4 and Figure 5.Both  and  consist of lines and parabolae only. is defined as 2 2 ( ) for < 0 for 0 0.9 ( 1.4) 1.15 for 0.9 < < 1.9 ( ) = 2.8 for 1.9 2.3 0.5( 3.3) for 2.3 < < 3.3 0 for 3.3 and  is defined as where is the expected value of 0 ( ) Z  for an (0,1) N distributed random variable Z , thus providing the scale parameter = 1


for the standard normal distribution.

An Iterative Procedure to Obtain Regression Coefficients and Scale Estimate
To obtain the regression coefficients ˆj  , = 1, , j p  , and the scale estimate  , the equation system in (4) is solved in the following three successive steps:

First step: starting values
The starting value for the column vector of regression coefficients is obtained with the method of weighted least-squares: where is a column vector and the ( ) (2) is the design matrix whose i th line ( = 1, , i n  ) consists of the line vector.
which is based on the mean adjusted Gauss-Krüger coordinates x i and y i in (2).The weight matrix W = diag(w) is a diagonal matrix, i.e. ( ) The starting value for the scale solution is with * i w according to (7).n  , which has been defined in (6), should be considerably greater than p .Since s is based on squared deviations, it is subject to outliers, so that the robustly defined  might be overestimated.However, large starting values for scale should be preferred as they counteract the danger of scale implosion (Huber [14]).

Second step: iteration based on mo ψ and χ
The classical starting values obtained in the first step,   11) and (0) ˆ= s  in ( 14), serve as the results of the iteration = 0 m in the second step.To avoid solutions that could abduct us from the bulk of the data, redescending M-estimates are not yet applied in the second step when running the iteration procedure, which will be described soon.Instead, the  -function  in ( 8) is made monotone by replacing its redescending parts by horizontal lines: The  -function remains unchanged, as it is already

Third step: iteration based on  and 
Finally, the same iteration procedure is run again, using the results of the second step as starting values (iteration = 0 m ) and applying the redescending  -function  in (8) instead of the monotone one in (15).The results of the last iteration,  ( 1) m  and ( 1)  ˆm   , are the numerical solutions,   and  , of the equation system in (4).

The iteration procedure
The iteration procedure follows Huber [14] or Huber [12], Section 7.8.However, contrary to Huber, the M-estimates ˆj  , = 1, , j p  , and  in (4) are weighted, and the definition of  slightly differs from that of Huber.Therefore, the pocedure needs to be changed accordingly.
It suffices to describe the (m + 1) th iteration on the basis of the results of the m th iteration ( 0 m  ): Based on the vector of regression coefficients ( )  ˆm j  of the m th iteration, the residuals are computed first: Then the  -estimate of iteration with * i w according to (7).Now, the residuals ( )  ˆm i e are used to compute the so-called Winsorized residuals: ˆ= ˆm , the difference vector is calculated by weighted least-squares and used to compute the new vector of coefficients of regression: According to Huber [12], p. 183, theoretical considerations suggest Empirical results showed that the factor q could be relaxed, but only because the  -functions used are normed such that (mo) ( ) = 1 ' x  on a large area around 0.
The iteration procedure ends if T jj

Global weights
The most important reason for omitting yield monitor data concerns start-path delays (see e.g.Thylén et al. [7], Simbahan et al. [2]).The first m measurements are given the global weight zero, which discards them; the (m + 1) th measured value is downweighted by a factor of 0.5 because of uncertainty as to whether this value is correct.The other values are then weighted fully in the context of delays.The size of m depends on the yield monitor.For the Claas Agrocom Quantimeter, which was used in the trial described here, = 5 m proved adequate.Another reason for downweighting values is if the GPS points of the current swath are close to a preceding neighboring swath.Such a small distance might arise from GPS errors or inaccuracies, but it can also be the consequence of a smaller current swath width.If the minimum distance to the preceding harvest tracks, D , is small, it is less likely that the measured yield comes from a full swath width.At present, a raw yield value is weighted fully (weight 1) only if sw 0.5 m D   , whereby sw means the "full" swath width, which is somewhat smaller than the combine's cutting width, cw, when the combine is steered by a man.If sw 2.5 m D   , the raw yield value gets weight zero.The weights in between are obtained by linear interpolation.This is illustrated in Figure 6.
In Bachmaier [1], the interpolation interval was (0.5 sw, 0.9 sw) instead of (sw 2.5 m, sw 0.5 m)   , whereby I was influenced by researchers who discarded yield measurements with 0.5 sw D  . I no longer defend this because the positioning system's accuracy, which is the main source of variation of D , does not depend on the combine's cutting width, cw, to which the full width of a swath, sw, is closely related.However, if positioning errors can be assumed to be approximately normally dis tributed, the left half of a Gaussian bell curve, , could give a still more suitable downweight function for sw D  .For, value of a harvest track, it is downweighted by a factor of 0.5, as mentioned above, and if its minimal distance from neighboring tracks, D , is sw 1 m  , it results in a weight factor of 0.75 (Figure 6).Finally, the yield at this point is given the global weight = 0.5 0.75 = 0.375 i W  .Other sources of error that result in dubious measurements, e.g.unusual values for moisture or speed, could be dealt with similarly.The greater the likelihood that the measurement is erroneous, the lower its weight is.

Local Weights Based an Elliptical Distance from 0 0 ( , )
x y To estimate the regression coefficients for the paraboloid cone or the plane at the point 0 0 ( , ) x y , all points ( , ) i i x y within an elliptical neighborhood see Figure 7 are considered.The local weight 1 holds only at 0 0 ( , ) x y , whereas the weight is zero at the boundary of the selected neighborhood.The transition in between is obtained by quadratic interpolation, because a local deviation corresponds to a position error, and errors are usually quadratically considered.In particular, the local weight function is defined as where across = r r denotes the radius of the neighborhood across the tracks and elliptic d is a distance measure between ( , ) i i x y and 0 0 ( , ) x y on the basis of an elliptical metric, which will be defined in (24).The weights elliptic ( ) w d are local because they depend on the distance from the fixed point 0 0 ( , ) x y , and they change if this point changes.
The global weights, i W , do not depend on 0 0 ( , ) x y .The final weights, i w , of the yield measurements at the points ( , ) i i x y within the neighborhood for determining the regression coefficients in (4) are the products of the global and local weights, so they also depend on 0 0 ( , ) x y : In Bachmaier [1] I used different local weight functions, w and v , and thus, different weights, i w and i v , in the equation system in (4), where the former served to estimate the regression coefficients j  by means of the  -function, and the latter were used to estimate  by means of the  -function.The weight function w decreased linearly from 1 to 0, whereas v downweighted more slowly (with the fourth power) to increase the effective number of data points with respect to the scale estimate, because higher moments -the scale estimate is a second moment -require more data to be efficient enough.However, since the weights i w , which were used to estimate the regression coefficients, decreased faster to 0, the paraboloid cone was less forced to adapt to yield values close to the neighborhood's border, and hence, larger residuals could arise that distort the scale estimate.Therefore, and for the sake of simplicity, I now use the same local weight function, the quadratically decreasing w in (22) (Figure 7), and thus the same weights i w for estimating both j  , = 1, , j p  , and  .

An Elliptical Neighborhood
Raw yield data often show surface textures along the harvest tracks that remain when the data are amended.In Bachmaier and Auernhammer [15] I showed that ordinary kriging of amended data did not smooth out the harvest tracks sufficiently.Therefore, the neighborhood on which the paraboloid yield surface is fitted should be wider across the harvest tracks than it is along them.This can be achieved by an elliptical neighborhood (Figure 8).
The butterfly neighborhood I used in Bachmaier [1] is more appropriate as a filtering technique (Bachmaier and Auernhammer [11]), because its length along the tracks is a little shorter at the current track than it is at the two adjacent tracks, and the assessment of a yield value (correct or not) should less be affected by values of the same track; they could be erroneous as well, for example if the swath has not its full width.
A single swath of wrong values can also distort a yield estimate if its influence on the paraboloid regression is too large.To avoid this, the neighborhood must be chosen large enough, and the tracks in the mid should have nearly the same sum of weights i w .This is a further reason why the chosen local weight function w defined on an elliptical neighborhood downweights more slowly than linearly.
The 'elliptical metric', elliptic d , is based on the ratio, a , of the 'radius' (half diameter) of the ellipse across the harvest track and the 'radius' along it: , can be calculated from neighboring GPS points on the same harvest tracks as follows:

Advantages and Disadvantages of Modeling Planes and Paraboloid Cones
The main advantage of modeling paraboloid cones over planes relates to yield peaks or depressions (see Figure 1).If the yield is fitted by planes, a depression is overestimated and a peak is underestimated, whereas a paraboloid cone fits optimally provided that the neighborhood is not too large.Nevertheless, modeling paraboloid cones can lead to exaggerated results in the case of extrapolation.This can occur at the beginning of a swath width and along the edges of a field, and extrapolation can be based on points far away from 0 0 ( , ) x y .Therefore, where the extrapolation is over a distance or where there are few measurements near to 0 0 ( , ) x y , the skewed plane should be used.

The Measure ear n f
The choice of model can be based on the ratio of the sum of weights within a narrower neighborhood around 0 0 ( , ) x y to the sum of weights within the entire neighborhood.A small ratio indicates that there are only a few valid measurements close to 0 0 ( , ) x y , so the desired fit is determined mainly by extrapolation.The paraboloid cone should be used only if this ratio exceeds a certain value.
The considerations that lead to the choice of an elliptical neighborhood also depend on a sufficiently large share of valid measurements within the nearer environment.If it is too small, a circular neighborhood should be used.
To ensure that the ratio changes smoothly, a function for a degree of membership in the narrower environment is used, which is 1 at the point 0 0 ( , ) x y .Likewise to the local weight function w in (22), it decreases quadratically to zero at the boundary of the neighborhood because the effect of a paraboloid extrapolation can also increase quadratically with the distance.This leads to the definition of the following ratio: where = (( , ), ( , )) The sums relate to the weights of all points ( , )   i i x y within an initial elliptic neighborhood of 0 0 ( , ) x y .An increase of its size according to the rule in (30) does not involve a new computation of near f .To form an idea of the limits at which to model paraboloid yield surfaces or at which to use an elliptical neighborhood, the ratio of the expected values of the nominator and the denominator of near f is helpful.Under the assumption of continuously and uniformly dispersed data points with global weight = 1 i W , this ratio does not depend on the radii of the ellipse, so it can be calculated for a circle with radius 1:

Determining the Proportion of the Radii of the Elliptical Neighborhood
Since monitor yield data often show surface textures along the harvest tracks, an elliptical neighborhood that is wider across the tracks than it is along them is preferred.The determination of an adequate radius proportion, across along = / a r r , with statistical methods is not afforded in this article.Yield data obtained with the Claas Agrocom Quantimeter suggest that = 2 a could be a good choice.However, if there are few values in the nearer environment of 0 0 ( , ) x y , so that near f in (26) is small, the neighborhood should not be shortened along the tracks, and a circular neighborhood, i.e. a local radius ratio of ( , ) 0 0 = 1 x y a , should be used.Yet to determine this near f , an initial neighborhood is already necessary.It is based on the `full' radius ratio, a , and on an initial radius across,min r , a proper value of which can be found using the method in Section 3. The initial radius along the tracks is then given by Further computations of near f are based on a neighborhood that already respects this local radius ratio, .This can enlarge the neighborhood, but it rarely occurs, unless corners of a field are harvested.The elliptical neighborhood with these radii, across,min r and along,min r , also serves as the initial neighborhood in (30), where the final neighborhood size is determined.Its shape, however, i.e., its radius ratio, ( , ) 0 0 x y a , remains unchanged, until the next point 0 0 ( , ) x y is mapped.

The Decision on the Model
Paraboloid cones are preferred, therefore, the limit at which to model it should be less than the ratio in (27).The transition from planes to paraboloid cones should be smooth to obtain a continuous yield map.In Bachmaier [1], where both the local weight function and the degree of membership in the narrower environment decrease linearly, the ratio of expected values of nominator and denominator of near f resulted in 0.5, and a transition interval of (0.35,0.45) has proved adequate.Considering that this ratio has resulted in 2 / 3 for weight and membership quadratically decreasing, a transition area of (0. The measure near,1 f , which it refers to, is based on the elliptical neighborhood with radii across,min r and along,min across,min ( , ) 0 0 = / x y r r a .Modeling only a horizontal plane (p = 1) for very small near,1 f has proved inadequate because areas with only a few measurements occur mainly at corners of a field, where the yield usually tends to become lower.Instead, the neighborhood size is increased until near f reaches a minimum value of 0.30, so that there are enough data to fit an appropriate skewed plane.

Determining the Neighborhood's Size
The neighborhood should also be enlarged if the effective number of data points, n  , is less than a minimum effective number, min n  .This often applies where the combine enters the harvest track because the corresponding measurements have a global weight of zero.The rule for increasing the neighborhood's radii is as follows: The proportion, ( , ) 0 0 x y a , of across r to along r , as well as the choice of model is no longer affected by this procedure, as these decisions have already been made on the basis of near,0 f and near,1 f .

The Method of Finding an Adequate Neighborhood Size by Variance Comparison
If the variance of the true unknown yields of the field were known, the yield map could be considered optimal if its variance equaled that of the true yields.If it is less than the variance of the true yields, the map is too smooth, if it is greater, the map is too detailed.The variance of the true yields is unknown.It needs to be estimated in a way that does not depend on how the yield map has been generated.The variance of the measured yields comprises the variance of the true yields and the error variance of the yield monitor.Hence, the required variance of the true unknown yields results in Var(true yields) = Var(measured yields) Var(errors). The error variance, Var(errors) , can be estimated by the nugget component of a variogram.The reason is that there is no microscale variation in the yields because, as defined in the introduction, any yield at a field point refers to the average yield on a rectangle to be harvested around it, so all the nugget effect arising in an empirical variogram reduces to the error variance, Var(errors) , of the yield monitor (Cressie [16], p. 59).However, outliers often distort the structure of a variogram, which makes the determination of this value by extrapolation impossible unless outliers are removed or the variances are estimated robustly.Therefore, later I will switch to robust versions of variances and variograms.Nevertheless, for a better understanding it is desirable that the reader be familiar with the usual variogram, so I will introduce it first.

The Variogram
The empirical variance of measured values i z can be given by the following two equivalent formualae; the latter is less well known.
is the number of summands in the latter formula, which expresses the variance as half the average of the squared differences between all pairs of values.Squared differences of yields usually increase if the spatial distance between the data points increases.This is expressed by a variogram, which gives the variance as a function of the separation distance, h , which is also called the lag.Empirical variances, ˆ( ) h  , depicted in a variogram are restricted to pairs of values that are separated by a given lag h , so they can be called variances of the yield data points at given separation distance h (Bachmaier and Backes [17]): This writing is as usual as misleading.One must be aware that the index i in this sum is not a counter of certain logged locations (mean adjusted GPS points), but a counter of pairs of such locations.A single location can occur in many pairs, so that, for example, the third logged location point, which was denoted by 3 3 ( , ) x y in Section 2, could now be indexed by 17 17 ( , ), ( , ), ( , ) x y .The index i counts all pairs of those locations that are separated by a vector h i whose length, h i , is approximately equal to the target length h, i.e.The total variance of the measured yields in (32) is split into variances at given separation distance in (33), which in turn can be used to obtain the total variance as a weighted mean of them: all all ( ) ( ) Var(measured yields) = ( ) where k h correspond to the target distances h in (33).They are the midpoints of the different classes, whereas k h are the averages of all distances within class k , i.e.

the averages of all
 .An analogue of the formula in (34) is needed to compute the total variance robustly.

A Robust Variogram for Estimating the Variance of the True Yields
The variogram estimator ˆ( ) h  in (33) is based on squared differences among data, so it is sensitive to outlying yield data points.A single outlying datum can distort the variogram estimates since it occurs in several paired comparisons over many or all lag intervals (Lark [18]).Moreover, the outlier does not affect the variogram values of all lag intervals equally; it can distort the shape of the variogram, which affects the determination of the nugget component by extrapolation.Such distortion can be diminished by robust estimates of the variogram.
Robust variogram estimates have been introduced by several authors (Cressie and Hawkins [19], Dowd [20], Genton [21]).Lark [18] compared them with regard to robustness and efficiency.Since the formula of variance decomposition in (31) holds exactly only when referring to classical variances, I use a scale M-estimator whose  -function equals the classical one, where about 95% of standard normally distributed data are located.This guarantees a high efficiency for distributions close to normal (Bachmaier [22]).Outside the interval [ 2,2]   the chosen -function begins to deviate, and from | | 4 x  it redescends until it reaches the value zero at | |= 7.5  x to exclude the effect of large outliers completely.It is illustrated in Figure 9 and defined by parabolae and straight lines: The robust variances, ˆ( ) h  , at a target separation distance, h , are then defined by the solution of with  as in Figure 9, where the index i again counts pairs.Note that the solution of (36) would correspond exactly to the classical variogram estimate ˆ( ) h  in Eq.
(33) if one replaced the  -function in Figure 9 by the classical -function, To compute the robust variogram for the yield monitor data according to (36), it was necessary to omit summands related to pairs ( , )

Estimating the Variance of the True Yields
The desired classical variance of the true unknown yields is now, analogous to (31), computed as the following difference of robust variances: which is the nugget component of the robust variogram, is determined by a weighted cubic extrapolation (Bachmaier [23]).The weights decrease linearly with increasing separation distance; in particular: Thus, the variance estimation in (37) depends only on robust variogram values.They are barely affected by outliers, so they underestimate the theoretical classical variogram values a little, as these also include the variance of the yield monitor, which might be large because of the outliers.But since the underestimation of the robust variances affects all variances in the variogram similarly, the difference in (37), which gives the desired variance of the true yields, is barely affected.Therefore, this estimation method worked well, as shown in the Monte Carlo results in Bachmaier [23].

Results for the Field Bandstauden in Zeilitzheim
In 2001 the winter wheat harvest of Bandstauden field in Zeilitzheim (Germany) was investigated.Using the measurements for moisture, the raw yield data measured by the Claas Agrocom Quantimeter with a data logging frequency of 0.2 Hz were converted to dry matter yields.Zero yields were omitted as were those with a missing value for moisture.Values with technical errors were assigned the global weight zero, which meant they were discarded also.
Figure 10 shows the yield monitor data in Mg ha -1 and their global weights, i W , respectively.Weights less than 1 arise from a small deviation from the preceding harvest path or from values at the beginning of a swath.Many neighboring swathes were harvested in the same direction, therefore, regions with invalid values are larger than usual.

Yield Maps Generated Using Different
Neighborhood Size

The Adequate Smoothness of the Yield Map
An adequate neighborhood size is obtained if the variance of the yield map equals that of the true yields.The  with a minimum along r of five times the swath width) and was increased until the minimum effective number of data reached min = 100 n  .This number needs to be adapted to the neighborhood size.It should approximately correspond to that n ~ which is expected if most measurements of the neighborhood are valid.It depends on the data logging frequency, on the swath width, and, of course, on the neighborhood size.

Discussion and Conclusions
The yield mapping method proposed in this article is based mainly on modeling paraboloid cones on moving elliptical neighborhoods.The method of determining the parameters of the model is robust, so that the detection of outliers was not necessary.Besides, the method refers to weights that decrease, like a paraboloid cone opening downwards, to zero at the neighborhood's border.This corresponds to a smooth transition from being fully considered to being not considered at all, so that a larger neighborhood is necessary, as the values close to the border do not have full weight.A robust variogram, computed independently of the yield mapping method, served to estimate the variance of the true unknown yield values.This provided a measure of how smooth the yield map would be if all values had been measured correctly.) and a minimum effective number of min = 100 n  the estimated yield map had the same variance as the true unknown yield map.
In 2004, an experiment was done on a part of the Lamprechtsfeld in Thalhausen (Germany) where measured reference values for yield data from two yield monitors were obtained; Data Vision Flowcontrol and Ag-Leader.The results based on butterfly neighborhoods suggest that, if one wants a yield map whose sum of squared deviations from the true unknown values is minimized, the neighborhood size should be greater than that of Figure 11(b).However, the yield map optimized under this criterion is much smoother than the map of the true reference values, i.e., its variance is too small.In Bachmaier et al. [24], where these results are published, the shape of the butterfly neighborhood was also optimized.
The method proposed in the present paper cannot be used to optimize the neighborhood's shape, but it gets along with yield monitor data only.It could already be seen from Figure 9(a).in Bachmaier [1] that a circular neighborhood did not sufficiently smooth out the yield data across the tracks; these or other lines cannot be recognized in the present Figure 11, which is based on an elliptical neighborhood that is twice as long ( 2 = a ) across the tracks than it is along them.Therefore, a ra-dius ratio of 2 = a might be a good choice.Currently, the proposed method is applied to a single yield monitor only (Claas Agrocom Quantimeter) with a data logging frequency of 0.2 Hz and one crop.It could also be applied to harvests from other crops or from combines equipped with other cutting widths and other yield monitors.This might lead to different results, however, the yield mapping rule proposed here would produce yield maps that are not too smooth, not too fine -textured and not subject to large outliers.The reason is that it requires at least a radius of across = 5 sw r and it does not depend on the swath width, sw, because the radii are expressed in multiples of it.This also guarantees robustness against exclusively erroneous measurements on one or two harvest tracks within the neighborhood.The method requires a minimum effective number of data, min = 100 n  .A yield monitor with a higher frequency, such as the AgLeader monitor, provides more data, so min = 100 n  would be reached within a smaller neighborhood.But these data are usually less accurate, so the neighborhood size should not decrease.What should be adapted to the yield monitor is the number, m , of zero -weights at the beginning of harvest tracks.It should increase with the frequency of the system and be adapted to the intensity of its smoothing algorithm.In the Fortran program paraboloidmapping.exe (Bachmaier [25]), where the yield mapping method is implemented, I currently use m = 5 for Claas Agrocom, for AgLeader because of its high frequency of 1 Hz.

Figure 2 .
Figure 2. Deviations from the mean.

Figure 3 .
Figure 3.  -values of deviations from an M-estimate  .
) if the iteration refers to the second step, and (mo) =   in (8) if it concerns the third step.Using the vector of these Winsorized residuals,

Figure 6 .
Figure 6.A weight factor W as a function of the minimum distance D to the preceding harvest tracks.= 0.3 C , it comes close to the function in Figure 6.The global weight, i W , is the product of the single weight factors.For example, if the measurement is the t ( 1) h m value of a harvest track, it is downweighted by a factor of 0.5, as mentioned above, and if its minimal distance from neighboring tracks, D , is sw 1 m  , it results in a weight factor of 0.75 (Figure6).Finally, the yield at this point is given the global weight = 0.5 0.75 = 0.375

Figure 7 .
Figure 7. Local weight functions w and v for regression coefficients and scale according to (4).

Figure 8 .
Figure 8. Contour lines of the weight function w in (22) on a map point's elliptical neighborhood with across 10sw r = and on the model, the radius across,min r of the ellipse remains unchanged, whereas the radius along the tracks is adapted to it: c is the class width of distances used for computing a single variogram value ( = 5 c m in the variogram in Section 4).( ) N h is the number of all these pairs.
are close to each other on the same harvest track.This ensures near independence of the remaining pairs ( , ) i j z z .Pairs where either i z or j z has a global weight of zero were canceled also.Such a robust variogram is shown in Figure 12.
The robust error variance at separation distance zero,

1 (
m,5 m) , = 2 k the class [5 m,10 m) , and so on.Analogously to (34), the robust variance of all measured values, is estimated as a weighted mean of all values depicted in the robust variogram:

Figure 11
Figure11shows three yield maps for different sizes of

Figure 10 .
Figure 10.Raw data i z in Mg ha -1 and their global weights i W .

Figure 11 .Figure 12 .
Figure 11.Yield maps (in Mg ha -1 ) for different sizes of an elliptical neighborhood with = 2 a .
For an elliptical neighborhood with at least across = 10 sw