Estimation of Location Parameter from Two Biased Samples

We consider the problem of estimating an unknown location parameter from two biased samples. The biases and scale parameters of the samples are also unknown. A class of non-linear estimators based on fuzzy set ideas is suggested and studied. The new estimators are compared to traditional statistical estimators by analyzing their asymptotical bias and by carrying out Monte Carlo simulations.


Introduction
The problem is to estimate an unknown scalar parameter $\theta$ from two different independent samples of sizes $n_1$ and $n_2$,

$$x_{1i} = \theta + b_1 + \sigma_1 \xi_i,\quad i = 1,\dots,n_1; \qquad x_{2i} = \theta + b_2 + \sigma_2 \eta_i,\quad i = 1,\dots,n_2, \qquad (1)$$

where $b_j$ and $\sigma_j$ are the bias and scale parameter of the $j$-th sample respectively, $j = 1, 2$, and $\xi_i$, $\eta_i$ are zero-mean independent random noises. A novelty in our setup is that the biases $b_j$, $j = 1, 2$, are assumed to be unknown, which makes $\theta$ unidentifiable from the classical statistics viewpoint, e.g. [1]. Thus, traditional formulations of best estimation are not applicable in this situation, which nevertheless often arises in applications. An important example is the assimilation problem in physical oceanography and meteorology, where information on a certain parameter comes from both observations and a circulation model [2,3]. Typically such a model is biased for a variety of reasons: uncertainties in forcing and dissipation, boundary conditions, model parameters, etc. The bias in observations is mostly due to inaccurate measurements and to the time/space averaging intrinsically present in any measuring procedure. This type of bias can sometimes be removed using learning samples [4,5]; however, it is often difficult to justify the key assumption that the learning and control samples are taken from the same ensemble.
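The identifiability issue can be illustrated numerically. The following minimal Python sketch (NumPy assumed) draws two samples from model (1); the helper name `simulate_samples`, the normal noise and all numeric values are illustrative assumptions. However large the samples are, the average of the two sample means approaches $\theta + (b_1 + b_2)/2$ rather than $\theta$.

```python
import numpy as np

def simulate_samples(theta, b, sigma, n, rng):
    """Draw two samples from model (1): x_ji = theta + b_j + sigma_j * noise.

    b, sigma and n are pairs (one entry per sample); standard normal noise
    is an illustrative assumption."""
    return [theta + b[j] + sigma[j] * rng.standard_normal(n[j]) for j in range(2)]

rng = np.random.default_rng(0)
x1, x2 = simulate_samples(theta=0.0, b=(0.8, 0.2), sigma=(1.0, 2.0),
                          n=(200_000, 200_000), rng=rng)

# Averaging the sample means converges to theta + (b1 + b2) / 2 = 0.5, not to
# theta = 0: the data alone cannot tell theta apart from the biases.
naive = 0.5 * (x1.mean() + x2.mean())
print(f"naive estimate: {naive:.3f}")
```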
One can simply ignore the biases and apply traditional least squares or maximum likelihood methods, which would result in a biased estimate of $\theta$. As an alternative, we suggest using fuzzy set (possibility theory) ideas [6,7] to construct non-linear estimators of $\theta$ that reduce the bias compared to the aforementioned approach. With biased observations the focus naturally shifts from the variance of an estimator, which can be made arbitrarily small by increasing the sample size, to its asymptotical bias. More exactly, in the traditional representation of the squared standard error (SE),

$$\mathrm{SE}^2 = \mathrm{E}(\hat\theta - \theta)^2 = B^2 + \operatorname{Var}\hat\theta, \qquad B = \mathrm{E}\hat\theta - \theta,$$

our primary point of interest is the first term. Thus, we start by analyzing the asymptotical bias of the suggested estimators and then compare it to that of estimators traditionally used in statistics, such as the weighted mean or the weighted median. Then SE is addressed via Monte Carlo simulations for small samples, when the second term is not negligible. It is worth noting that in the simplest situation with unbiased observations, $b_j = 0$, and known $\sigma_j$'s, the unbiased estimator with the smallest SE (the least squares estimate) is given by the weighted mean, e.g. [8],

$$\hat\theta = \frac{n_1 \bar x_1/\sigma_1^2 + n_2 \bar x_2/\sigma_2^2}{n_1/\sigma_1^2 + n_2/\sigma_2^2}, \qquad (2)$$

where $\bar x_j$ is the sample mean of the $j$-th sample. Moreover, with normal noises it is the maximum likelihood estimator [1].
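A short Monte Carlo sketch of the weighted mean (2) makes the decomposition $\mathrm{SE}^2 = B^2 + \operatorname{Var}\hat\theta$ concrete. The scales are treated as known, as in (2), while the biases are deliberately non-zero; the helper name `weighted_mean` and all numbers are hypothetical. With these values the asymptotical bias of (2) is $(10 \cdot 0.3 + 2.5 \cdot (-0.1))/12.5 = 0.22$, and it would not shrink if both sample sizes were increased.

```python
import numpy as np

def weighted_mean(x1, x2, s1, s2):
    """Weighted mean (2) with known scale parameters s1, s2."""
    w1, w2 = len(x1) / s1**2, len(x2) / s2**2
    return (w1 * x1.mean() + w2 * x2.mean()) / (w1 + w2)

rng = np.random.default_rng(1)
theta, b, s, n = 0.0, (0.3, -0.1), (1.0, 2.0), (10, 10)   # hypothetical values
est = np.array([
    weighted_mean(theta + b[0] + s[0] * rng.standard_normal(n[0]),
                  theta + b[1] + s[1] * rng.standard_normal(n[1]), *s)
    for _ in range(20_000)
])

bias = est.mean() - theta                # Monte Carlo estimate of B (about 0.22 here)
se2 = np.mean((est - theta) ** 2)        # Monte Carlo estimate of SE^2
print(se2, bias**2 + est.var())          # equal up to rounding: SE^2 = B^2 + Var
```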
In the general formulation (1) with biased observations, the choice of an appropriate measure of estimation skill is a challenge, because the bias $B$ depends on the unknown (nuisance) parameters $b_1$, $b_2$, which can never be identified from the available observations. We construct such a measure as follows.
For large samples one can efficiently estimate $\sigma_j^2$ and the bias difference $b_1 - b_2$ from the observations (1) by subtracting the second sample from the first one. Introduce the parameters defined in (3). Assume that we deal with a class of estimators $\hat\theta$ for which the asymptotical bias exists and is thereby a function of all the involved parameters under the given observations. In such a situation, one way to order estimators according to their biases is to accept that $\hat\theta_1$ is better than $\hat\theta_2$ if the inequality (4) holds for a certain class of densities $\pi$ of the unidentifiable parameter in (3), where $B_k$ is the asymptotical bias of $\hat\theta_k$. In simple words, under an arbitrary distribution of that parameter, the averaged absolute value of the first bias is not greater than that of the second one.
Our first, and rather surprising, finding is that the best estimator in the sense of (4) among a wide class, including traditional statistical and fuzzy estimators, is the simple arithmetic mean

$$\hat\theta_0 = \frac{\hat\mu_1 + \hat\mu_2}{2}, \qquad (5)$$

where $\hat\mu_j$ is any consistent estimator of the center of the $j$-th sample (e.g. the sample mean or the sample median). Does this mean that (5) is the best way to deal with biased observations? Of course not, because after all the real matter of concern is SE, which can be substantially smaller for the weighted mean (2) than for (5) when the biases $b_j$ are small enough. The next important result of this study is that the suggested fuzzy estimators are better than weighted estimators of type (2) in the sense of (4). Finally, to decide which estimator should be preferred when dealing with small samples, we carry out Monte Carlo simulations and use SE averaged over the nuisance parameters, $\langle \mathrm{SE} \rangle$ (6), as a measure of estimation skill, where the angle brackets denote averaging over the parameters defined in (3) and over the ratio characterizing the difference in the noise levels of the two sources. The reason for including averaging over the identifiable parameter is that we are interested in small samples, for which it is not possible to efficiently estimate even identifiable parameters. We then investigate the dependence of (6) on the bias level $\Delta b$ and the noise scale $\sigma$ for different estimators and suggest recommendations for a sensible choice among them. In general, fuzzy estimators seem to be preferable for high values of $\Delta b$ and $\sigma$ in most of the scenarios, which are determined by different noise distributions (normal, Cauchy), non-probabilistic noise (logistic chaos), and different estimates of the center (mean or median).
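The step of estimating the bias difference by subtracting the samples can be sketched as follows (NumPy assumed, all values hypothetical). The difference of the sample means estimates $b_1 - b_2$ because $\theta$ cancels, and the sample standard deviations estimate $\sigma_1$, $\sigma_2$; no similar combination isolates $\theta$, since shifting $\theta$ by $c$ and both biases by $-c$ leaves the distribution of the data unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, b1, b2 = 3.0, 0.7, -0.4   # hypothetical values, hidden from the analyst
x1 = theta + b1 + 1.0 * rng.standard_normal(100_000)
x2 = theta + b2 + 0.5 * rng.standard_normal(100_000)

delta_b_hat = x1.mean() - x2.mean()            # estimates b1 - b2 = 1.1 (theta cancels)
sigma_hat = (x1.std(ddof=1), x2.std(ddof=1))   # estimate sigma_1 = 1.0, sigma_2 = 0.5
print(round(delta_b_hat, 2), [round(s, 2) for s in sigma_hat])
```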

Estimators and Their Asymptotical Bias
First, recall that a fuzzy set is a pair $(A, P)$, where $A$ is a set and $P: A \to [0, 1]$ is called the membership function; the value $P(x)$ characterizes the degree of membership of $x$ in $A$. For our purposes it is enough to consider real fuzzy sets, $A \subseteq \mathbb{R}$.
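As an illustration of the definition, a real fuzzy set can be represented by its membership function. The triangular shape below (a hypothetical `triangular_membership` helper, NumPy assumed) is only one common choice and is not meant to reproduce the membership (7) used later.

```python
import numpy as np

def triangular_membership(x, center, half_width):
    """Membership P: R -> [0, 1], equal to 1 at `center`, decreasing linearly
    to 0 at distance `half_width` and equal to 0 beyond it."""
    x = np.asarray(x, dtype=float)
    return np.clip(1.0 - np.abs(x - center) / half_width, 0.0, 1.0)

xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print(triangular_membership(xs, center=0.0, half_width=1.0).tolist())
# [0.0, 0.5, 1.0, 0.5, 0.0]
```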
Regarding the formulation (1), we consider $A$ as the range of the observed random variable $X$. Let us introduce a class of fuzzy estimators similar to estimators based on the triangular membership function (possibility distribution) discussed in [9]. Let $F(x)$ be a cumulative distribution function that is symmetric and increasing, i.e. $F(-x) = 1 - F(x)$ and $F$ is strictly increasing. Introduce the membership functions $P_j(x)$, $j = 1, 2$, generated by each of the two samples as in (7), where $\hat\mu_j$ and $\hat\sigma_j$ are consistent estimators of the center and spread of the $j$-th sample respectively. In other words, we assume that $\hat\mu_j \to \theta + b_j$ and $\hat\sigma_j \to \sigma_j$ with probability 1 as $n_j \to \infty$.
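The consistency assumption can be checked numerically for one concrete choice of estimators. In this minimal sketch $\hat\mu_j$ is the sample median and $\hat\sigma_j$ is the interquartile range divided by 1.349; these estimators, the helper name `center_and_spread` and the numbers are assumptions for illustration, and the constant 1.349 makes the spread consistent for $\sigma$ under normal noise only.

```python
import numpy as np

def center_and_spread(x):
    """Sample median and IQR / 1.349 (consistent for sigma under normal noise only)."""
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return float(q50), float((q75 - q25) / 1.349)

rng = np.random.default_rng(4)
center, sigma = 1.5, 2.0          # hypothetical theta + b_j and sigma_j
for n in (10, 1_000, 100_000):
    x = center + sigma * rng.standard_normal(n)
    print(n, center_and_spread(x))    # approaches (1.5, 2.0) as n grows
```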
Along with the trivial estimator (5), we consider the weighted estimator (9) with $p \ge 1$.

We address the following estimator of $\theta$, henceforth called the fuzzy estimator:

$$\hat\theta_f = \frac{\int_O x\,\bigl(P_1(x) \wedge P_2(x)\bigr)\,dx}{\int_O \bigl(P_1(x) \wedge P_2(x)\bigr)\,dx}, \qquad (8)$$

where $O$ is the set of Pareto optimal solutions and $a \wedge b$ is the minimum of $a$ and $b$. The Pareto set is defined by (10). In other words, a Pareto optimal solution is one in which any improvement of one objective function in the two-objective optimization problem of maximizing $P_1(x)$ and $P_2(x)$ can be achieved only at the expense of the other, e.g. [10]. A key point underlying (8) is that we use a standard fuzzy logic aggregation procedure, but integration is carried out only over the set of $x$ maximizing both memberships of $x$. Such an approach goes far beyond the linear estimators traditionally used in statistics. However, the introduced class of estimators allows an analytical study, at least regarding the asymptotical bias. To analyze the properties of $\hat\theta_f$, let us fix the bias difference; the asymptotical bias of (8) is given in the next statement.

Proposition 1
For any fixed $a$, the asymptotical bias of the fuzzy estimator (8) is given by (11), where the auxiliary quantities are defined in (12) and (13).

Proof. Since $P_1$ and $P_2$ are unimodal with maxima at $\hat\mu_1$ and $\hat\mu_2$, the Pareto set $O$ is the interval between $\hat\mu_1$ and $\hat\mu_2$. Because of the symmetry, $P_1$ and $P_2$ have only one intersection point in this interval; here $\hat\mu_j$ is a sample mean or another consistent estimator of the center of the $j$-th sample.
After the weights in (9) are written out and an appropriate change of variable is made, (8) becomes a ratio of integrals over $u$. Let us change $u$ to $-u$ in the second integrals of the numerator and of the denominator. Then we use the symmetry of $F$, proceed to the limit $n_j \to \infty$, and account for the consistency of the estimates $\hat\mu_j$ and $\hat\sigma_j$. Using dimensionless variables, after some algebra we arrive at (11). The expression on the right-hand side can be rewritten so that the inequality in question becomes equivalent to (14). Let us break down the integral on the left-hand side of (14) into integrals from $-a$ to $0$ and from $0$ to the upper limit, change $v$ to $-v$ in the integral from $-a$ to $0$, and move it to the right-hand side. Next, make the same change of variable on the right-hand side of (14) and move it to the left-hand side. The goal is to have all the integrals over positive intervals. The resulting inequality holds since $F(u)$ is increasing, which completes the proof.
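A numerical sketch of the fuzzy estimator (8) may help, under our reading of (8) as the centroid of $P_1 \wedge P_2$ over the Pareto set. Since the membership (7) is not reproduced here, triangular memberships centred at $\hat\mu_j$ with half-width $k\hat\sigma_j$ are assumed; `k`, the grid size and the function names are hypothetical. For such unimodal memberships the Pareto set is the interval between $\hat\mu_1$ and $\hat\mu_2$.

```python
import numpy as np

def triangular(x, center, half_width):
    """Triangular membership: 1 at `center`, 0 beyond `half_width` from it."""
    return np.clip(1.0 - np.abs(x - center) / half_width, 0.0, 1.0)

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid NumPy-version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def fuzzy_estimate(mu1, s1, mu2, s2, k=3.0, grid=20_001):
    """Centroid of min(P1, P2) over the Pareto set O = [min(mu), max(mu)]."""
    lo, hi = min(mu1, mu2), max(mu1, mu2)
    if hi - lo < 1e-12:                  # coinciding centres: O is a single point
        return lo
    x = np.linspace(lo, hi, grid)
    m = np.minimum(triangular(x, mu1, k * s1), triangular(x, mu2, k * s2))
    area = trapezoid(m, x)
    if area == 0.0:                      # memberships vanish on the whole of O
        return float("nan")
    return trapezoid(x * m, x) / area

print(fuzzy_estimate(0.0, 1.0, 1.0, 2.0))   # ≈ 0.479
print(fuzzy_estimate(0.0, 1.0, 1.0, 1.0))   # ≈ 0.5 by symmetry
```

For $\hat\mu = (0, 1)$, $\hat\sigma = (1, 2)$ and $k = 3$ the exact centroid is $375/783 \approx 0.479$: the estimate is pulled toward the less noisy sample, yet it always stays inside the Pareto interval.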
Further, we restrict ourselves to distributions which decay fast enough as $x \to \infty$. From (15) and this decay condition we obtain (16). Next, we consider the class of all estimators for which the asymptotical bias is expressible in the form (17). Notice that both the weighted mean (9) and the fuzzy estimator (8) are in this class: for the former this is easy to check, and for the latter the statement follows from (12), (13) and (16). Now we intend to order estimators from this class according to their asymptotical bias. Roughly speaking, for fixed $a$ one estimator is better than another if the asymptotical bias of the former, averaged over the unidentifiable parameter, is less than that of the latter. Rigorously, for two estimators with biases given by (17), $k = 1, 2$, the first one is better than the second one if (4) holds true for any positive function $\pi$ of the considered class. Assuming symmetry of $\pi$, the value 0.5 of the unidentifiable parameter is singled out, since it corresponds to the case of balanced biases. A surprising result is readily derived from Proposition 3.

Corollary
The trivial estimator (20) is the best estimator in the class (17). This statement follows from Proposition 3. The problem with (20) is that for small biases it is substantially less efficient than the weighted estimators. Since the denominator in (25) is positive, the following statement gives conditions for the fuzzy estimator to dominate $\hat\theta_1$ for all $x$.

Proposition 5
… if and only if (27) holds. For example, there are possibility distributions for which the required inequality holds for any fixed value of the remaining parameters, which implies (27). Condition (27) is not always fulfilled for the triangle membership function defined by the cumulative probability distribution function and a fixed quantile $q$; this can be verified by direct computation [11].
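A simplified numerical illustration of the Corollary, restricted to estimators with constant weights $w$ and $1 - w$ (a subset of the class (17)), is given below. It assumes a hypothetical parametrization $b_1 = \beta\,\Delta b$, $b_2 = (\beta - 1)\,\Delta b$, so that $b_1 - b_2 = \Delta b$ is fixed while $\beta$ is the unidentifiable part, and draws $\beta$ from a density symmetric about 0.5; neither choice is taken from the derivation above. The asymptotical bias is then $\Delta b\,(\beta - 1 + w)$, and its average absolute value is smallest where $1 - w$ equals the median of $\beta$, i.e. at $w = 0.5$, the trivial estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
delta_b = 1.0
beta = 0.5 + 0.3 * rng.standard_normal(200_000)   # draws from a pi symmetric about 0.5
b1, b2 = beta * delta_b, (beta - 1.0) * delta_b    # hypothetical parametrization

weights = np.linspace(0.0, 1.0, 101)               # constant weight w on sample 1
avg_abs_bias = [np.mean(np.abs(w * b1 + (1.0 - w) * b2)) for w in weights]
best_w = float(weights[int(np.argmin(avg_abs_bias))])
print(best_w)  # close to 0.5: equal weights, i.e. the trivial estimator
```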
Proposition 6
In the intermediate case, … holds for the membership (28) for all $a$.

However, it can be shown [9] that the fuzzy triangle estimator is better than $\hat\theta_2$ in the sense of (19). First define the space of the parameters …; the range for … is found in [9]. Then introduce the difference of the biases of the mentioned estimators; with this in hand, the integral on the right-hand side can be evaluated. Obviously, the area of the parameter space equals 2, and hence the first estimator can be viewed as better than the second one if the area of the region where its bias is smaller exceeds 1. This excess is quantified in the next statement.

Proposition 7
… from which the statement readily follows. The following proposition gives a rough estimate of the improvement of a fuzzy estimator compared to any weighted mean (9). Set …

Corollary
Let … for some fixed $a$. In particular, the area of the region where the bias of the fuzzy estimator (28) is less than that of $\hat\theta_2$ is greater than 1.125, while the area of the region where the fuzzy triangle estimator is better than $\hat\theta_2$ is greater than 1.1733.

Simulations
The goal of the simulations is to compare the efficiency of different estimators under different noise distributions. Now the asymptotical bias is not of primary concern, but rather the relative standard error … and Fuzzy 1. The former is better for small noise, while the latter dominates for high noise. Similar conclusions can be drawn from Figures 5 and 6, where other experiments are presented in which the dependence of SE on $\Delta b$ was studied for fixed $\sigma$. We … $\hat\theta_2$ and $\hat\theta_1$. Let us compare the efficiency of (20) to that of $\hat\theta_2$ in the case of equal small biases …

[Figure: triangular membership function (left and middle panels) and membership (28) (right panel); either $d(x)$ does not change sign or $d(x)$ changes sign once (central panel).]

Finally, $\hat\theta_2$ and $\hat\theta_1$ are much worse in this case. If the center is estimated by the sample median, the conclusions remain basically the same, although the numbers are slightly different (experiments are not shown). In the case of Cauchy-distributed noises (Figure 3), only the median is used for estimating the centers. One can see that even for zero bias, Fuzzy 2 has the smallest SE along with $\hat\theta_1$. For modest biases, all the estimators except $\hat\theta_0$ are of approximately the same accuracy over the whole range of $\sigma$. The primitive $\hat\theta_0$ appears to be substantially worse than any other estimate for high values of $\sigma$. Finally, for high biases Fuzzy 1 is again the best for intermediate values of $\sigma$, while $\hat\theta_0$ and Fuzzy 2 are slightly better for small and large values of $\sigma$, respectively. Compared to the normal noise case, the accuracy of all the estimators is somewhat lower. Figure 4 presents results for logistic noise generated with r = 5.2 for the first sample and r = 3 for the second one. The results are closer to the normal noise case than to the Cauchy case; however, the errors are in general smaller than in the normal case. In summary, in all three experiments the classical weighted estimator $\hat\theta_2$ deteriorates quickly as the level of bias increases, while the fuzzy estimators demonstrate steady skill in all the scenarios and over the whole range of $\sigma$ and $\Delta b$. At this point it is hard to give preference to either Fuzzy 1 or Fuzzy 2.
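The flavour of these experiments can be reproduced with a compact Monte Carlo sketch. It compares only the trivial estimator $\hat\theta_0$ and a weighted estimator of type $\hat\theta_2$ (weights $1/\hat\sigma_j^2$), both with median-based centers, for normal, Cauchy and logistic-map noise; fuzzy estimators could be added to the same loop. Everything specific here is an assumption rather than the setup behind Figures 2 to 4: the bias parametrization $b_1 = \beta\Delta b$, $b_2 = (\beta - 1)\Delta b$ with $\beta$ uniform on [0, 1], the IQR spread estimate, the sample sizes, and the logistic map $x_{k+1} = r x_k (1 - x_k)$ with $r = 4$, standardized with the mean 1/2 and variance 1/8 of its invariant (arcsine) density.

```python
import numpy as np

def logistic_chaos(rng, size, r=4.0, burn_in=100):
    """Standardized logistic-map sequence x_{k+1} = r x_k (1 - x_k); r = 4 assumed."""
    x = rng.uniform(0.1, 0.9)
    out = np.empty(size)
    for k in range(burn_in + size):
        x = r * x * (1.0 - x)
        if not 0.0 < x < 1.0:            # guard against round-off collapse to 0 or 1
            x = rng.uniform(0.1, 0.9)
        if k >= burn_in:
            out[k - burn_in] = x
    return (out - 0.5) / np.sqrt(0.125)  # arcsine density: mean 1/2, variance 1/8

NOISES = {
    "normal": lambda rng, size: rng.standard_normal(size),
    "cauchy": lambda rng, size: rng.standard_cauchy(size),
    "logistic": logistic_chaos,
}

def standard_error(noise, delta_b, sigma=(1.0, 2.0), n=(10, 10), reps=2000, seed=0):
    """Monte Carlo SE of theta_0 and theta_2, averaged over the nuisance beta."""
    rng = np.random.default_rng(seed)
    est0, est2 = [], []
    for _ in range(reps):
        beta = rng.uniform(0.0, 1.0)                     # hypothetical nuisance parameter
        b = (beta * delta_b, (beta - 1.0) * delta_b)     # so that b1 - b2 = delta_b
        samples = [b[j] + sigma[j] * noise(rng, n[j]) for j in range(2)]  # theta = 0
        mu = [np.median(s) for s in samples]
        iqr = [np.subtract(*np.percentile(s, [75, 25])) for s in samples]
        w = [1.0 / max(q, 1e-12) ** 2 for q in iqr]      # weights ~ 1 / sigma_j^2
        est0.append(0.5 * (mu[0] + mu[1]))
        est2.append((w[0] * mu[0] + w[1] * mu[1]) / (w[0] + w[1]))
    # theta = 0, so SE is the root mean square of the estimates themselves
    return (float(np.sqrt(np.mean(np.square(est0)))),
            float(np.sqrt(np.mean(np.square(est2)))))

for name, noise in NOISES.items():
    for delta_b in (0.0, 1.0, 3.0):
        se0, se2 = standard_error(noise, delta_b, reps=500)
        print(f"{name:8s} delta_b={delta_b:.1f}  "
              f"SE(theta_0)={se0:.3f}  SE(theta_2)={se2:.3f}")
```

For normal noise with these settings, the weighted estimator should have a smaller SE than $\hat\theta_0$ at $\Delta b = 0$ and a larger one at $\Delta b = 3$, in line with the observation that $\hat\theta_2$ deteriorates as the bias grows.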

Figure 2. Dependence of the standard error on the noise level σ for different values of the bias scale ∆b; normal noise. The mean is taken as an estimate of the center. 1) ∆b = 0; 2) ∆b = 0.5; 3) ∆b = 1.

Figure 3. Dependence of the standard error on the noise level σ for different values of the bias scale ∆b; Cauchy noise. The median is taken as an estimate of the center. 1) ∆b = 0; 2) ∆b = 0.5; 3) ∆b = 1.

Figure 4. Dependence of the standard error on the noise level σ for different values of the bias scale ∆b; logistic noise. The median is taken as an estimate of the center. 1) ∆b = 0; 2) ∆b = 0.5; 3) ∆b = 1.