Robust Estimation of the Normal-Distribution Parameters by Use of Structural Partitioning: The PEROBLS D Method

Quite a few authors have dealt with estimating the parameters of a normal distribution on the basis of non-homogeneous sets, e.g. Hald A. 1949 [1] and Arango-Castillo L. and Takahara G. 2018 [2]. All the robust methods are based on the assumption that the results affected by gross errors lie to the left and/or to the right of the censoring, or truncation, points. As a rule, however, the (intrinsic) distribution of observations is complex (mixed), consisting of two or more distributions. In that case the existing methods, such as ML, Huber's, etc., yield enlarged estimates of the normal-distribution variance. In search of better estimates, the present author has developed a new method, called PEROBLS D, based on the Tukeyan mixed-distribution model. The method estimates both the contamination rate (percentage) and the parameters of the two distributions forming the mixed one, and it yields better estimates of the parameters of the basic normal distribution than the existing methods.


Introduction and History of Robust Estimation
The history of this problem goes back more than 300 years. For example, as long ago as 1632 Galileo used the least absolute sum in order to reduce the effect of observational errors on the estimate of the measured quantity [3], whereas Rudjer Boscovich was the first who, as early as 1757, clearly rejected outlying observations [4], as did Daniel Bernoulli in 1777 [4]. The trimmed mean has been [...] is the function of the standard normal distribution. Here the elements of the contaminating set appear with probability ε (Tukey assumes ε to be a small number, about 5%) and behave as gross errors. In this way the real distributions are represented through a normal-distribution model with weighted tails.
The term robustness has been in use since 1953 (introduced by G. E. P. Box) to distinguish the class of statistical procedures with little sensitivity to minor deviations from the starting assumptions. Some authors use the term stability, but it is less common than the term robustness. In the Foreword of his book ROBUST STATISTICS, Huber in 1981 [6] emphasizes that among the leading scientists of the late XIX and the early XX centuries there were a few statisticians [...] The initial foundations of the theory of robust estimation were laid by the Swiss mathematician P. J. [...]
[...] while a part ε of the data (0 ≤ ε < 1) may contain gross errors which have an arbitrary (unknown) distribution.

The causes of deviation from the parametric models are various, and four main types of deviation from the strict parametric models can be distinguished: 1) Appearance of gross errors; 2) Rounding and grouping; 3) Using an approximate functional model; and 4) Using an approximate stochastic model.
The fundamentals of robust methods were developed in the last century. Today there are numerous applications of robust methods, and concurrently better and more detailed solutions are sought: [2] [7]-[14].
Owing to the good properties of robust methods, namely that they make it possible to eliminate or decrease the influence of gross errors and outliers on the estimates of distribution parameters, they are used more and more in practice. Therefore, the same [...] Applications in diverse fields include: estimation of the sample-mean variance for Gaussian processes with long-range dependence [2]; Gaussian sum filtering with unknown noise statistics, applied to target tracking [11]; a robust cubature Kalman filter for dynamic state estimation of synchronous machines under unknown measurement noise statistics [9]; estimation of mean and variance in fisheries [7]; optimal allocation of shares in a financial portfolio [8]; and estimation of mean and variance using environmental data sets with below-detection-limit observations [10].
The proposed PEROBLS D method is aimed at eliminating the influences of gross errors and outliers on the estimates of distribution parameters, when only one contaminating distribution is present, i.e. in the case of Tukey's mixed distribution.
The key difference between this paper and existing studies is that the PEROBLS D method, unlike the existing methods, uses no distribution censoring in the estimating procedure. Instead it uses a structural decomposition into two distributions, a basic and a contaminating one, which have the same mean [...] 2) Unbiased (exact) parameter estimates for the contaminating distribution; 3) Percentage estimates for the fractions of the basic and contaminating distributions in the mixed one.
The correctness of the method has been verified on exact (expected) values of some quantities from the mixed Tukeyan distribution, as well as on an example of simulated data for 200 measurements of one quantity.
In addition, the estimates of the mean and variance of the basic distribution have been compared with those obtained by the ML method, and the estimate of the basic-distribution standard has also been compared with the Tukey MAD estimate of the standard.
As has been said, the PEROBLS D estimates are unbiased, whereas the esti-

Definitions and Notations
The density function for a standard normal variable Z is given as

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}.$$

Let $F(z)$ be the notation of the distribution function for Z. The quantiles of Z will be denoted as $z_{1-\alpha}$, where $\alpha$ is the significance level. A normally distributed r. v. X with expectation $\mu$ and variance $\sigma^{2}$ has the density and distribution functions, respectively:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right), \qquad F(x) = \int_{-\infty}^{x} f(t)\, dt.$$

Let us consider a random variable (natural sequence of measurements) [15] $X_1, X_2, \ldots, X_n$ from a normal population $N(\mu, \sigma^{2})$ with mean $\mu$ and standard deviation $\sigma$ (i.e. variance $\sigma^{2}$), where one assumes that the observations $X_1, X_2, \ldots, X_n$ are mutually independent. Arranging them in ascending order of magnitude one obtains the order statistics $X_{(i)}$, $i = 1, 2, \ldots, n$ [...] (Figure 1), where the points A and B are defined in the following way: [...] where d is the width of the rounding interval for observations X. With z from Equation (1), it will be analogously: [...]

Basis of PEROBLS D Method
The idea of the PEROBLS D method was presented in the Least Squares book [16].
Instead of assuming the presence of gross errors in the observations within the regions X and Z, as the previous methods do, this method defines the observation distribution by means of the Tukeyan mixed distribution (Figure 2): [...] In this method the points A and B are partition points only, i.e. they are neither truncation points nor censoring ones. They are chosen so that in the domains X and Z the contaminating distribution prevails, which is one of the prerequisites for finding a good (satisfactory) solution of the problem (task). Note 1. In geodetic measurements distributions close to the Tukeyan ones are frequent. ▲ The designations concerning the basic and the contaminating distributions are given in Table 1.
The task is to estimate the parameters of both distributions, the basic and the contaminating one.
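To illustrate why the contaminating component matters, the following minimal Python sketch simulates a Tukeyan mixed distribution in which both components share the same mean. The function names, the seed, and the numerical values (ε = 5%, contaminating standard equal to three times the basic one) are assumptions chosen for illustration only; they are not taken from the paper's tables.

```python
import random
import statistics


def mixture_variance(sigma_basic, sigma_contam, eps):
    """Variance of (1 - eps) * N(mu, sigma_basic^2) + eps * N(mu, sigma_contam^2).

    Both components share the same mean, so the mixture variance is the
    weighted average of the two component variances.
    """
    return (1.0 - eps) * sigma_basic**2 + eps * sigma_contam**2


def sample_tukey_mixture(n, mu, sigma_basic, sigma_contam, eps, rng):
    """Draw n observations; each one comes from the contaminating
    component with probability eps, otherwise from the basic one."""
    sample = []
    for _ in range(n):
        sigma = sigma_contam if rng.random() < eps else sigma_basic
        sample.append(rng.gauss(mu, sigma))
    return sample


# Hypothetical settings (assumptions, not values from the paper).
rng = random.Random(12345)  # fixed seed, for reproducibility only
x = sample_tukey_mixture(20000, mu=0.0, sigma_basic=1.0,
                         sigma_contam=3.0, eps=0.05, rng=rng)

print("theoretical mixture variance:", mixture_variance(1.0, 3.0, 0.05))
print("sample variance of mixed sample:", statistics.variance(x))
```

Under these assumed values the variance of the mixed sample is noticeably larger than the basic variance of 1, which is the effect an estimator of the basic distribution has to separate out.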

PEROBLS D Solution
The parameter estimators for both distributions will be derived from the maximal probability of the event: [...] Then the likelihood function, up to the proportionality constant k, is: [...] If we also introduce the notations [...] yield the equations: [...] In this way the condition n′ + n″ = n is satisfied, but the conditions: ∑ [...], etc., can also be solved in various ways, and the present author has chosen the following one. At first we find the sums: [...] and then, by means of the asymptotic theory according to which: [...] etc., it follows: [...] one introduces the substitutions: [...] Using these results, for the solution of system (5) we have: [...] be the vector of parameter estimates in the k-th iteration and d the difference vector of these estimates from the (k+1)-th and k-th [...]

Some Robust Estimation Methods
Out of the many robust LS methods we shall use two here: the Maximum Likelihood (ML) method and Huber's MAD estimation of the distribution standard [6] [17].

The Maximum Likelihood (ML) Method
The ML method is based on the assumption that in the domain Y there exists only the basic distribution, unlike the domains X and Z where, in addition to the basic distribution, there exist gross errors and outliers. Here the censoring points are A and B, and they are defined according to Equation (2).
In the region (X ∪ Z), due to the presence of gross errors in the observations, the distribution of the random variable X is not normal. Therefore, the estimates of the parameters µ and σ² are determined on the basis of the probability of the event: [...] where the events [...] Then the likelihood function, up to the proportionality constant, is [18]: [...] and the ML estimators are the solutions of the equations [...] Equations (11) have no analytical solution and they must be solved iteratively.
There are several methods; here direct iterations are given: [...] where: [...] As initial values we can adopt $\mu_0 = \bar{X}$ and $\sigma_0^{2} = m^{2}$.
In the present author's opinion, under the presupposition that a contaminating distribution exists, the method yields increased estimates for σ².
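The iterative character of the censored ML solution can be sketched in Python. The sketch below does not reproduce the direct iterations of this paper; it swaps in the standard EM algorithm for a doubly censored normal sample. It assumes that only the counts of observations below A and above B are used, and all function names, variable names, and numerical values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for pdf and cdf


def censored_normal_ml(y, n_left, n_right, a, b, iters=200):
    """EM iterations for N(mu, sigma^2) when n_left values below a and
    n_right values above b are known only through their counts."""
    n = len(y) + n_left + n_right
    sum_y = sum(y)
    sum_y2 = sum(v * v for v in y)
    # Starting values: mean and variance of the uncensored part.
    mu = sum_y / len(y)
    var = sum_y2 / len(y) - mu**2
    for _ in range(iters):
        sigma = math.sqrt(var)
        za, zb = (a - mu) / sigma, (b - mu) / sigma
        lam = STD.pdf(za) / STD.cdf(za)          # left-tail hazard ratio
        kap = STD.pdf(zb) / (1.0 - STD.cdf(zb))  # right-tail hazard ratio
        # E-step: first and second moments of the truncated tails.
        m1_left = mu - sigma * lam
        m2_left = var * (1.0 - za * lam - lam**2) + m1_left**2
        m1_right = mu + sigma * kap
        m2_right = var * (1.0 + zb * kap - kap**2) + m1_right**2
        # M-step: complete-data ML estimates with expected moments.
        s1 = sum_y + n_left * m1_left + n_right * m1_right
        s2 = sum_y2 + n_left * m2_left + n_right * m2_right
        mu = s1 / n
        var = s2 / n - mu**2
    return mu, math.sqrt(var)


# Hypothetical check on an uncontaminated sample (assumed values).
rng = random.Random(2024)
full = [rng.gauss(10.0, 2.0) for _ in range(5000)]
a, b = 7.0, 13.0
y = [v for v in full if a <= v <= b]
n_left = sum(v < a for v in full)
n_right = sum(v > b for v in full)
mu_hat, sigma_hat = censored_normal_ml(y, n_left, n_right, a, b)
print(mu_hat, sigma_hat)
```

The check uses a purely normal sample, for which the censored model is correct. The sketch makes no claim about the behaviour under a contaminating distribution, which is the case the preceding paragraph is concerned with.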

Huber's MAD Robust Estimation of the Distribution Standard
For the purpose of estimating an unknown standard σ, Huber 1981 [6] and Birch and Myers 1982 [17] proposed a median-based estimator for σ: [...]
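A minimal Python sketch of a median-based estimate of σ is given here in its commonly used MAD form: the median of the absolute deviations from the median, divided by the 0.75 quantile of the standard normal distribution (about 0.6745). Whether this is exactly the form used in [6] [17] is an assumption; the function name and the data are hypothetical.

```python
import statistics

# 1 / (0.75 quantile of the standard normal distribution), about 1.4826.
MAD_SCALE = 1.0 / statistics.NormalDist().inv_cdf(0.75)


def mad_sigma(values):
    """Estimate sigma as the scaled median absolute deviation from the median."""
    center = statistics.median(values)
    abs_dev = [abs(v - center) for v in values]
    return MAD_SCALE * statistics.median(abs_dev)


# Hypothetical measurements; the last value plays the role of a gross error.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 14.0]
print("MAD-based estimate of sigma:", mad_sigma(data))
print("ordinary sample standard:   ", statistics.stdev(data))
```

For this hypothetical data set the single gross error inflates the ordinary sample standard, while the median-based estimate is barely affected.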

Results and Discussion
Example 1. To verify the correctness of the PEROBLS D method and to examine the appropriateness of the Maximum Likelihood (ML) method, Table 2 presents the exact (expected) values of some quantities from the mixed Tukeyan distribution (3). According to the data in Table 2, the estimates of the corresponding quantiles are calculated for the two methods, PEROBLS D and ML, and presented in Table 3.

Conclusions
On the basis of the results obtained in Examples 1 and 2 we can conclude the following: 1) The exact (expected) values from Example 1 confirm the validity of the PEROBLS D method in the parameter estimation (expectation and variance) for both distributions in the Tukeyan mixed distribution of observations. Here the variance estimates for both distributions, the basic and the contaminating one, are correct, i.e. their values are exact.
2) On the basis of simulated realistic measurements from Example 2 good (satisfactory) parameter estimates for both distributions are also confirmed.