Robust Estimation of the Normal-Distribution Parameters by Use of Structural Partitioning-PEROBLS D Method

Abstract

Many authors have dealt with the estimation of the parameters of the normal distribution on the basis of non-homogeneous sets: Hald A. 1949 [1], Arango-Castillo L. and Takahara G. 2018 [2]. All the robust methods are based on the assumption that the results affected by gross errors can be found to the left and/or to the right of censoring, or truncation, points. However, as a rule, the (intrinsic) distribution of observations is complex (mixed), consisting of two or more distributions. In that case the existing methods, such as ML, Huber’s, etc., yield inflated estimates of the normal-distribution variance. In search of better estimates the present author has developed a new method, called PEROBLS D, based on the Tukeyan mixed-distribution model, in which both the contamination rate (percentage) and the parameters of the two distributions forming the mixture are estimated, and for the parameters of the basic normal distribution better estimates are obtained than by the existing methods.

Share and Cite:

Perović, G. (2019) Robust Estimation of the Normal-Distribution Parameters by Use of Structural Partitioning-PEROBLS D Method. American Journal of Computational Mathematics, 9, 302-316. doi: 10.4236/ajcm.2019.94022.

1. Introduction and History of Robust Estimation

The history of this problem is more than 300 years old. For example, Galileo, as long ago as 1632, used the smallest sum of absolute deviations in order to reduce the effect of observational errors on the estimate of the measured quantity [3], whereas Rudjer Boscovich was the first who, as early as 1757, rejected clearly outlying observations [4], as did Daniel Bernoulli in 1777 [4]. The trimmed mean has also been in use for a long time, see “Anonymous” 1821 [4]. The first formal rules for rejecting observations were proposed by Peirce 1852 and Chauvenet 1863, and somewhat later came the papers by Stone 1868, Wright 1884, Irwin 1925, Student 1927, Thompson 1935, and many others [4].

Mixed-distribution models have also been considered for a long time: Glaisher 1872/1873, Stone 1873, Edgeworth 1883, Newcomb 1886, Jeffreys 1932/1939 [4]. Tukey in 1960 [5] defined a mixed model as a mixture of two normal distributions, a basic Φ[(x − θ)/σ] and a contaminating Φ[(x − θ)/(3σ)] one:

F(x) = (1 − ε) Φ[(x − θ)/σ] + ε Φ[(x − θ)/(3σ)],

where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp(−t²/2) dt is the distribution function of the standard normal distribution. Here the elements of the contaminating set appear with probability ε (Tukey assumes ε to be a small number, about 5%) and behave as gross errors. In this way real distributions are represented through a normal-distribution model with heavier tails.
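The following short Python sketch (not part of the paper; the parameter values are illustrative) draws a sample from Tukey's mixed model and shows how even a small contamination rate inflates the ordinary sample variance relative to the basic-distribution variance σ².

import numpy as np

rng = np.random.default_rng(0)

def tukey_mixture_sample(n, theta=0.0, sigma=1.0, eps=0.05):
    # Each observation comes from the contaminating N(theta, (3*sigma)^2) with
    # probability eps and from the basic N(theta, sigma^2) otherwise.
    contaminated = rng.random(n) < eps
    scale = np.where(contaminated, 3.0 * sigma, sigma)
    return theta + scale * rng.standard_normal(n)

x = tukey_mixture_sample(100_000, theta=0.0, sigma=1.0, eps=0.05)
# Mixture variance (1 - eps)*sigma^2 + eps*(3*sigma)^2 = 1.4, well above sigma^2 = 1.
print(np.var(x))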

The term robustness has been in use since 1953 (introduced by G. E. P. Box) to denote the class of statistical procedures with little sensitivity to minor deviations from the starting assumptions. Some authors use the term stability, but it is less common than robustness. In the foreword of his book Robust Statistics, Huber in 1981 [6] emphasizes that among the leading scientists of the late XIX and early XX centuries there were a few statistician-practitioners (he mentions the astronomer Newcomb, the astrophysicist Eddington and the geophysicist Jeffreys) who expressed in their studies a perfectly clear understanding of the robustness idea. They were aware of the perils caused by long tails of the error-distribution functions, so they proposed models of the distribution of gross errors and derived robust variants of standard estimates. Russian geodesists, for example, in their adjustments of first-order triangulation networks assigned lower weights (about half of the original ones) to the observations of directions which deviate too much.

The initial foundations of the theory of robust estimation were laid by the Swiss mathematician P. J. Huber 1964 [1] and the American statistician J. W. Tukey 1960 [5]. Huber’s article “Robust Estimation of a Location Parameter” was the first cornerstone of the theory of robust estimation; it introduced a flexible class of estimates, called M-estimates, whose properties (for instance consistency and asymptotic normality) were established and which have become a very useful instrument. He introduced the gross-error model, replacing the strict parametric model F(x − θ), with its known distribution F, by the mixed model:

H(x − θ) = (1 − ε) F(x − θ) + ε G(x − θ),

where a fraction ε (0 ≤ ε < 1) of the data may contain gross errors which follow an arbitrary (unknown) distribution G(x − θ).
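As a brief illustration of an M-estimate of location (a generic textbook sketch, not taken from Huber's paper; the tuning constant k = 1.345 and the MAD-based scale are common conventional choices), the following Python code computes Huber's location estimate by iteratively reweighted averaging:

import numpy as np

def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                                  # robust starting value
    s = 1.483 * np.median(np.abs(x - mu))              # fixed robust scale (normalized MAD)
    for _ in range(max_iter):
        r = np.abs(x - mu) / s                         # standardized absolute residuals
        w = np.minimum(1.0, k / np.maximum(r, 1e-12))  # Huber weights: 1 in the centre, k/|r| in the tails
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

print(huber_location([9.8, 10.1, 10.0, 9.9, 10.2, 14.7]))  # the outlier 14.7 is down-weighted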

The causes of deviation from parametric models are various; four main types of deviation from the strict parametric model can be distinguished:

1) Appearance of gross errors;

2) Rounding and grouping;

3) Using an approximate functional model; and

4) Using an approximate stochastic model.

The fundamentals of robust methods were developed in the last century. Today there are numerous applications of robust methods, and concurrently better and more detailed solutions are sought: [2] [7] - [14].

Due to the good properties of robust methods, namely that the influence of gross errors and outliers on the estimates of distribution parameters can be eliminated or decreased, they are used more and more in practice, and at the same time their development continues. The current development and application of robust methods may thus be classified into the following groups:

➢ Improvement of existing methods, such as “A New Perspective on Robust M-Estimation” [13];

➢ Solving of delicate (specific) tasks: robust hybrid state estimation with unknown measurement noise statistics [14]; optimal allocation of shares in a financial portfolio [8]; robust estimation of 3D human poses from a single image [12]; a robust cubature Kalman filter for dynamic state estimation of synchronous machines under unknown measurement noise statistics [9];

➢ Applications in various conditions: robust estimation of the sample mean variance for Gaussian processes with long-range dependence [2]; robust estimation of 3D human poses from a single image [12]; robust hybrid state estimation with unknown measurement noise [14]; estimation of mean and variance using environmental data sets with below-detection-limit observations [10];

➢ Applications in diverse fields: estimation of the sample mean variance for Gaussian processes with long-range dependence [2]; Gaussian sum filtering with unknown noise statistics applied to target tracking [11]; a robust cubature Kalman filter for dynamic state estimation of synchronous machines under unknown measurement noise statistics [9]; estimation of mean and variance in fisheries [7]; optimal allocation of shares in a financial portfolio [8]; estimation of mean and variance using environmental data sets with below-detection-limit observations [10].

The proposed PEROBLS D method is aimed at eliminating the influences of gross errors and outliers on the estimates of distribution parameters, when only one contaminating distribution is present, i.e. in the case of Tukey’s mixed distribution.

The key difference between this paper and existing studies is that the PEROBLS D method uses no distribution censoring in the estimating procedure, unlike the existing methods; instead, a structural decomposition into two distributions, the basic and the contaminating one, which have the same mean value, is used, and the parameters of both distributions are then estimated.

Consequently, in the case that both distributions (basic and contaminating) are normal, the PEROBLS D method has the following properties:

1) Unbiased (exact) parameter estimates for the basic distribution, as the most important property;

2) Unbiased (exact) parameter estimates for the contaminating distribution;

3) Percentage estimates for fractions of basic and contaminating distributions in the mixed one.

The correctness of the method has been verified on exact (expected) values of some quantities from the mixed Tukeyan distribution, as well as on an example of simulated data for 200 measurements of one quantity.

Besides, the estimates of the mean and variance of the basic distribution have been compared with those obtained by the ML method, and the estimate of the basic-distribution standard has also been compared with Huber’s mad standard estimate.

As has been said, the PEROBLS D estimates are unbiased, whereas the estimates of the basic-distribution standard in both other cases, by ML and by Huber’s mad, are inflated.

The structure of the further presentation is as follows. First, definitions and notations are given, then the basis of the PEROBLS D method and the way of solving the formulated problem. Afterwards the existing robust estimation methods, ML and Huber’s mad, are presented. Further on, the PEROBLS D method is verified on examples and its solutions are compared with the existing ones. Finally, conclusions and references are given.

2. Definitions and Notations

The density function for a standard normal variable Z is given as

f(z) = (1/√(2π)) exp(−z²/2), −∞ < z < ∞.

Let F(z) denote the distribution function of Z. The quantiles of Z will be denoted by z_{1−α}, where α is the significance level, z_α = −z_{1−α}, and

P(Z ≤ z_{1−α}) = F(z_{1−α}) = 1 − α.

A normally distributed r. v. X with expectation μ and variance σ² has the density and distribution functions, respectively:

f(x) = f(z)/σ and F(x) = ∫_{−∞}^{x} f(x) dx = ∫_{−∞}^{z} f(z) dz = F(z), with

z = (x − μ)/σ. (1)
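For illustration, the standardization in Equation (1) can be checked numerically (a minimal Python sketch using SciPy; the particular values of μ, σ and x are arbitrary):

from scipy.stats import norm

mu, sigma, x = 5.0, 2.0, 6.5
z = (x - mu) / sigma
print(norm.pdf(z) / sigma, norm.pdf(x, loc=mu, scale=sigma))  # f(x) = f(z)/sigma
print(norm.cdf(z), norm.cdf(x, loc=mu, scale=sigma))          # F(x) = F(z)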

Let us consider a random variable (natural sequence of measurements) [15]: X₁, X₂, …, Xₙ, from a normal population N(μ, σ²) with mean μ and standard deviation σ (i.e. variance σ²), where one assumes that the observations X₁, X₂, …, Xₙ are mutually independent. Arranging them in ascending order of magnitude one obtains the order statistics X_(i), i = 1, 2, …, n (Figure 1), with the points A and B defined in the following way:

A = X_(n_X+1) − 0.5d and B = X_(n−n_Z) + 0.5d, (2)

where d is the width of the rounding interval for the observations X.

With z from Equation (1), one analogously has:

Z_(1) ≤ Z_(2) ≤ … ≤ Z_(n_X) ≤ z_A ≤ Z_(n_X+1) ≤ … ≤ Z_(n−n_Z) ≤ z_B ≤ Z_(n−n_Z+1) ≤ … ≤ Z_(n),

Z_(n_X) < (z_A = (A − μ)/σ) ≤ Z_(n_X+1); Z_(n−n_Z) ≤ (z_B = (B − μ)/σ) < Z_(n−n_Z+1),

where z_(i) = (x_(i) − μ)/σ.
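A minimal Python sketch of this notation (the function name and the example data are hypothetical; it assumes the reading of Equation (2) given above, with n_X and n_Z denoting the numbers of observations to the left of A and to the right of B):

import numpy as np

def partition_points(x, n_x, n_z, d):
    xs = np.sort(np.asarray(x, dtype=float))  # order statistics X_(1) <= ... <= X_(n)
    n = xs.size
    A = xs[n_x] - 0.5 * d          # X_(n_X + 1) - d/2 (0-based index n_x)
    B = xs[n - n_z - 1] + 0.5 * d  # X_(n - n_Z) + d/2
    return A, B

x = [4.9, 5.0, 5.1, 5.2, 5.0, 4.8, 5.3, 9.9]  # hypothetical observations rounded to d = 0.1
print(partition_points(x, n_x=1, n_z=1, d=0.1))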

3. Basis of PEROBLS D Method1

The idea of PEROBLS D method has been presented in the Least Squares book [16].

Instead of assuming the presence of gross errors in the observations within the X and Z regions, as is done in the previous methods, in this method the distribution of the observations is defined by means of the Tukeyan mixed distribution (Figure 2):

F(x) = (1 − ε) F₁(x) + ε F₂(x), (3)

where F₁(x) is the basic and F₂(x) the contaminating distribution, whereas 0 < ε < 0.5; ε cannot exceed 0.5 because in that case F₂(x) would have to be taken as the basic distribution. (In geodetic applications mostly 0 < ε < 0.3.)

In this method the points A and B are partition points only, i.e. they are neither truncation nor censoring points. They are chosen so that in the domains X and Z the contaminating distribution prevails, which is one of the prerequisites for finding a good (satisfactory) solution of the problem.

Note 1. In geodetic measurements distributions close to the Tukeyan ones are frequent. ▲

The designations concerning the basic and the contaminating distributions are given in Table 1.

The task is to estimate the parameters of both distributions, of basic and contaminating ones.

4. PEROBLS D Solution

The parameter estimators for both distributions will be derived from the maximal probability of the event:

{(D_(1) ∩ … ∩ D_(n_X)) ∩ (z ≤ z_A) ∩ (z_A ≤ z ≤ z_B) ∩ (D_(n_X+1) ∩ … ∩ D_(n−n_Z)) ∩ (z > z_B) ∩ (D_(n−n_Z+1) ∩ … ∩ D_(n))},

where D_(1) = (X_(1) ≤ X ≤ X_(1) + dx), D_(2) = (X_(2) ≤ X ≤ X_(2) + dx), etc.

Figure 1. Partition points, A and B.

Figure 2. Tukeyan mixed distribution of normally distributed observations.

Table 1. Designations and terms of the quantities appearing in the basic and the contaminating distributions.

Let

z = (x − μ)/σ₁, z_A = (A − μ)/σ₁, z_B = (B − μ)/σ₁,
z′ = (x − μ)/σ₂, z′_A = (A − μ)/σ₂, z′_B = (B − μ)/σ₂. (4)

Then the likelihood function, up to the proportionality constant k, is:

where X, Y and Z under product sign mean:, etc., and where:

, ,.

If we also introduce the notations

then the conditions, and yield the equations:

(5)

solvable only iteratively. There are many methods; here direct iterations are given.

However, within system (5), besides the sought parameter estimates themselves, certain sums and counts appearing in the equations are also unknown and should be determined beforehand.

For the purpose of determining and there are many ways. The present author has examined a few methods out of which he has adopted the least-square one. With three relationships:

(6)

where and are the corrections to the “observations” and n, , first using LS with assuming the “observation” weights: and, we find the LS estimates for and:

, , with

and then

and with.

In this way the condition is satisfied, but the conditions: , and are not satisfied. However, since all conditions in Equations (6) cannot be satisfied simultaneously, a compromise yielding a solution close to the optimum must be accepted.

The sums, , etc., can be also solved in various ways, but the present author has chosen the following one. At first we find the sums:

,

,

,

,

and then, by means of the asymptotic theory according to which:

,

,

etc., it follows:

,

,

one introduces the substitutions:

where:

Using these results, for the solution of system (5) we have:

(7)

(8)

(9)

where:, ,.

Let be the vector of parameter estimates in the k-th iteration and d the difference vector of these estimates between the (k+1)-th and the k-th iterations; then the iterations should be stopped if

The points of optimal partition, and, with, are found from the condition, where and. So one obtains:

(10)

The advantages of the method are:

1) Unbiased estimators for, and, if assumptions (4) hold and A and B are close to and; and

2) Minimal variances for,.

The disadvantages of the method are:

1) A high sensitivity to the choice of the points A and B, (points A and B must be close to and), which can result in negative estimates for either of the variances or, or for both; and

2) Sensitivity to the choice of the initial values for the variances and, which, also, can result in negative estimates for one or both variances.

Therefore, the method is recommendable for applications comprising a high number of observations, (for example).

Note 2. If only the basic distribution is present (when ε = 0 in Equation (3)), the method will yield either or negative values for one or both variances.
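For orientation, the structural-decomposition idea can also be illustrated with a standard EM-type (ECM) algorithm for a two-component normal mixture with a common mean, (1 − ε)N(μ, σ₁²) + εN(μ, σ₂²). The following Python sketch is a generic alternative, not the PEROBLS D iterations of Equations (5)-(10); the function name, starting values and simulated data are illustrative assumptions.

import numpy as np
from scipy.stats import norm

def em_common_mean_mixture(x, eps=0.1, n_iter=500, tol=1e-10):
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                      # robust starting values
    s1 = 1.483 * np.median(np.abs(x - mu))
    s2 = 3.0 * s1
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is contaminating.
        f1 = (1.0 - eps) * norm.pdf(x, mu, s1)
        f2 = eps * norm.pdf(x, mu, s2)
        g = f2 / (f1 + f2)
        # Conditional M-step: update eps, the common mean and both standards.
        eps_new = g.mean()
        w = (1.0 - g) / s1**2 + g / s2**2  # precision-type weights for the common mean
        mu_new = np.sum(w * x) / np.sum(w)
        s1_new = np.sqrt(np.sum((1.0 - g) * (x - mu_new) ** 2) / np.sum(1.0 - g))
        s2_new = np.sqrt(np.sum(g * (x - mu_new) ** 2) / np.sum(g))
        converged = max(abs(mu_new - mu), abs(s1_new - s1),
                        abs(s2_new - s2), abs(eps_new - eps)) < tol
        mu, s1, s2, eps = mu_new, s1_new, s2_new, eps_new
        if converged:
            break
    return mu, s1, s2, eps

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 1800), rng.normal(0.0, 3.0, 200)])  # 10% contamination
print(em_common_mean_mixture(x))  # roughly (0, 1, 3, 0.1)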

5. Some Robust Estimation Methods

Out of many robust LS methods we shall use here two of them: the method of Maximum Likelihood (ML) and Huber's mad estimation of distribution standard [6] [17].

5.1. The Maximum Likelihood (ML) Method

The ML method is based on the assumption that in the domain Y only the basic distribution exists, unlike the domains X and Z where, in addition to the basic distribution, gross errors and outliers exist. Here A and B are censoring points and they are defined according to Equation (2).

In the regions X and Z (i.e. outside the interval from A to B), due to the presence of gross errors in the observations, the distribution of the random variable X is not normal. Therefore, the estimates of the parameters μ and σ are determined on the basis of the probability of the event:

,

where the events, , mean that the random variable X is within an interval (with differentially small).

Then the likelihood function, up to the proportionality constant, is [18] :

,

and the ML estimators are the solutions of the equations

,., and. (11)

Equations (11) have no analytical solution and they must be solved iteratively. There are several methods; here direct iterations are given:

, (12)

, (13)

where:

As initial values we can adopt, and.

In the present author’s opinion, under the presupposition that a contaminating distribution exists, the method yields inflated estimates of the basic-distribution variance.
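A generic numerical alternative to the direct iterations (12)-(13), given here only as an illustration under the usual censored-likelihood formulation (cf. Schneider [18]) and not as the author's scheme, maximizes the log-likelihood of a doubly Type-I censored normal sample directly:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def ml_censored_normal(y, n_x, n_z, A, B):
    # y: the fully observed values in the region A <= X <= B;
    # n_x, n_z: the numbers of observations censored below A and above B.
    y = np.asarray(y, dtype=float)

    def neg_log_lik(p):
        mu, log_sigma = p
        sigma = np.exp(log_sigma)                  # optimize log(sigma) to keep sigma > 0
        ll = np.sum(norm.logpdf(y, mu, sigma))     # contribution of the uncensored values
        ll += n_x * norm.logcdf((A - mu) / sigma)  # n_X values known only to lie below A
        ll += n_z * norm.logsf((B - mu) / sigma)   # n_Z values known only to lie above B
        return -ll

    p0 = np.array([np.mean(y), np.log(np.std(y) + 1e-12)])
    res = minimize(neg_log_lik, p0, method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])              # (mu_hat, sigma_hat)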

5.2. Huber’s Mad Robust Estimation of Distribution Standard

For the purpose of estimating an unknown standard σ, Huber 1981 [6] and Birch and Myers 1982 [17] proposed the median (mad) estimator:

σ* = 1.483 med_i |X_i − med_j X_j|, i, j = 1, 2, …, n. (14)
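In Python the estimator (14), as reconstructed above, reads (a minimal sketch; the sample values are arbitrary):

import numpy as np

def mad_scale(x):
    x = np.asarray(x, dtype=float)
    return 1.483 * np.median(np.abs(x - np.median(x)))  # s* = 1.483 * med|X_i - med X_j|

print(mad_scale([9.8, 10.1, 10.0, 9.9, 10.2, 14.7]))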

6. Results and Discussion

Example 1. For the sake of verifying the correctness of PEROBLS D method and examining the appropriateness of Maximum Likelihood (ML) method in Table 2 are presented the exact (expected) values of some quantities from the mixed Tukeyan distribution (3), with, , , , and symmetrical partitioning (). The numbers, and and the other quantities in the table are presented for the case .

According to the data in Table 2, for two methods, PEROBLS D and ML, the estimates for the corresponding quantities are calculated and presented in Table 3. The results of estimating the variance of the basic distribution by using ML methods indicate their appropriateness.

Example 2. Simulated Data. Using normally distributed N(0, 1) random numbers from Tables of Bol’shev and Smirnov 1968 [19] the mixed Tukeyan distribution (3) is found (Table 4), with normal distributions: basic with and contaminating with, (); with estimates:

basic:, ,

contaminating:,.

According to Equation (10), for and, the optimal partition points are:

for which one obtains 6.5 percent partition with and.

Table 2. The exact (expected) values of some quantities from the mixed Tukeyan distribution (3), with, , , , and symmetrical partitioning ().

Table 3. The results of Estimating the Normal-Distribution Parameters by using PEROBLS D, and ML methods according to the exact (expected) values of the corresponding sums given in Table 2.

Table 4. The 200 () simulated observations of the Tukeyan mixed distribution (3), with, , and; (,).

In Table 5, for various choices of the partition points A and B, a survey of the parameter estimates for the basic and contaminating distributions obtained by different methods is given. The parameter estimates in the PEROBLS D method are close to the exact ones, whereas in the case of application of the ML and Huber-mad methods the variance of the basic distribution is overestimated.

The best way is to choose the partition points A and B for the PEROBLS D method from the frequency histogram (see Figure 3), by accepting the x values at which the histogram, to the left and right of the distribution centre, begins to rise above the smoothing curve for the normal distribution.

Recommendation. For the purpose of drawing the smoothing curve, the estimate of the standard can be calculated according to the standard formula, with 2% - 5% of the most outlying observations rejected and not taken into account. ▲
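A minimal Python sketch of this recommendation (the function name, bin count and trimming fraction are illustrative assumptions): draw the frequency histogram, compute the standard by the usual formula after discarding the most outlying 2% - 5% of observations, and overlay the corresponding normal smoothing curve from which A and B can be read off.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def histogram_with_smoothing_curve(x, trim=0.03, bins=30):
    x = np.asarray(x, dtype=float)
    lo, hi = np.quantile(x, [trim / 2.0, 1.0 - trim / 2.0])
    core = x[(x >= lo) & (x <= hi)]        # sample with the most outlying ~3% discarded
    mu, s = core.mean(), core.std(ddof=1)  # standard formula on the trimmed sample
    plt.hist(x, bins=bins, density=True, alpha=0.5)
    grid = np.linspace(x.min(), x.max(), 400)
    plt.plot(grid, norm.pdf(grid, mu, s))  # smoothing curve for the normal distribution
    plt.xlabel("x")
    plt.ylabel("relative frequency")
    plt.show()
    return mu, s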

Table 5. The parameter estimates of a normal distribution based on 200 simulated observations of the Tukeyan mixed distribution with, , , and,.

Figure 3. Frequency Histogram for 200 simulated observations of the Tukeyan mixed distribution (3) with, , and.

7. Conclusions

On the basis of the obtained results in Examples 1 and 2 we can conclude the following:

1) On the basis of the exact (expected) values from Example 1, the validity of the PEROBLS D method in estimating the parameters (expectation and variance) of both distributions in the Tukeyan mixed distribution of observations is confirmed. Here the variance estimates for both distributions, the basic and the contaminating one, are correct, i.e. their values are exact.

2) On the basis of simulated realistic measurements from Example 2 good (satisfactory) parameter estimates for both distributions are also confirmed.

Acknowledgements

Dr. Zorica Cvetković from the Astronomical Observatory in Belgrade is the author of the FORTRAN programmes with which the calculations in the examples were performed. The ALHIDADA D.O.O. Company from Petrijevci (Croatia) has financially supported the publishing of the article. The figures were made by Dr. Darko Andjic, graduate engineer of geodesy, of the Republic Geodetic Authority, Podgorica, Montenegro.

NOTES

1PEROBLS D is an abbreviation of the initial letters: Perović’s Robust Least-Square Method; D—by distribution Decomposing.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Huber, P.J. (1964) Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35, 73-101.
https://doi.org/10.1214/aoms/1177703732
[2] Arango-Castillo, L. and Takahara, G. (2018) Robust Estimation of the Sample Mean Variance for Gaussian Processes with Long-Range Dependence. 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Montreal, QC, 14-16 November 2017, 201-205.
https://doi.org/10.1109/GlobalSIP.2017.8308632
[3] Hald, A. (1986) Galileo’s Statistical Analysis of Astronomical Observations. International Statistical Review, 54, 211-220.
https://doi.org/10.2307/1403145
[4] Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986) Robust Statistics: The Approach Based on Influence Functions. John Wiley, New York (Russian translation: Mir, Moscow, 1989).
[5] Tukey, J.W. (1960) A Survey of Sampling from Contaminated Distributions. In: Olkin, I., Ed., Contributions to Probability and Statistics, Stanford University Press, Redwood City, CA.
[6] Huber, P.J. (1981) Robust Statistics. John Wiley and Sons, New York.
https://doi.org/10.1002/0471725250
[7] Chen, Y. and Jackson, D. (1995) Robust Estimation of Mean and Variance in Fisheries. Transactions of the American Fisheries Society, 124, 401-412.
https://doi.org/10.1577/1548-8659(1995)124<0401:REOMAV>2.3.CO;2
[8] Grossi, L. and Laurini, F. (2011) Robust Estimation of Efficient Mean-Variance Frontiers. Advances in Data Analysis and Classification, 5, 3-22.
https://doi.org/10.1007/s11634-010-0082-3
[9] Li, Y., Li, J., Qi, J. and Chen, L. (2019) Robust Cubature Kalman Filter for Dynamic State Estimation of Synchronous Machines Under Unknown Measurement Noise Statistics. IEEE Access, 7, 29139-29148.
https://doi.org/10.1109/ACCESS.2019.2900228
[10] Singh, A. and Nocerino, J. (2002) Robust Estimation of Mean and Variance Using Environmental Data Sets with Below Detection Limit Observations. Chemometrics and Intelligent Laboratory Systems, 60, 69-86.
https://doi.org/10.1016/S0169-7439(01)00186-1
[11] Vilà-Valls, J., Wei, Q., Closas, P. and Fernández-Prades, C. (2014) Robust Gaussian Sum Filtering with Unknown Noise Statistics: Application to Target Tracking. 2014 IEEE Workshop on Statistical Signal Processing, Gold Coast, Australia, 29 June-2 July 2014, 416-419.
https://doi.org/10.1109/SSP.2014.6884664
[12] Wang, Ch., Wang, Y., Lin, Z., Yuille, A.L. and Gao, W. (2014) Robust Estimation of 3D Human Poses from a Single Image. CBMM Memo No. 013.
https://cbmm.mit.edu/sites/default/files/publications/CBMM-Memo-013.pdf
https://doi.org/10.1109/CVPR.2014.303
[13] Zhou, W.-X., Bose, K., Fan, J. and Liu, H. (2018) A New Perspective on Robust M-Estimation: Finite Sample Theory and Applications to Dependence-Adjusted Multiple Testing. The Annals of Statistics, 46, 1904-1931.
https://doi.org/10.1214/17-AOS1606
[14] Zhao, J. and Mili, L. (2018) A Framework for Robust Hybrid State Estimation with Unknown Measurement Noise Statistics. IEEE Transactions on Industrial Informatics, 14, 1866-1875.
https://doi.org/10.1109/TII.2017.2764800
[15] Perović, G. (1989) Adjustment of Calculus, Book I Theory of Measurement Errors, 2nd Revised and Enlarged Edition, Naučna knjiga, Belgrade (In Serbo-Croat).
[16] Perović, G. (2005) Least Squares. Publisher: Author, Belgrade.
[17] Birch, J.B. and Myers, R.H. (1982) Robust Analysis of Covariance. Biometrics, 38, 699-713.
https://doi.org/10.2307/2530050
[18] Schneider, H. (1986) Truncated and Censored Samples from Normal Populations. Marcel Dekker, Inc., New York and Basel.
[19] Bol’shev, L.N. and Smirnov, N.V. (1968) Tables of Mathematical Statistics. VTS AN SSSR, Moscow (In Russian).
