
A mixed distribution of empirical variances, composed of two distributions (the basic and the contaminating one) and referred to as the PERG mixed distribution of empirical variances, is considered. The paper gives a robust solution of the inverse problem, namely a new robust method for estimating the variances of both distributions, the PEROBVC method, together with estimates of the numbers of observations for both distributions and, in this way, an estimate of the degree of contamination.

A detailed presentation of Variance and Covariance Components Models can be found in Rao and Kleffe [

In applications, however, the assumptions under which these solutions are obtained are often violated, which can also lead to negative estimates of variance components. The main source of these deviations is a contaminating distribution of observations (including the presence of outliers), which appears as a rule. Therefore, Variance Components (VC) solutions that are less sensitive or insensitive to these deviations have been studied. Such properties are characteristic of robust solutions, of which the following are presented here:

Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems [

Robust estimates of variances and of their components have been continually applied in various fields [

Regarding a GPS measurement error as a random process and modeling the process realization structurally as a 2-way hierarchical classification with random effects, Perović [

From the above it is seen that robust methods for the estimation of Variance Components have been studied. However, apart from the standard procedure, Variance Components estimates based on the distributions of empirical variances have not been studied. For this reason the present author has devoted himself to VC research on the basis of these distributions, which led to the method presented here.

In the present paper a mixed distribution composed of two distributions of empirical variances is considered, where one of them is basic and the other one is contaminating. The essential assumption is that the variances of both distributions are obtained from primary normal distributions and that they have the same degrees of freedom. The present author names this distribution the PERG mixed distribution of empirical variances (PERG from the initial letters of the author's names: Perović Gligorije), which is presented in

$$F(s^2) = (1 - \varepsilon)\,F_1(s^2) + \varepsilon\,F_2(s^2), \qquad (0 < \varepsilon < 0.5) \tag{1}$$

^{1}PEROBVC is an abbreviation of the initial letters: Perović’s Robust Variance Components Estimation Model.

where $F_1$ is the basic distribution and $F_2$ the contaminating distribution, both of empirical variances $s^2$ with $f$ d.f., and $\varepsilon$ ($0 < \varepsilon < 0.5$) is the degree of contamination.
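As a hedged illustration of Equation (1), the following sketch simulates a sample of empirical variances from such a mixture. It assumes, as stated in the text, that both components come from normally distributed primary observations with the same $f$, so that each $s^2$ is distributed as $\sigma^2 \chi_f^2 / f$. The function name `sample_perg`, the seed and all parameter values are illustrative and do not come from the paper.

```python
import random

def sample_perg(n, eps, sigma1_sq, sigma2_sq, f, seed=0):
    """Draw n empirical variances from the mixture (1 - eps) F1 + eps F2."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        # Choose the component: contaminating with probability eps.
        sigma_sq = sigma2_sq if rng.random() < eps else sigma1_sq
        # A chi-square variable with f d.f. is Gamma(shape=f/2, scale=2).
        chi2 = rng.gammavariate(f / 2.0, 2.0)
        sample.append(sigma_sq * chi2 / f)
    return sample

# Hypothetical settings: 30 % contamination, sigma_1^2 = 1, sigma_2^2 = 2, f = 5.
variances = sample_perg(n=20000, eps=0.3, sigma1_sq=1.0, sigma2_sq=2.0, f=5)
mean_s2 = sum(variances) / len(variances)
# The mixture mean is (1 - eps) * sigma_1^2 + eps * sigma_2^2.
expected_mean = (1 - 0.3) * 1.0 + 0.3 * 2.0
```

With a large sample, `mean_s2` should lie close to `expected_mean`, which only checks the simulation itself and says nothing yet about separating the two components.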

On the basis of this composition of the mixed distribution of empirical variances, the author obtained a robust solution for the variances and for the numbers of observations of both distributions, named Perović’s Robust Variance Components Estimation Model (the PEROBVC¹ method). In this way the method eliminates the influence of gross errors and outliers on the estimates of the distribution parameters.

The key difference between the PEROBVC method and the existing ones is that distribution censoring is not used. Instead, the structural decomposition into two distributions yields the parameter estimates of both distributions and simultaneously resolves the question of outliers.

Consequently, the PEROBVC method has the following properties:

1) Unbiased (exact) variance estimates for the basic distribution, as the most important property,

2) Unbiased (exact) variance estimates for the contaminating distribution,

3) Estimates of observation numbers for both distributions and in this way percentage estimates concerning fractions of basic and contaminating distributions in the mixed one.

The correctness of the PEROBVC method has been verified on exact (expected) values of some quantities from the mixed PERG distribution, as well as on examples of measurements in 2D control geodetic networks.

The variance estimate for the basic distribution obtained by applying the PEROBVC method is compared to its ML estimate.

The remainder of the paper is structured as follows: 1) establishing the PEROBVC method, 2) the way of solving the formulated problem, 3) presentation of the robust ML estimation method, 4) verification of the PEROBVC method by comparing its solution with the ML one, and 5) verification on examples of geodetic measurements in 2D control networks.

The idea of PEROBVC method has been presented in [

Consider a PERG mixed distribution (

$$F(s^2) = (1 - \varepsilon)\,F(s'^2) + \varepsilon\,F(s''^2), \qquad (0 < \varepsilon < 0.5) \tag{2}$$

with $n' + n'' = n$ and with the same degrees of freedom, $f'_i = f''_j \equiv f$. The variances are obtained from primary normally distributed observations, with expectations

$$E[s_i'^2] = \sigma_1^2,\ \ i = 1, 2, \dots, n', \qquad E[s_j''^2] = \sigma_2^2,\ \ j = 1, 2, \dots, n''; \qquad \sigma_2^2 > \sigma_1^2,$$

where $F(s'^2)$ and $F(s''^2)$ are the distribution functions of the empirical variances $s'^2$ and $s''^2$. The task is to estimate the variances $\sigma_1^2$ and $\sigma_2^2$ of both distributions on the basis of the mixed distribution, with $f$ known. In addition to solving this task, the method also yields estimates of the numbers of observations $n'$ and $n''$ for the two component distributions $F(s'^2)$ and $F(s''^2)$, i.e. the degree of contamination is estimated.

The point $B$ divides the domain of $s^2$ into two sub-domains X and Y (

$n'$ and $n''$: numbers of all observations ($s'^2$ and $s''^2$) within X and Y;

$n'_X$ and $n''_X$: numbers of observations for $s'^2$ and $s''^2$ within X;

$n'_Y$ and $n''_Y$: numbers of observations for $s'^2$ and $s''^2$ within Y,

$$\left.\begin{aligned} n_X + n_Y &= n, & n'_X + n''_X &= n_X \\ n' + n'' &= n, & n'_Y + n''_Y &= n_Y \end{aligned}\right\}, \tag{3}$$

The point $B$ is chosen so that within the interval $(0, B)$ the observations $s'^2$ dominate, whereas within $[B, \infty)$ the observations $s''^2$ dominate, from which it follows that $B = B_{\mathrm{opt}}$ must hold (also see

Here the point $B$ is defined as

$$B = s_{(n_X)}^2 + d/2, \tag{4}$$

where $d$ is the width of the rounding interval for the observations $s^2$.
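A minimal sketch of Equation (4) is given below: the sample is sorted, the $n_X$-th order statistic is taken and half the rounding interval is added. The helper names `partition_point` and `split_at`, and the small sample, are hypothetical and serve only to illustrate the rule.

```python
def partition_point(variances, n_x, d):
    """Return B = s^2_(n_X) + d/2, where s^2_(n_X) is the n_X-th order statistic."""
    ordered = sorted(variances)
    if not 1 <= n_x <= len(ordered):
        raise ValueError("n_x must lie between 1 and the sample size")
    return ordered[n_x - 1] + d / 2.0

def split_at(variances, b):
    """Split the sample into the sub-domains X = (0, B) and Y = [B, inf)."""
    x_part = [s for s in variances if s < b]
    y_part = [s for s in variances if s >= b]
    return x_part, y_part

# Hypothetical empirical variances, rounded to class midpoints with d = 0.1.
sample = [0.15, 0.25, 0.25, 0.35, 0.45, 1.75, 2.05, 3.25]
b = partition_point(sample, n_x=5, d=0.1)
x_part, y_part = split_at(sample, b)
```

Because $B$ sits half a rounding interval above the $n_X$-th value, exactly $n_X$ observations fall into X as long as the next larger value is at least one rounding interval away.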

To simplify the notation, the following designations are introduced:

$$\left.\begin{aligned} &(a)\quad w = s^2 \ \text{with } f \text{ d.f.} \ \Rightarrow\ w' = s'^2, \quad w'' = s''^2 \\ &(b)\quad y = \frac{f s^2}{\sigma^2} = \chi_f^2 \ \Rightarrow\ y' = \frac{f s'^2}{\sigma_1^2}, \quad y'' = \frac{f s''^2}{\sigma_2^2} \end{aligned}\right\} \tag{5}$$

Then the density of the distribution of the random variable $s^2$ can be written as

$$f(s^2) = f(w) = k_f^{-1}\,\sigma^{-f} f^{f/2}\, w^{f/2 - 1} e^{-wf/(2\sigma^2)} = \frac{f}{\sigma^2}\, f(y), \qquad k_f = 2^{f/2}\,\Gamma\!\left(\frac{f}{2}\right) \tag{6}$$

where $\Gamma(f/2)$ is the gamma function of $f/2$. Then

$$F(s^2) = F(w) = \int_0^w f(w)\,dw = \int_0^y f(y)\,dy = F(y) \tag{7}$$

$$w f(w)\,dw = \frac{\sigma^2}{f}\, y f(y)\,dy \ \Rightarrow\ \int_0^w w f(w)\,dw = \frac{\sigma^2}{f} \int_0^y y f(y)\,dy. \tag{8}$$

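Equating the ordinates of the two distribution densities at the point $B$ gives the optimal point $B_{\mathrm{opt}}$:

$$B_{\mathrm{opt}} = \frac{2}{f} \cdot \frac{\ln\left(\dfrac{n'}{n''}\left(\dfrac{\sigma_2}{\sigma_1}\right)^{f}\right)}{\dfrac{1}{\sigma_1^2} - \dfrac{1}{\sigma_2^2}}. \tag{9}$$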
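The sketch below evaluates Equation (9) and checks one reading of the equality condition behind it. The assumption, not spelled out in the surviving text, is that the ordinates compared are the component densities of Equation (6) weighted by the numbers of observations, i.e. $n' f(B;\sigma_1^2) = n'' f(B;\sigma_2^2)$; solving this for $B$ reproduces (9) term by term. Function names and parameter values are illustrative.

```python
import math

def s2_log_density(w, sigma_sq, f):
    """Logarithm of the density of s^2 at w, following Equation (6)."""
    half = f / 2.0
    return (half * math.log(f) - half * math.log(sigma_sq)
            - half * math.log(2.0) - math.lgamma(half)
            + (half - 1.0) * math.log(w) - w * f / (2.0 * sigma_sq))

def b_opt(n1, n2, sigma1_sq, sigma2_sq, f):
    """Equation (9); (sigma_2 / sigma_1)^f is written as a ratio of variances."""
    log_term = math.log((n1 / n2) * (sigma2_sq / sigma1_sq) ** (f / 2.0))
    return (2.0 / f) * log_term / (1.0 / sigma1_sq - 1.0 / sigma2_sq)

# Hypothetical values: n' = 700, n'' = 300, sigma_1^2 = 1, sigma_2^2 = 4, f = 5.
n1, n2, s1, s2, f = 700.0, 300.0, 1.0, 4.0, 5
b = b_opt(n1, n2, s1, s2, f)

# Weighted log-ordinates of both components at B (assumed equality condition).
left = math.log(n1) + s2_log_density(b, s1, f)
right = math.log(n2) + s2_log_density(b, s2, f)
```

Under this assumption `left` and `right` agree up to rounding, so below `b` the basic component contributes the larger ordinate and above it the contaminating one.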

The estimates of the variance components for both distributions are obtained from the probability maximum of the event

$$[D'(1) \wedge \cdots \wedge D'(n'_X)] \wedge (s''^2 < B) \wedge (s'^2 > B) \wedge [D''(n''_X + 1) \wedge \cdots \wedge D''(n'')]$$

where $D'(1) = \{ s_{(1)}'^2 < s'^2 < s_{(1)}'^2 + ds'^2 \} = \{ w'_{(1)} < w' < w'_{(1)} + dw' \}$, $\dots$, $D''(n'') = \{ s_{(n'')}''^2 < s''^2 < s_{(n'')}''^2 + ds''^2 \} = \{ w''_{(n'')} < w'' < w''_{(n'')} + dw'' \}$, whose likelihood function, up to the proportionality constant $k$, is

$$L = k \cdot \prod_X f(w'_{(i)}) \cdot p(w'' < B) \cdot p(w' > B) \cdot \prod_Y f(w''_{(i)}),$$

where $\prod_X f(w'_{(i)}) = \prod_{1}^{n'_X} f(w'_{(i)})$.

In view of (5)-(8) one finds:

$$\left.\begin{aligned} f'_B &= k_f^{-1}\,(y'_B)^{f/2 - 1} e^{-y'_B/2}, & f''_B &= k_f^{-1}\,(y''_B)^{f/2 - 1} e^{-y''_B/2}, \\ y'_B &= \frac{f \cdot B}{\sigma_1^2}, & y''_B &= \frac{f \cdot B}{\sigma_2^2}, \\ F'_B &= \int_0^{y'_B} f(y')\,dy', & F''_B &= \int_0^{y''_B} f(y'')\,dy'', \\ A'_B &= \int_0^{y'_B} \frac{1}{f}\, y' f(y')\,dy', & A''_B &= \int_0^{y''_B} \frac{1}{f}\, y'' f(y'')\,dy'' \end{aligned}\right\}. \tag{10}$$
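The quantities in (10) can be evaluated with the standard library alone, as sketched below. Two things here are additions rather than content of the paper: the chi-square distribution function is computed from the power series of the regularized incomplete gamma function, and $A_B$ is obtained from the standard identity $\frac{1}{f}\, y f_f(y) = f_{f+2}(y)$, which gives $A_B = F_{f+2}(y_B)$. A midpoint-rule quadrature of the defining integral is included as a cross-check. All names and numbers are illustrative.

```python
import math

def chi2_pdf(y, f):
    """Chi-square density with f d.f. (the f(y) of Equation (6))."""
    if y <= 0.0:
        return 0.0
    return math.exp((f / 2.0 - 1.0) * math.log(y) - y / 2.0
                    - (f / 2.0) * math.log(2.0) - math.lgamma(f / 2.0))

def chi2_cdf(y, f):
    """Chi-square distribution function via the series for P(f/2, y/2)."""
    if y <= 0.0:
        return 0.0
    a, x = f / 2.0, y / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def b_quantities(b, sigma_sq, f):
    """Return (y_B, f_B, F_B, A_B) of Equation (10) for one component."""
    y_b = f * b / sigma_sq
    a_b = chi2_cdf(y_b, f + 2)  # identity A_B = F_{f+2}(y_B), see lead-in
    return y_b, chi2_pdf(y_b, f), chi2_cdf(y_b, f), a_b

def a_b_by_quadrature(y_b, f, steps=20000):
    """Midpoint-rule evaluation of (1/f) * integral_0^{y_B} y f(y) dy."""
    h = y_b / steps
    return sum((i + 0.5) * h * chi2_pdf((i + 0.5) * h, f)
               for i in range(steps)) * h / f

# Hypothetical values: B = 1.8, sigma^2 = 1.8, f = 5, so that y_B = 5.
y_b, f_b, big_f_b, a_b = b_quantities(b=1.8, sigma_sq=1.8, f=5)
a_b_check = a_b_by_quadrature(y_b, 5)
```

The same identity implies $F_B - A_B = 2\,y_B f_B / f$, which is a convenient consistency check when the four quantities are computed separately.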

In what follows $\ln L$ is used; it reads:

$$\ln L = c + \sum_X \ln f(w'_{(i)}) + \ln (F''_B)^{n''_X} + \ln (1 - F'_B)^{n'_Y} + \sum_Y \ln f(w''_{(i)}), \tag{11}$$

where $c$ is a constant.

According to the asymptotic theory:

$$\sum_X w''_i \xrightarrow[n'' \to \infty]{p} n''_X \int_0^B w'' f(w'')\,dw'', \tag{12a}$$

$$\sum_Y w'_i \xrightarrow[n' \to \infty]{p} n'_Y \int_B^\infty w' f(w')\,dw' = n'_Y\, \sigma_1^2\, (1 - A'_B). \tag{12b}$$

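Bearing in mind (10)-(12) and introducing the substitutions

$$\left.\begin{aligned} (a)\quad & \sum_X s_i'^2 = \sum_X s_i^2 - \sum_X s_i''^2, \qquad \sum_Y s_i''^2 = \sum_Y s_i^2 - \sum_Y s_i'^2 \\ (b)\quad & \sum_X s_i''^2 = n'' \int_X s''^2 f(s''^2)\,ds''^2 = n'' \sigma_2^2 A''_B \\ (c)\quad & \sum_Y s_i'^2 = n' \int_Y s'^2 f(s'^2)\,ds'^2 = n' \sigma_1^2 (1 - A'_B) \end{aligned}\right\}$$

from the necessary conditions for a maximum, (a) $\partial \ln L / \partial \sigma_1^2 = 0$ and (b) $\partial \ln L / \partial \sigma_2^2 = 0$, we get the equations

$$\left.\begin{aligned} (a)\quad & -\frac{n'_X}{2\sigma_1^2} + \frac{1}{2\sigma_1^4} \sum_X s_i^2 - \frac{n''_X \sigma_2^2}{2\sigma_1^4} A''_B + \frac{B\, n'_Y f'_B}{\sigma_1^4 (1 - F'_B)} = 0 \\ (b)\quad & -\frac{n''_Y}{2\sigma_2^2} + \frac{1}{2\sigma_2^4} \sum_Y s_i^2 - \frac{B\, n''_X f''_B}{\sigma_2^4 F''_B} = 0 \end{aligned}\right\}$$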

Their solution is obtained by simple iteration:

$$\sigma_{1,k+1}^2 = \frac{1}{n'_{X,k}} \left[ \sum_X s_i^2 - n''_k\, \sigma_{2,k}^2 A''_{B,k} + \frac{2B\, n'_{Y,k} f'_{B,k}}{1 - F'_{B,k}} \right], \tag{13a}$$

$$\sigma_{2,k+1}^2 = \frac{1}{n''_{Y,k}} \left[ \sum_Y s_i^2 - n'_k\, \sigma_{1,k}^2 (1 - A'_{B,k}) - \frac{2B\, n''_{X,k} f''_{B,k}}{F''_{B,k}} \right], \tag{13b}$$

where the numbers $n'_X$, $n'_Y$, $n''_X$ and $n''_Y$ are determined from the expressions

$$\left.\begin{aligned} (a)\quad & n'_X = n' F'_B, & n'_Y &= n' - n'_X \\ (b)\quad & n''_X = n'' F''_B, & n''_Y &= n'' - n''_X \end{aligned}\right\}, \tag{14}$$

whereas $n'$ and $n''$ are determined from

$$\left.\begin{aligned} F'_B\, n' + F''_B\, n'' &= n_X \\ (1 - F'_B)\, n' + (1 - F''_B)\, n'' &= n_Y \end{aligned}\right\} \tag{15}$$
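The iteration (13)-(15) can be sketched as follows. This is an illustrative reading of the equations, not the author's FORTRAN implementation: the chi-square functions use the same series and the identity $A_B = F_{f+2}(y_B)$ as assumptions, the Euclidean norm is assumed in the stopping rule, and all names and numbers are hypothetical. The usage builds exact expected values of $n_X$, $n_Y$, $\sum_X s_i^2$ and $\sum_Y s_i^2$ for known parameters and starts the iteration at those parameters, which checks only that they form a fixed point.

```python
import math

def chi2_pdf(y, f):
    if y <= 0.0:
        return 0.0
    return math.exp((f / 2.0 - 1.0) * math.log(y) - y / 2.0
                    - (f / 2.0) * math.log(2.0) - math.lgamma(f / 2.0))

def chi2_cdf(y, f):
    """Series for the regularized lower incomplete gamma P(f/2, y/2)."""
    if y <= 0.0:
        return 0.0
    a, x = f / 2.0, y / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def perobvc_step(sum_x, sum_y, n_x, n_y, b, f, s1, s2):
    """One pass of Equations (13)-(15) from the current variances s1, s2."""
    y1, y2 = f * b / s1, f * b / s2
    f1, f2 = chi2_pdf(y1, f), chi2_pdf(y2, f)
    F1, F2 = chi2_cdf(y1, f), chi2_cdf(y2, f)
    A1, A2 = chi2_cdf(y1, f + 2), chi2_cdf(y2, f + 2)  # assumed identity
    # Equation (15): 2x2 linear system for n' and n'' (Cramer's rule).
    det = F1 * (1.0 - F2) - F2 * (1.0 - F1)
    n1 = (n_x * (1.0 - F2) - F2 * n_y) / det
    n2 = (F1 * n_y - (1.0 - F1) * n_x) / det
    # Equation (14): split each component between X and Y.
    n1_x, n2_x = n1 * F1, n2 * F2
    n1_y, n2_y = n1 - n1_x, n2 - n2_x
    # Equations (13a) and (13b).
    new1 = (sum_x - n2 * s2 * A2 + 2.0 * b * n1_y * f1 / (1.0 - F1)) / n1_x
    new2 = (sum_y - n1 * s1 * (1.0 - A1) - 2.0 * b * n2_x * f2 / F2) / n2_y
    return new1, new2, n1, n2

def perobvc(sum_x, sum_y, n_x, n_y, b, f, s1, s2, q=8, max_iter=200):
    """Iterate until the relative change satisfies the rule of Equation (16)."""
    for _ in range(max_iter):
        new1, new2, n1, n2 = perobvc_step(sum_x, sum_y, n_x, n_y, b, f, s1, s2)
        change = math.hypot(new1 - s1, new2 - s2)
        scale = math.hypot(s1, s2)
        s1, s2 = new1, new2
        if change < 10.0 ** (-q) * scale:
            break
    return s1, s2, n1, n2

# Hypothetical truth: n' = 7000, n'' = 3000, sigma_1^2 = 1, sigma_2^2 = 4, f = 5.
t1, t2, m1, m2, f, b = 1.0, 4.0, 7000.0, 3000.0, 5, 2.3
F1, F2 = chi2_cdf(f * b / t1, f), chi2_cdf(f * b / t2, f)
A1, A2 = chi2_cdf(f * b / t1, f + 2), chi2_cdf(f * b / t2, f + 2)
n_x = m1 * F1 + m2 * F2
n_y = m1 + m2 - n_x
sum_x = m1 * t1 * A1 + m2 * t2 * A2
sum_y = m1 * t1 * (1.0 - A1) + m2 * t2 * (1.0 - A2)
est1, est2, n1_est, n2_est = perobvc(sum_x, sum_y, n_x, n_y, b, f, t1, t2)
```

Started from other initial values, such as those of Note 1, the iteration may behave differently; the text itself warns about sensitivity to the starting values and to the choice of $B$.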

Note 1. The initial values for the variance components can be assumed, for example, as:

$$\sigma_{1,0}^2 = 0.6 \cdot \left( \frac{1}{n} \sum_{1}^{n} s_i^2 \right) \quad \text{and} \quad \sigma_{2,0}^2 = 3 \cdot \left( \frac{1}{n} \sum_{1}^{n} s_i^2 \right).$$

If the model assumptions are not satisfied, for example if the distribution has a shortened right tail instead of a long one, then a negative estimate is obtained for one of the variances. ∆

Note 2. Blunders in the observations (measurements with large gross errors) must be rejected before applying the PEROBVC method. For example, first find

$$s^2 = \frac{1}{n} \sum_{1}^{n} s_i^2 \quad \text{and then} \quad s_{0.9999}^2 = \frac{1}{f}\, \chi_{f;\,0.9999}^2 \cdot s^2.$$

Those $s_i^2$ exceeding $s_{0.9999}^2$ are rejected, and the procedure is repeated once more with the non-rejected results. ∆
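The screening of Note 2 might be sketched as below. The chi-square quantile is found by bisection on a series-based distribution function, which is an implementation choice made here to stay within the standard library; the names `chi2_quantile` and `reject_blunders` and the small sample are hypothetical.

```python
import math

def chi2_cdf(y, f):
    """Chi-square distribution function via the series for P(f/2, y/2)."""
    if y <= 0.0:
        return 0.0
    a, x = f / 2.0, y / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_quantile(p, f):
    """Quantile of the chi-square distribution by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, f) < p:  # expand until the quantile is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, f) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def reject_blunders(variances, f, p=0.9999, passes=2):
    """Drop s_i^2 above (1/f) * chi2_{f;p} * mean(s^2); run the rule twice."""
    kept = list(variances)
    factor = chi2_quantile(p, f) / f
    for _ in range(passes):
        mean_s2 = sum(kept) / len(kept)
        kept = [s for s in kept if s <= factor * mean_s2]
    return kept

# Hypothetical sample: fifty ordinary values and one gross blunder, f = 5.
sample = [1.0] * 50 + [100.0]
cleaned = reject_blunders(sample, f=5)
```

The second pass matters because a large blunder inflates the mean, and therefore the limit, used in the first pass.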

The advantages of the PEROBVC method are:

1) Unbiased estimators for $\sigma_1^2$ and $\sigma_2^2$, if $B$ is close to $B_{\mathrm{opt}}$, and

2) Minimal variance for $\sigma_1^2$.

The disadvantages of the PEROBVC method are:

1) Sensitivity to the choice of the point $B$ (the point $B$ must be close to $B_{\mathrm{opt}}$), which can result in negative estimates for either of the variances $\sigma_1^2$ or $\sigma_2^2$, or for both, and

2) Sensitivity to the choice of the initial values for the variances $\sigma_1^2$ and $\sigma_2^2$, which can also result in negative estimates for one or both variances.

The stopping criterion for the iterative process: Let $x_k^T = [\sigma_{1,k}^2 \ \ \sigma_{2,k}^2]$ be the vector of parameter estimates in the $k$-th iteration and $d$ the difference vector of these estimates between the $(k+1)$-th and $k$-th iterations, $d^T = [(\sigma_{1,k+1}^2 - \sigma_{1,k}^2) \ \ (\sigma_{2,k+1}^2 - \sigma_{2,k}^2)]$. Then the iterations should be stopped if

$$\| d \| < 10^{-q}\, \| x_k \|, \qquad q \in \{5, 6, 7, 8, 9\}. \tag{16}$$
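A small sketch of the rule (16) follows. The text does not say which norm is meant, so the Euclidean norm is assumed here; the function name `should_stop` and the numbers are illustrative.

```python
import math

def should_stop(x_k, x_next, q=6):
    """Return True if ||x_next - x_k|| < 10^(-q) * ||x_k|| (Euclidean norm assumed)."""
    d_norm = math.sqrt(sum((new - old) ** 2 for old, new in zip(x_k, x_next)))
    x_norm = math.sqrt(sum(old ** 2 for old in x_k))
    return d_norm < 10.0 ** (-q) * x_norm

# Hypothetical successive estimates [sigma_1^2, sigma_2^2].
tiny_change = should_stop([1.0, 2.0], [1.0 + 1e-9, 2.0], q=6)
large_change = should_stop([1.0, 2.0], [1.1, 2.0], q=6)
```

Because the threshold scales with $\| x_k \|$, the rule does not depend on the units in which the variances are expressed.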

In this case we shall use the method of Maximum Likelihood (ML).

The Robust Maximum Likelihood (ML) Estimation

The ML method is based on the assumption that the domain X contains only variances $s'^2$ ($s'^2 \equiv s^2$, with $E[s^2] = \sigma^2$) of the basic distribution, whereas the interval Y contains, in addition to basic variances, the variances of contaminating distributions. Therefore the point B in

The ML solution is obtained from the probability maximum for the event

$$[D(1) \wedge \cdots \wedge D(n_X)] \wedge (s^2 > B),$$

where $D(i) = \{ s_{(i)}^2 < s^2 < s_{(i)}^2 + ds^2 \}$, $i = 1, \dots, n_X$, which gives the likelihood function (with proportionality constant $k$):

$$L = k \prod_{i=1}^{n_X} f(s_i^2)\, [1 - F(B)]^{n_Y}.$$

Hence it follows

$$\ln L \propto \sum_X \ln f(s_i^2) + n_Y \ln [1 - F(B)].$$

From the necessary condition $\partial \ln L / \partial \sigma^2 = 0$ one obtains

$$-n_X \sigma^2 + \sum_X s_i^2 + 2B\, n_Y \frac{f(y_B)}{1 - F(y_B)} = 0.$$

Its solution is obtained by simple iteration:

$$\sigma_{k+1}^2 = \frac{1}{n_X} \sum_X s_i^2 + 2B\, \frac{n_Y}{n_X} \cdot \frac{f(y_{B,k})}{1 - F(y_{B,k})}, \tag{17}$$

where

$$y_{B,k} = \frac{f \cdot B}{\sigma_k^2}, \qquad F_{B,k} = \int_0^{y_{B,k}} f(y)\,dy, \qquad \left( y = \frac{f s^2}{\sigma^2} = \chi_f^2, \quad f(y) = k_f^{-1}\, y^{f/2 - 1} e^{-y/2} \right). \tag{18}$$

The stopping criterion for the iterative process can be based on the relative parameter difference $r_\sigma = (\sigma_{k+1}^2 - \sigma_k^2) / \sigma_{k+1}^2$ between two iterations:

$$| r_\sigma | < 10^{-c}, \qquad c \in \{4, 5, 6, 7, 8\}. \tag{19}$$
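The ML iteration (17)-(19) might be sketched as follows. As before, the series-based chi-square functions and the identity $A_B = F_{f+2}(y_B)$ used to build the test data are assumptions introduced for this illustration, and the names and numbers are hypothetical. The usage generates exact expected values of $n_X$, $n_Y$ and $\sum_X s_i^2$ for an uncontaminated sample with known $\sigma^2$, so the iteration should return that value.

```python
import math

def chi2_pdf(y, f):
    if y <= 0.0:
        return 0.0
    return math.exp((f / 2.0 - 1.0) * math.log(y) - y / 2.0
                    - (f / 2.0) * math.log(2.0) - math.lgamma(f / 2.0))

def chi2_cdf(y, f):
    """Chi-square distribution function via the series for P(f/2, y/2)."""
    if y <= 0.0:
        return 0.0
    a, x = f / 2.0, y / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def ml_step(sum_x, n_x, n_y, b, f, sigma_sq):
    """Right-hand side of Equation (17) for the current sigma^2."""
    y_b = f * b / sigma_sq  # Equation (18)
    ratio = chi2_pdf(y_b, f) / (1.0 - chi2_cdf(y_b, f))
    return sum_x / n_x + 2.0 * b * (n_y / n_x) * ratio

def ml_estimate(sum_x, n_x, n_y, b, f, sigma_sq0, c=8, max_iter=500):
    """Iterate (17) until the relative change satisfies Equation (19)."""
    sigma_sq = sigma_sq0
    for _ in range(max_iter):
        new = ml_step(sum_x, n_x, n_y, b, f, sigma_sq)
        r_sigma = (new - sigma_sq) / new
        sigma_sq = new
        if abs(r_sigma) < 10.0 ** (-c):
            break
    return sigma_sq

# Hypothetical uncontaminated case: n = 10000, sigma^2 = 1.5, f = 5, B = 2.
n, true_sigma_sq, f, b = 10000.0, 1.5, 5, 2.0
y_true = f * b / true_sigma_sq
n_x = n * chi2_cdf(y_true, f)
n_y = n - n_x
sum_x = n * true_sigma_sq * chi2_cdf(y_true, f + 2)  # n * sigma^2 * A_B
one_step = ml_step(sum_x, n_x, n_y, b, f, true_sigma_sq)
estimate = ml_estimate(sum_x, n_x, n_y, b, f, sigma_sq0=1.0)
```

When Y also contains contaminating variances, $n_Y$ is larger than the basic distribution alone would produce, and the result of (17) then depends on how $B$ is chosen.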

Example 1. By using the functions F-RAN1 and F-RAN2 [

Since the variances $\sigma_1^2$ and $\sigma_2^2$ are known, one can also find the optimal $B$ according to (6), obtaining $B_{\mathrm{opt}} = 2.772589$.

In

From

Example 2. In the 2D geodetic control network of object “TUZLA” [^{2}]), with f = 5 d.f., is obtained (

In

| $n_Y$ | % | $B$ | $\sum_X s_i^2$ | $\sigma_1^2$ | $\sigma_2^2$ | $n'$ | $n''$ | $\varepsilon$ [%] | ML: $\sigma^2$ |
|---|---|---|---|---|---|---|---|---|---|
| 3990 | 40 | 1.3863 | 4002.870 | 1.0330 | 2.0326 | 5386.9 | 4613.1 | 46.1 | 0.8260 |
| 3000 | 30 | 1.7715 | 5557.883 | 0.9237 | 1.8834 | 4055.9 | 5944.1 | 59.4 | 0.8741 |
| 2500 | 25 | 1.9950 | 6494.014 | 1.0561 | 2.0624 | 5646.8 | 4353.2 | 43.5 | 0.9241 |
| 2000 | 20 | 2.2955 | 7562.874 | 1.0250 | 2.0138 | 5255.6 | 4744.4 | 47.4 | 0.9849 |
| Mean: | | | | 1.0095 | 1.9981 | 5086.3 | 4913.7 | 49.1 | 0.9023 |

| $s^2$ | Freq. | $s^2$ | Freq. | $s^2$ | Freq. | $s^2$ | Freq. | $s^2$ | Freq. |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0 | 1.65 | 3 | 3.25 | 4 | 4.95 | 5 | 7.75 | 1 |
| 0.15 | 1 | 1.75 | 8 | 3.35 | 1 | 5.05 | 1 | 7.95 | 1 |
| 0.25 | 4 | 1.85 | 10 | 3.45 | 3 | 5.15 | 1 | 8.05 | 1 |
| 0.35 | 6 | 1.95 | 10 | 3.55 | 3 | 5.35 | 1 | 8.35 | 1 |
| 0.45 | 6 | 2.05 | 11 | 3.65 | 2 | 5.45 | 1 | 8.55 | 1 |
| 0.55 | 8 | 2.15 | 5 | 3.75 | 6 | 5.55 | 2 | 9.05 | 1 |
| 0.65 | 7 | 2.25 | 12 | 3.85 | 3 | 5.65 | 1 | 9.35 | 1 |
| 0.75 | 15 | 2.35 | 6 | 3.95 | 2 | 5.75 | 1 | 9.65 | 1 |
| 0.85 | 8 | 2.45 | 7 | 4.05 | 0 | 5.85 | 3 | 10.45 | 1 |
| 0.95 | 16 | 2.55 | 6 | 4.15 | 2 | 6.05 | 1 | 11.25 | 1 |
| 1.05 | 12 | 2.65 | 4 | 4.25 | 1 | 6.25 | 3 | 11.45 | 1 |
| 1.15 | 11 | 2.75 | 1 | 4.35 | 1 | 6.75 | 1 | 12.75 | 1 |
| 1.25 | 17 | 2.85 | 7 | 4.45 | 2 | 7.15 | 1 | 13.75 | 1 |
| 1.35 | 15 | 2.95 | 4 | 4.55 | 1 | 7.35 | 1 | 16.85 | 1 |
| 1.45 | 11 | 3.05 | 5 | 4.65 | 2 | 7.45 | 2 | Σ | n |
| 1.55 | 15 | 3.15 | 3 | 4.75 | 1 | 7.65 | 1 | 818.30 | 328 |

| $n_Y$ | % | $B$ | $\sum_X s_i^2$ | $\sigma_1^2$ | $\sigma_2^2$ | $n'$ | $n''$ | $\varepsilon$ [%] | ML: $\sigma^2$ |
|---|---|---|---|---|---|---|---|---|---|
| 165 | 50.3 | 1.8 | 173.05 | 1.8788 | 6.8348 | 287.2 | 40.8 | 12 | 1.9915 |
| 134 | 41 | 2.1 | 233.60 | 1.9781 | 12.5460 | 312.0 | 16.0 | 5 | 1.4628 |
| 98 | 30 | 2.6 | 317.90 | 1.9157 | 3.1517 | 297.5 | 30.5 | 9 | 1.4930 |
| 82 | 25 | 3.0 | 363.00 | 1.7716 | 5.5333 | 264.9 | 63.1 | 19 | 1.5439 |
| 66 | 20 | 3.5 | 414.40 | 1.6918 | 5.0078 | 248.6 | 79.4 | 24 | 1.6216 |
| 50 | 15 | 4.0 | 474.30 | 1.7902 | 5.6048 | 267.4 | 60.6 | 18 | 1.7304 |
| 33 | 10 | 5.3 | 553.65 | 1.6501 | 4.8954 | 242.6 | 85.4 | 26 | 1.8842 |
| Mean: | | | | 1.8109 | 6.9391 | 274.3 | 53.7 | 16 | 1.6753 |

According to an a priori analysis for the horizontal-angle measurements in stable conditions, following Činklović [^{2}], resulting in a very good agreement with 1.8109 [″²] obtained by using the PEROBVC method based on the measurements.

Besides, on the basis of the results presented in

In Example 2 the average value from the ML estimates for

It should be noted that for the partition _{opt} = 8.18. ▲

Note 3. In Examples 1-2 the ML estimates for

On the basis of applications of the PEROBVC method in the treatment of real geodetic measurements, Example 2, the present author concludes that the variance estimator of the basic distribution

Generally, the PEROBVC method offers good estimates, especially for

On the basis of the results obtained in Examples 1-2 we can conclude the following:

1) The exact (expected) values from Example 1 confirm the validity of the PEROBVC method for estimating the variances, as well as the numbers of observations, of both distributions in the PERG mixed distribution of empirical variances. Here the variance estimates for both distributions, the basic and the contaminating one, are correct, i.e. their values are exact.

2) The real measurements of horizontal angles in the 2D geodetic control network from Example 2 also confirm good (satisfactory) parameter estimates for both distributions.

3) In Example 2 the variance estimate for the basic distribution is confirmed by the result of the a priori analysis.

Dr ZORICA Cvetković from the Astronomical Observatory in Belgrade is the author of the FORTRAN programmes and performed the calculations in the examples. The OBJEKTIV GEO D.O.O. Company from Belgrade (Serbia) has financially supported the publication of the article.

The author declares no conflicts of interest regarding the publication of this paper.

Perović, G. (2020) Robust Variance Components Estimation in the PERG Mixed Distributions of Empirical Variances—PEROBVC Method. Open Journal of Statistics, 10, 640-650. https://doi.org/10.4236/ojs.2020.104038