Approximation of finite population totals in the presence of auxiliary information is considered. A polynomial based on the Lagrange polynomial is proposed. Like the local polynomial regression, Horvitz-Thompson and ratio estimators, this approximation technique is based on annual population totals, in order to fit the best approximating polynomial within a given period of time (years). The proposed technique is shown to be unbiased under a linear polynomial. An application to real data indicated that the polynomial is efficient and approximates well even when the data are unevenly spaced.

This study uses an approximation technique, the Lagrange polynomial, to approximate the finite population total; unlike the local polynomial regression estimator, it does not require the selection of a bandwidth. Lagrange polynomials are used for polynomial interpolation and extrapolation. For a given set of distinct points x_j and corresponding values y_j, the Lagrange polynomial is the polynomial of lowest degree that passes through each point (x_j, y_j) (i.e. the functions coincide at each point). Although named after Joseph-Louis Lagrange, who published it in 1795, the method was first discovered in 1779 by Edward Waring; it is also an easy consequence of a formula published in 1783 by Leonhard Euler, as will be seen later.

Consider a finite population of N units, where y_i and x_i denote the values of the study and auxiliary variables on the i^th unit. Of interest is the estimation of the population total Y_t = ∑_{i=1}^{N} y_i using the known population total X_t = ∑_{i=1}^{N} x_i at the estimation stage. Let s_1, s_2, ⋯, s_n be the set of sampled units under a general sampling design p, and let π_i = p(i ∈ s) be the first-order inclusion probabilities. In 1940, Cochran made an important contribution to modern sampling theory by suggesting methods of using auxiliary information for the purpose of estimation in order to increase the precision of the estimates [

$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}; \quad \bar{x} \neq 0$$

The aim of this method is to use the ratio of sample means of two characters, which would be almost stable under sampling fluctuations and thus provide a better estimate of the true value. It is a well-known fact that ȳ_r is more efficient than the sample mean estimator ȳ, where no auxiliary information is used, if ρ_yx, the coefficient of correlation between y and x, is greater than half the ratio of the coefficient of variation of x to that of y, that is, if

$$\rho_{yx} > \frac{1}{2}\left(\frac{C_x}{C_y}\right) \quad (1.0)$$

Thus, if information on an auxiliary variable is either already available or can be obtained at no extra cost, and it has a high positive correlation with the main character, one would certainly prefer the ratio estimator. This has motivated the development of ever more superior techniques to reduce bias and to obtain unbiased estimators with greater precision by modifying either the sampling schemes, the estimation procedures, or both. [

$$\bar{y}_q = \bar{y}\,\frac{\bar{x}}{\bar{X}}; \quad \bar{X} \neq 0 \quad (1.1)$$

that was proposed by [

$$\rho_{yx} \leq -\frac{1}{2}\left(\frac{C_x}{C_y}\right) \quad (1.2)$$

The expressions for the bias and mean square error of ȳ_r and ȳ_q have been derived by [

[

$$\bar{y}_d = \bar{y} + \beta(\bar{X} - \bar{x}) \quad (1.3)$$

where β is a constant. The best choice of β which minimizes the variance of the estimator is seen to be

$$\beta = \frac{S_{yx}}{S_x^2} \quad (1.4)$$

which is the population regression coefficient of y on x. Since β is generally unknown in practice, it is estimated by the sample regression coefficient

$$b = \frac{s_{yx}}{s_x^2} \quad (1.5)$$

Using the sample regression coefficient b, Watson defined the simple linear regression estimator as

$$\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x}) \quad (1.6)$$

This estimator is biased, the bias being negligible for large samples.

The most common way of defining a more efficient class of estimators than usual ratio (product) and sample mean estimator is to include one or more unknown parameters in the estimators whose optimum choice is made by minimizing the corresponding mean square error or variance. Sometimes, such modifications or generalizations are made by mixing two or more estimators with unknown weights whose optimum values are then determined which generally depend upon population parameters. In order to propose efficient classes of estimators, [

$$\bar{y}_f = \bar{y}\left[\frac{(A + C)\bar{X} + fB\bar{x}}{(A + fB)\bar{X} + C\bar{x}}\right] \quad (1.7)$$

where A = (d − 1)(d − 2), B = (d − 1)(d − 4), C = (d − 2)(d − 3)(d − 4), d > 0, f = n/N. The literature on survey sampling describes a great variety of techniques for using auxiliary information to obtain more efficient estimators. Keeping this fact in view, a large number of authors have paid attention to the formulation of modified ratio and product estimators using information on an auxiliary variate; for instance, see [

Suppose n is large, so that MSE(R̂) = Var(R̂). We assume that x̄ and X̄ are quite close, such that

$$\hat{R} - R = \frac{\bar{y} - R\bar{x}}{\bar{x}} \approx \frac{\bar{y} - R\bar{x}}{\bar{X}}$$

so that the bias of R̂ becomes quite small.

The concept of nonparametric models within a model assisted framework was first introduced by [

$$\hat{Y}_{gen} = \sum_{i \in s} \frac{y_i}{\pi_i} + \left(\sum_{j=1}^{N} \hat{\mu}(x_j) - \sum_{i \in s} \frac{\hat{\mu}(x_i)}{\pi_i}\right) \quad (1.8)$$

The first term in (1.8) is a design estimator, while the second is a model component. Therefore, when the sample comprises the whole population, the model component reduces to zero, since π_i = 1 for all units, and we recover the actual population total. [ ] proposed a model calibration estimator for the population total Y_t of the form $\tilde{Y} = \sum_{i \in s} \frac{y_i}{\pi_i}$.

In local polynomial regression, a low-order weighted least squares (WLS) regression is fitted at each point of interest x, using data from some neighborhood around x. Following the notation of [ ], let (X_i, Y_i) be ordered pairs such that

$$Y_i = m(X_i) + \sigma(X_i)\varepsilon_i \quad (1.9)$$

where ε_i ~ N(0, 1), σ²(X_i) is the variance of Y_i at the point X_i, and X_i comes from some distribution f. In some cases a homoscedastic variance is assumed, so we let σ²(X) = σ². It is typically of interest to estimate m(x). Using Taylor's expansion:

$$m(x) \approx m(x_0) + m'(x_0)(x - x_0) + \cdots + \frac{m^{(n)}(x_0)}{n!}(x - x_0)^n \quad (1.91)$$

We can estimate these terms using weighted least squares by minimizing the following with respect to β:

$$\sum_{i=1}^{n}\left[Y_i - \sum_{j=0}^{q} \beta_j (X_i - x_0)^j\right]^2 K_h(X_i - x_0) \quad (1.92)$$

In (1.92), h controls the size of the neighborhood around x_0, and K_h(·) controls the weights, where K_h(·) ≡ K(·/h)/h and K is a kernel function. Denote the solution to (1.92) by β̂. Then the estimate of m^(v)(x_0) is v! β̂_v. [
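As a quick illustration of the weighted least squares problem in (1.92), the sketch below fits a local linear polynomial (q = 1) at a single point x_0. The Epanechnikov kernel, bandwidth, and data are our own illustrative choices, not the paper's:

```python
import numpy as np

def local_poly_fit(x0, X, Y, h, q=1):
    """Solve the WLS problem (1.92) at the point x0.

    Uses an Epanechnikov kernel K(u) = 0.75*(1 - u^2) on |u| <= 1
    (an illustrative choice; any kernel works). Returns beta-hat,
    so that v! * beta_hat[v] estimates the v-th derivative m^(v)(x0).
    """
    u = (X - x0) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2) / h, 0.0)  # K_h(X_i - x0)
    # Design matrix with columns (X_i - x0)^j, j = 0..q
    D = np.vander(X - x0, N=q + 1, increasing=True)
    W = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(D * W[:, None], Y * W, rcond=None)
    return beta

# Noiseless quadratic: the local linear fit at x0 = 0.5 should
# recover m(0.5) = 0.25 up to the usual O(h^2) smoothing bias.
X = np.linspace(0, 1, 200)
Y = X**2
beta = local_poly_fit(0.5, X, Y, h=0.2, q=1)
print(beta[0])
```

On a noiseless quadratic, β̂_0 estimates m(x_0) with a small bias of order h², which shrinks as the bandwidth decreases.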

The Horvitz-Thompson (HT) estimator, originally discussed by [ ], does not make use of the auxiliary information x_i; instead, it uses only the study variable y_i to obtain the population total.

Consider a population of size N with units y_1, y_2, y_3, ⋯, y_N. Suppose we want to select a sample s of size n_s.

Let π_{i} be the probability of including i^{th} unit of the population in sample s. This is called the inclusion probability or first order inclusion probability of i^{th} unit in the sample.

Let π_{ij} be the probability of including i^{th} and j^{th} units in the sample. This is called the joint inclusion probability or second order inclusion probability.

When the sample is obtained from a probability sampling design, an unbiased estimator of the total $Y = \sum_{i=1}^{N} y_i$ is given by

$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i} = \sum_{i \in s} y_i \pi_i^{-1} \quad (1.93)$$

Ŷ_HT is unbiased under the design-based approach [

$$V(\hat{Y}_{HT}) = \sum_{i=1}^{N}\sum_{j=1}^{N} (\pi_{ij} - \pi_i \pi_j)\,\frac{y_i y_j}{\pi_i \pi_j}$$

The variance of this estimator is minimized when π_i ∝ y_i: if the first-order inclusion probability is proportional to y_i, the resulting HT estimator under this sampling design has zero variance. However, in practice we cannot construct such a design, because we do not know the values of y_i at the design stage. If there is a good auxiliary variable x_i believed to be closely related to y_i, then a sampling design with π_i ∝ x_i can be very efficient.
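The design-unbiasedness of Ŷ_HT in (1.93) can be checked directly on a toy population: under simple random sampling without replacement of size n, every unit has π_i = n/N, and averaging the HT estimate over all equally likely samples returns the true total exactly. The population values below are made up for illustration:

```python
from itertools import combinations

# Population values and an SRSWOR design of size n from N,
# for which pi_i = n/N for every unit (a simple illustrative design).
y = [12.0, 7.5, 9.3, 15.1, 4.2, 11.8]
N, n = len(y), 3
pi = n / N
Y_total = sum(y)

# HT estimate (1.93) for one sample s: sum of y_i / pi_i over i in s.
def ht(sample):
    return sum(y[i] / pi for i in sample)

# Design-unbiasedness: averaging over every equally likely sample
# recovers the true total exactly.
samples = list(combinations(range(N), n))
expectation = sum(ht(s) for s in samples) / len(samples)
print(round(expectation, 6), round(Y_total, 6))  # → 59.9 59.9
```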

Research literature has revealed that the ratio estimator performs better than the local linear polynomial estimator when the population is linear no matter which variance is used. The local linear polynomial regression estimator becomes a better estimator when the population used is either quadratic or exponential especially with an increase in the sample size which increases the likelihood of outliers in the sample.

One of the most useful and well-known classes of functions mapping the set of real numbers into itself is algebraic polynomials, the set of functions of the form

$$P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

where n is a non-negative integer and a 0 , ⋯ , a n are real constants. One reason for their importance is that they uniformly approximate continuous functions. By this we mean that given any function, defined and continuous on a closed and bounded interval, there exists a polynomial that is as “close” to the given function as desired [

Section 2 briefly introduces the Lagrange polynomial, which is defined formally in Section 2.1. Section 2.2 discusses properties of polynomial approximations and gives a proof of the Weierstrass theorem. Section 3 presents the main results, using real population census data from the Kenya National Bureau of Statistics; Section 3.2 shows how to calculate missing values via interpolation, and Sections 3.3 and 3.4 extrapolate the population totals to 2009 and 2019, respectively. Section 4 concludes that the best approximating polynomial for quick convergence must be linear in order to give a better extrapolation.

In this section, we introduce the approximator: the Lagrange polynomial approximation of finite population totals.

Consider a finite population U = {U_1, U_2, ⋯, U_N} of N units. Let (y, x) be the (total, year) variables taking non-negative real values (y_i, x_i), respectively, on the unit U_i (i = 1, 2, ⋯, N). From the population U, a simple random sample of size n is drawn without replacement. The Lagrange interpolating polynomial is then the polynomial P(x) of degree ≤ (n − 1) that passes through the n points (x_1, y_1 = f(x_1)), (x_2, y_2 = f(x_2)), ⋯, (x_n, y_n = f(x_n)), and is given by:

$$P(x) = \sum_{j=1}^{n} P_j(x),$$

where $P_j(x) = y_j \prod_{\substack{k=1 \\ k \neq j}}^{n} \frac{x - x_k}{x_j - x_k}$. Written explicitly,

$$P(x) = \frac{(x - x_2)(x - x_3)\cdots(x - x_n)}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_n)}\,y_1 + \frac{(x - x_1)(x - x_3)\cdots(x - x_n)}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_n)}\,y_2 + \cdots + \frac{(x - x_1)\cdots(x - x_{n-1})}{(x_n - x_1)\cdots(x_n - x_{n-1})}\,y_n$$
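The formula above translates directly into code. The sketch below evaluates P(x) via the Lagrange basis products; the data points are illustrative:

```python
def lagrange(points, x):
    """Evaluate the Lagrange interpolating polynomial P(x) through
    the given (x_j, y_j) points, using the product formula above."""
    total = 0.0
    for j, (xj, yj) in enumerate(points):
        basis = 1.0
        for k, (xk, _) in enumerate(points):
            if k != j:  # skip k = j, as in the product formula
                basis *= (x - xk) / (xj - xk)
        total += yj * basis
    return total

# P(x) of degree <= 2 through three points on f(x) = x^2
# reproduces f exactly, so P(2.5) = 2.5^2.
pts = [(1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]
print(lagrange(pts, 2.5))  # → 6.25
```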

Polynomial Approximation of Functions:

Weierstrass Theorem: Let $f : [a, b] \to \mathbb{R}$ be continuous. Then there exists a sequence of polynomials P_n(x) such that

$$\|f - P_n\|_\infty = \max_{x \in [a,b]} |f(x) - P_n(x)| \to 0 \text{ as } n \to \infty$$

Proof of Theorem:

Take $[a, b] = [0, 1]$, with $f : [0, 1] \to \mathbb{R}$ continuous, and let

$$P_n(x) = B_n(f)(x) = \sum_{k=0}^{n} \frac{n!}{k!(n-k)!}\, f\!\left(\frac{k}{n}\right) x^k (1 - x)^{n-k}$$

(the Bernstein polynomial). We show that $\|f - P_n\|_\infty \to 0$ as $n \to \infty$.

We consider three functions, f(x) = 1, f(x) = x and f(x) = x², and show convergence for each.

f(x) = 1:

$$B_n(f)(x) = \sum_{k=0}^{n} \frac{n!}{k!(n-k)!}\, x^k (1 - x)^{n-k} = (x + 1 - x)^n = 1, \quad n \geq 0$$

Hence

$$\|f - B_n(f)\|_\infty = 0$$

Also,

f(x) = x:

$$B_n(f)(x) = \sum_{k=0}^{n} \frac{n!}{k!(n-k)!}\,\frac{k}{n}\, x^k (1 - x)^{n-k} = \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!(n-k)!}\, x^k (1 - x)^{n-k}$$

Letting L = k − 1,

$$= x \sum_{L=0}^{n-1} \frac{(n-1)!}{L!(n-1-L)!}\, x^L (1 - x)^{n-1-L} = x$$

Hence

$$\|B_n(f) - f\|_\infty = 0, \quad n \geq 1$$

f(x) = x²:

$$B_n(f)(x) = \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!(n-k)!}\,\frac{k-1+1}{n}\, x^k (1 - x)^{n-k} = \sum_{k=2}^{n} \frac{(n-1)!}{(k-2)!(n-k)!}\,\frac{1}{n}\, x^k (1 - x)^{n-k} + \frac{1}{n}\sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!(n-k)!}\, x^k (1 - x)^{n-k}$$

so that

$$B_n(f)(x) = \frac{n-1}{n}\, x^2 \sum_{k=2}^{n} \frac{(n-2)!}{(k-2)!(n-k)!}\, x^{k-2} (1 - x)^{n-k} + \frac{1}{n}\, x = \frac{n-1}{n}\, x^2 + \frac{1}{n}\, x$$

Hence

$$\|f - B_n(f)\|_\infty = \max_{x \in [0,1]}\left|\frac{x - x^2}{n}\right| = \frac{1}{4n} \to 0 \text{ as } n \to \infty$$
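The Bernstein construction used in the proof is easy to evaluate numerically. The sketch below computes B_n(f) for f(x) = x² and confirms on a fine grid that the sup-norm error behaves like 1/(4n):

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the Bernstein polynomial B_n(f)(x) on [0, 1]."""
    return sum(comb(n, k) * f(k / n) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

# For f(x) = x^2 the error is exactly x(1 - x)/n, maximised at
# x = 1/2, giving 1/(4n) -> 0 as in the proof above.
f = lambda x: x * x
for n in (4, 16, 64):
    err = max(abs(f(x) - bernstein(f, n, x))
              for x in [i / 1000 for i in range(1001)])
    print(n, err)  # err is 1/(4n) for each n
```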

In order to obtain a best approximating polynomial with smaller error, one needs to choose linear interpolating points closest to the target point.

The plot showed an upward growth in the population of Kenya. This could be attributed to good health services reducing maternal deaths and deaths from disease outbreaks, a boost in socio-economic growth, and political stability (

We selected samples of size two from the 1969 to 2009 population censuses using simple random sampling without replacement, giving ten sample pairs in total. Each selected pair was plotted on its own chart, with the approximating function f(x) shown in green, as shown below.

The chart in (

The linear polynomials in (

Similarly, the approximating linear polynomials in red and green in (

The approximating linear polynomials shown below in (

Finally, the approximating linear polynomials in ( ) approximate on the entire interval [1999, 2009], which is the place for the Best Approximating Polynomial (BAP) to approximate the function f(x) uniformly to any degree of accuracy.

x[1] = 1999 and y[1] = 28,686,607; x[11] = 2009 and y[11] = 38,610,097

The interpolated annual totals y[1], ⋯, y[11] are:

28,686,607  29,678,956  30,671,305  31,663,654  32,656,003  33,648,352  34,640,701  35,633,050  36,625,399  37,617,748  38,610,097

y[i] = y[i − 1] + (y[11] − y[i − 1]) / h

where i ≥ 2 and h = 12 − i is the number of annual steps remaining to reach 2009.
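The recurrence can be verified numerically: reading h as the number of annual steps remaining to 2009 (h = 12 − i) reproduces the eleven tabulated totals exactly:

```python
# Linearly interpolate the annual totals between the 1999 and 2009
# censuses using y[i] = y[i-1] + (y[11] - y[i-1]) / h, with
# h = 12 - i the number of annual steps remaining to 2009.
y = {1: 28_686_607, 11: 38_610_097}
for i in range(2, 11):
    h = 12 - i
    y[i] = y[i - 1] + (y[11] - y[i - 1]) / h
print([round(y[i]) for i in range(1, 12)])
```

Each step adds the same annual increment of 992,349, matching the tabulated values from 28,686,607 up to 38,610,097.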

x[11] = 2009 and y[11] = 38,610,097 (given)

x[10] = 2008 and y[10] = 37,617,748 (approximated)

x[9] = 2007 and y[9] = 36,625,399 (approximated)

L9 = (x[11] − x[10]) / (x[9] − x[10]) × y[9]

L10 = (x[11] − x[9]) / (x[10] − x[9]) × y[10]

Approximated value = L9 + L10

Approximated population total = 38,610,097

Error = 0
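This extrapolation of the 2009 total from the approximated 2007 and 2008 values can be reproduced in a few lines; note that the second basis term must use the numerator (x[11] − x[9]) for the Lagrange basis to return the reported total:

```python
# Lagrange extrapolation of the 2009 total from the approximated
# 2007 and 2008 values.
x9, y9 = 2007, 36_625_399
x10, y10 = 2008, 37_617_748
x11 = 2009
L9 = (x11 - x10) / (x9 - x10) * y9    # basis term for the 2007 node
L10 = (x11 - x9) / (x10 - x9) * y10   # basis term for the 2008 node
print(round(L9 + L10))  # → 38610097, the exact 2009 census total
```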

x[11] = 2009 and y[11] = 38,610,097

x[10] = 2008 and y[10] = 37,617,748

L19 = (2019 − x[11]) / (x[10] − x[11]) × y[10]

L20 = (2019 − x[10]) / (x[11] − x[10]) × y[11]

Approximated value = L19 + L20

Approximated population total = 48,533,587
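The 2019 extrapolation from the 2008 and 2009 values can be checked the same way, using the L19 and L20 terms defined above:

```python
# Lagrange extrapolation of the 2019 total from the 2008 and 2009
# values, using the L19/L20 basis terms.
x10, y10 = 2008, 37_617_748
x11, y11 = 2009, 38_610_097
L19 = (2019 - x11) / (x10 - x11) * y10  # = -10 * y10
L20 = (2019 - x10) / (x11 - x10) * y11  # =  11 * y11
print(round(L19 + L20))  # → 48533587
```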

In this work, the Lagrange polynomial has proven to be a good technique for approximating population totals from data obtained from the Kenya National Bureau of Statistics (KNBS). The research revealed that subsequent population totals are better approximated using a sample closest to the target population being approximated. Therefore, the best approximating polynomial must be linear in order to obtain convergence with diminishing variation on a given interval. The precision of the technique is indicated by the interpolation of missing values shown in the results above: extrapolating the population total in 2009 reproduced exactly the population obtained in that census. We therefore project the population of Kenya for the 2019 census to be forty-eight million, five hundred and thirty-three thousand, five hundred and eighty-seven (48,533,587).

We are grateful to the authors for their numerous and valuable contributions to this work, most especially the first author.

The author(s) declare(s) that there is no conflict of interest regarding the publication of this paper.

Kabareh, L., Mageto, T. and Muema, B. (2017) Approximation of Finite Population Totals Using Lagrange Polynomial. Open Journal of Statistics, 7, 689- 701. https://doi.org/10.4236/ojs.2017.74048