

In two-phase sampling, or double sampling, from a population of size N we take one relatively large sample of size n. From this sample we take a small sub-sample of size m, which usually costs more per sampling unit than the first one. In double sampling with regression estimators, the first-phase sample of size n is used to estimate the mean of an auxiliary variable X, which should be strongly related to the main variable Y (which is estimated from the sub-sample m). Sampling optimization can be achieved either by minimizing the cost C for a fixed variance of Y, or by minimizing the variance of Y for a fixed cost C. In this paper we optimize sampling with the use of Lagrange multipliers in both directions: minimizing the variance of Y for a predetermined cost, and minimizing the cost for a predetermined variance of Y.

All decision-making requires information. In forestry, this information is acquired by means of forest inventories, systems for measuring the extent, quantity and condition of forests [

The main method used in forest inventories in the 19^{th} century was complete enumeration, but it was soon noted that there was a possibility to reduce costs by using representative samples [

The method of Lagrange multipliers evaluates the maxima or minima of a function of possibly several variables, subject to one or more constraints [. Suppose that x_{1}, x_{2}, …, x_{n} are subject to m (< n) equality constraints of the form

g_{i}(x_{1}, x_{2}, …, x_{n}) = 0, i = 1, 2, …, m, (1)

where g_{1}, g_{2}, …, g_{m} are differentiable functions.

The stationary points of this constrained optimization problem are determined by first considering the function

F(x) = f(x) + Σ_{i=1}^{m} λ_{i} g_{i}(x), (2)

where x = (x_{1}, x_{2}, …, x_{n})′ and λ_{1}, λ_{2}, …, λ_{m} are scalars called Lagrange multipliers. By differentiating (2) with respect to x_{1}, x_{2}, …, x_{n} and equating the partial derivatives to zero we obtain

∂f/∂x_{j} + Σ_{i=1}^{m} λ_{i} ∂g_{i}/∂x_{j} = 0, j = 1, 2, …, n. (3)

Equations (1) and (3) form m + n equations in the m + n unknowns x_{1}, x_{2}, …, x_{n}; λ_{1}, λ_{2}, …, λ_{m}. The solutions for x_{1}, x_{2}, …, x_{n} determine the locations of the stationary points. The following argument explains why this is the case.

Suppose that in Equation (1) we can solve for m of the x_{i}’s, for example x_{1}, x_{2}, …, x_{m}, in terms of the remaining n – m variables. By the Implicit Function Theorem (see Appendix 1), this is possible whenever the Jacobian determinant

det[∂g_{i}/∂x_{j}], i, j = 1, 2, …, m, (4)

is nonzero. In this case, we can write

x_{i} = h_{i}(x_{m+1}, x_{m+2}, …, x_{n}), i = 1, 2, …, m. (5)

Thus f(x) becomes a function of only n – m variables, namely, x_{m+1}, x_{m+2}, …, x_{n}. If the partial derivatives of f with respect to these variables exist and if f has a local optimum, then these partial derivatives must necessarily vanish, that is,

∂f/∂x_{j} + Σ_{k=1}^{m} (∂f/∂x_{k})(∂h_{k}/∂x_{j}) = 0, j = m + 1, m + 2, …, n. (6)
Now, if Equations (5) are used to substitute h_{1}, h_{2}, …, h_{m} for x_{1}, x_{2}, …, x_{m}, respectively, in Equation (1), then we obtain the identities

g_{i}(h_{1}, h_{2}, …, h_{m}, x_{m+1}, x_{m+2}, …, x_{n}) ≡ 0, i = 1, 2, …, m.

By differentiating these identities with respect to x_{m+1}, x_{m+2}, …, x_{n} we obtain

∂g_{i}/∂x_{j} + Σ_{k=1}^{m} (∂g_{i}/∂x_{k})(∂h_{k}/∂x_{j}) = 0, i = 1, 2, …, m; j = m + 1, m + 2, …, n. (7)
Let us now define the vectors


Equations (6) and (7) can then be written as

where the coefficient matrix on the left-hand side is a nonsingular m × m matrix if condition (4) is valid.

From Equation (8) we have

By making the proper substitution in Equation (9) we obtain

where

Equations (10) can then be expressed as

From Equation (11) we also have

Equations (12) and (13) can now be combined into a single vector equation of the form

which is the same as Equation (3). We conclude that at a stationary point of f, the values of x_{1}, x_{2}, …, x_{n} and the corresponding values of λ_{1}, λ_{2}, …, λ_{m} must satisfy Equations (1) and (3).
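As an illustration of how Equations (1) and (3) are used in practice, the following sketch solves the stationarity system symbolically for a hypothetical objective f and a single constraint g (i.e. m = 1, n = 2); the functions are invented for the example:

```python
import sympy as sp

# Hypothetical example: minimize f(x1, x2) = x1**2 + x2**2
# subject to one constraint g(x1, x2) = x1 + x2 - 10 = 0.
x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
f = x1**2 + x2**2
g = x1 + x2 - 10

# Equation (2): F = f + lam * g; Equation (3): its partial derivatives
# with respect to x1 and x2 set to zero, plus the constraint (1).
F = f + lam * g
stationarity = [sp.Eq(sp.diff(F, v), 0) for v in (x1, x2)]
constraint = [sp.Eq(g, 0)]

solution = sp.solve(stationarity + constraint, [x1, x2, lam], dict=True)[0]
# The stationary point is x1 = x2 = 5 with lam = -10.
```

The three equations in the three unknowns x1, x2, λ mirror the m + n equations in m + n unknowns described above.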

In two-phase sampling, or double sampling, from a population of size N we take one relatively large sample of size n. From this sample we take a small sub-sample of size m, which usually costs more per sampling unit than the first one. In double sampling with regression estimators, the first-phase sample of size n is used to estimate the mean of an auxiliary variable X, which should be strongly related to the main variable Y.

In the sub-sample m both auxiliary X and main Y variables are measured, in order to estimate their means

and, respectively. The regression estimator and its estimated variance are ([7,11-13]):

where is the variance of Y in the sub-sample m, and r is the estimated correlation coefficient between X and Y.

An approximate cost function could be

C = C_{1}n + C_{2}m,

where C: total sampling cost;

C_{1}: sampling cost of the first phase;

C_{2}: sampling cost of the second phase.

Sampling optimization can be achieved by minimizing the cost C with the variance of Y fixed, applying the following procedure:

We assume an approximately normal distribution of, so that the 95% confidence interval for would be:

where.

Now we must choose n and m in such a way that the half-width of the confidence interval does not exceed a value D, fixed a priori, where D may also be expressed as a fraction (E) of the estimated mean:

To this end we construct the Lagrange function:

Recall that

so

Solving the system of Equations (14), (15) and (16), we find n, m and λ. The reverse problem, viz. minimizing the variance of Y for a fixed cost C, is solved in a similar way.
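The Lagrange conditions for this problem admit a closed form under a common approximation for the variance of the regression estimator in double sampling, V ≈ s²_y[r²/n + (1 − r²)/m] (finite population corrections ignored). The sketch below uses that approximation with hypothetical cost and variance figures, not values from the paper:

```python
import math

def optimal_double_sampling(s2_y, r, c1, c2, v_target):
    """Minimize C = c1*n + c2*m subject to the approximate variance
    V = s2_y*r**2/n + s2_y*(1 - r**2)/m of the regression estimator.
    The Lagrange conditions c1 = mu*A/n**2 and c2 = mu*B/m**2, with
    A = s2_y*r**2 and B = s2_y*(1 - r**2), give the closed form below."""
    A = s2_y * r**2
    B = s2_y * (1 - r**2)
    k = (math.sqrt(A * c1) + math.sqrt(B * c2)) / v_target
    return k * math.sqrt(A / c1), k * math.sqrt(B / c2)

# Hypothetical figures: variance of Y 100, correlation 0.9, second-phase
# units ten times more expensive than first-phase units, target variance 2.0.
n, m = optimal_double_sampling(s2_y=100.0, r=0.9, c1=1.0, c2=10.0, v_target=2.0)
achieved = 100.0 * (0.9**2 / n + (1 - 0.9**2) / m)
```

The stronger the correlation r, the larger the optimal first-phase sample n relative to the expensive sub-sample m.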

In order to explain how Lagrange multipliers work, we consider the following example. Assume that the total cost C of an industry producing two products x and y is given by the equation. Production is limited to 20 units, that is, x + y = 20. Thus, we have:

By solving the system of Equations (a), (b) and (c), we find that x = 13, y = 7 and λ = –71. Consequently, C = 710.

The economic meaning of λ is this: λ gives the change in total cost when the production limit changes by one unit. If we required 19 total production units instead of 20, the total cost would be reduced by 71 monetary units (710 – 71 = 639). In general, λ represents the marginal effect on the cost function when the production limit is increased by one unit.
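This marginal interpretation of λ can be checked numerically. The sketch below uses a hypothetical quadratic cost function (the paper's own cost equation is not reproduced here) and verifies that the saving from lowering the limit b by one unit matches the multiplier:

```python
import sympy as sp

# Hypothetical cost function: C = 3*x**2 + 2*y**2 + 10,
# with the production limit x + y = b.
x, y, b = sp.symbols("x y b", positive=True)
lam = sp.Symbol("lam", real=True)
C = 3 * x**2 + 2 * y**2 + 10
g = x + y - b

# Lagrange system: dF/dx = 0, dF/dy = 0 and the constraint, F = C + lam*g.
F = C + lam * g
sol = sp.solve([sp.diff(F, x), sp.diff(F, y), g], [x, y, lam], dict=True)[0]

cost_at = sp.lambdify(b, C.subs(sol))  # minimal cost as a function of b
lam_at = sp.lambdify(b, sol[lam])      # multiplier as a function of b

# Reducing the limit from 20 to 19 units lowers the minimal cost by
# roughly -lam, mirroring the 710 - 71 = 639 argument in the text.
saving = cost_at(20) - cost_at(19)
```

Because the minimal cost here is quadratic in b, the discrete saving equals −λ evaluated midway between the two limits.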

If there are not enough sample plots to give sufficiently good inventory results using only forest measurements, we may try to make use of auxiliary variables correlated with forest variables. The most obvious way is to use ratio or regression estimators (see Appendix 2). The calibration estimator of Deville and Särndal [

The basic features of the calibration estimator of Deville and Särndal [

Assume that a probability sample S is drawn, and y_{j} and x_{j} are observed for each j in S, the objective being to estimate the mean of y. Let π_{j} be the inclusion probability and d_{j} = 1/π_{j} the basic sampling design weight, which can be used to compute the unbiased Horvitz-Thompson estimator

.
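A minimal sketch of the Horvitz-Thompson estimator of the mean, assuming the population size N is known (the toy values are invented for illustration):

```python
def horvitz_thompson_mean(y, pi, N):
    """Horvitz-Thompson estimator of the population mean: each sampled
    value y_j is weighted by d_j = 1/pi_j, the inverse of its inclusion
    probability, and the weighted total is divided by N."""
    return sum(yj / pij for yj, pij in zip(y, pi)) / N

# Toy illustration: 3 sampled units out of N = 10 with unequal
# inclusion probabilities.
est = horvitz_thompson_mean(y=[4.0, 7.0, 2.0], pi=[0.5, 0.2, 0.3], N=10)
```

With equal inclusion probabilities n/N the estimator reduces to the ordinary sample mean.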

A calibration estimator

is obtained by minimizing the sum of the distances between the prior weights d_{j} and the posterior weights w_{j} for a positive distance function G, taking into account the calibration equation

If the distance between d_{j} and w_{j} is defined as

the calibration estimator will be the same as the regression estimator

where the weighted means and the weighted regression coefficient vector are

and

If the model contains an intercept, the corresponding variable x will be one for all observations, and the calibration Equation (18) will then guarantee that the weights w_{j} add up to one. This means that when estimating totals, the weights Nw_{j} will add up to the known total number of pixels in the population. Thus Nw_{j} can be interpreted as the total area, in pixel units, for plots of forest similar to plot j. The standard least squares theory implies that the regression estimator (19) can be expressed in the form

It is assumed that the intercept is always among the parameters. Estimator (21) is defined if the moment matrix is non-singular.

Some of the weights w_{j} in (17) implied by Equations (20)-(22) may be negative. Nonnegative weights are guaranteed if the distance function is infinite for negative w_{j}. Deville and Särndal [

Minimization of the sum of distances so that (18) is satisfied is a non-linear constrained minimization problem. Using Lagrange multipliers, the problem can be reformulated as a non-linear system of equations, which can be solved iteratively using Newton’s method [, with the weights w_{j} of the regression estimator (19) as a natural starting point.

Since the calibration estimator is asymptotically equivalent to the regression estimator, Deville and Särndal [

The emphasis on area interpretation for the weights has the same argument behind it as was used by Moeur and Stage [

Lappi [

Let g: D → R^{m}, where D is an open subset of R^{m + n} and g = (g_{1}, g_{2}, …, g_{m})′ has continuous first-order partial derivatives in D. If there is a point z_{0} = (x_{0}′, y_{0}′)′, with x_{0} ∈ R^{m} and y_{0} ∈ R^{n}, such that g(z_{0}) = 0, and if at z_{0}

det[∂g_{i}/∂x_{j}] ≠ 0, i, j = 1, 2, …, m,

where g_{i} is the ith element of g, then there is a neighborhood of y_{0} in which the equation g(x, y) = 0 can be solved uniquely for x as a continuously differentiable function of y.

In a stratified inventory, information on some auxiliary variables is used both to plan the sampling design (e.g. allocation) and for estimation, or only for estimation (post-stratification). Stratification is not the only way to use auxiliary information, however, as it can be used at the design stage, e.g. in sampling proportional to size (see Appendix 3). It can also be used at the estimation stage in ratio or regression estimators, so that the standard error of the estimators can be reduced using information on a variable x which is known for each sampling unit in the population. The estimation is based on the relationship between the variables x and y. In ratio estimation, a model that goes through the origin is applied. If this model does not apply, a regression estimator is more suitable. The ratio estimator for the mean is

where is the mean of a variable x in the population and in the sample. Ratio estimators are usually biased, and thus the root mean square error (RMSE) should be used instead of the standard error. The relative bias nevertheless decreases as a function of sample size, so that in large samples (at least more than 30 units) the accuracy of the mean estimator can be approximated as [

.

The ratio estimator is more efficient the larger the correlation between x and y relative to the ratio of the coefficients of variation CV. It is worthwhile using the ratio estimator if

r > CV(x) / (2 CV(y)).

The (simple linear) regression estimator for the mean value is

where the coefficient of x is the OLS (Ordinary Least Squares) slope of the model, which predicts the population mean of y based on the sample means. In a sampling context the constant of the model is usually not presented, but its formula is embedded in the equation. The estimator is more efficient the larger the correlation between x and y. The variance of the regression estimator can be estimated as

.
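Both estimators can be sketched as follows; the data and the assumed known population mean of x are invented for illustration:

```python
def ratio_estimator(y, x, x_pop_mean):
    """Ratio estimator of the mean: the sample ratio sum(y)/sum(x)
    multiplied by the known population mean of x. Assumes a
    relationship through the origin."""
    return (sum(y) / sum(x)) * x_pop_mean

def regression_estimator(y, x, x_pop_mean):
    """Simple linear regression estimator of the mean:
    ybar + b * (x_pop_mean - xbar), with b the OLS slope."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
        / sum((xi - xbar) ** 2 for xi in x)
    return ybar + b * (x_pop_mean - xbar)

x = [2.0, 4.0, 6.0, 8.0]
y = [3.0, 5.0, 7.0, 9.0]   # y = x + 1: linear but not through the origin
est_ratio = ratio_estimator(y, x, x_pop_mean=5.5)
est_reg = regression_estimator(y, x, x_pop_mean=5.5)
```

Because the toy relationship has a non-zero intercept, the regression estimator recovers the relationship exactly while the ratio estimator is biased, illustrating the model-choice remark above.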

Appendix 3. Sampling with Probability Proportional to Size

The basic properties of sampling with arbitrary probabilities can also be utilized in sampling with probability proportional to size (PPS), such as sampling with a relascope. It is then assumed that unit i is selected with the probability kx_{i}, where k is a constant and x is a covariate (the diameter of a tree in relascope sampling). PPS sampling is more efficient the larger the correlation between x and y. For perfect correlation the variance of the estimator would be zero [

In practice, PPS sampling can be performed by ordering the units, calculating the sum of their sizes, and calculating the cumulative (scaled) sizes for the ordered units. The probability of unit i being selected is then proportional to its size. A random number r is then picked, and each unit with a cumulative value equal to (or just above) r, r + 1, r + 2, …, r + n – 1 is selected for the sample. Every unit whose size exceeds the sampling interval is then selected with certainty.
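The cumulative-size procedure can be sketched as follows; the sizes and random seed are invented for illustration, and a unit larger than the sampling interval may be hit more than once:

```python
import random

def pps_systematic(sizes, n, rng):
    """Systematic PPS selection via the cumulative-size method: scale
    each size so the expected counts n*x_i/X sum to n, draw a random
    start r in [0, 1), and select the units whose cumulative scaled
    size first reaches r, r + 1, ..., r + n - 1. Units with scaled
    size >= 1 (larger than the sampling interval) are selected with
    certainty, possibly more than once."""
    X = sum(sizes)
    r = rng.random()
    targets = [r + k for k in range(n)]
    sample, cum, t = [], 0.0, 0
    for i, size in enumerate(sizes):
        cum += n * size / X
        while t < n and cum >= targets[t]:
            sample.append(i)
            t += 1
    return sample

sizes = [5.0, 1.0, 2.0, 8.0, 4.0]   # e.g. tree diameters
sample = pps_systematic(sizes, n=2, rng=random.Random(1))
```

Larger units are hit by the equally spaced targets more often, which is exactly the probability-proportional-to-size property.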