This paper mainly addresses maximum likelihood estimation for a response-selective stratified sampling scheme, basic stratified sampling (BSS), in which the maximum subsample size in each stratum is fixed. We derived the complete-data likelihood for BSS and extended it to a full-data likelihood by incorporating incomplete data. We similarly extended the empirical proportion likelihood approach for consistent and efficient estimation. We conducted a simulation study to compare these two new approaches with the existing estimation methods for BSS. Our results indicate that they perform as well as the standard full information likelihood approach. The methods were illustrated using a growth model for fish size at age that includes between-individual variability. One of our major conclusions is that the fully observed BSS data, the partially observed data used for stratification, and the sampling strategy are all important in constructing a consistent and efficient estimator.

In stratified random sampling (SRS), the population or a random sample of the population is partitioned into relatively homogeneous subgroups, or strata, and then random samples are taken independently in each stratum for full observation. Such a sampling design may also be regarded as a kind of two-phase sampling, with the population or the large sample before partitioning being the first-phase sample, and the smaller, more extensively observed subsamples after partitioning being the second-phase samples.

Practical implementations of SRS frequently fall into two categories as classified by [

Assume that there are a total of $N$ sampling units on which the stratified sampling is conducted. Let $y_i$ and $x_i$, $i = 1, \cdots, N$, denote respectively the vectors of responses and covariates of the $i$th individual generated from the joint distribution $f(y, x \mid \theta) = g(y \mid x; \theta)\, h(x)$, with $\theta$ being a vector of all the parameters describing the conditional distribution of $y$ given $x$. In SRS, $(y, x)$ are fully observed only for a subset of size $n$ of the $N$ units, which are called complete data in this paper, and only a subset $z$ of $(y, x)$ is observed for the other $N - n$ units, which are called incomplete data.

In SRS the unobserved elements of y and/or x are missing data, and missingness can be fully accounted for by variable z which is observed for all the N units; that is, the unsampled variables are missing at random (MAR) in the terminology of [

$$L_F(\theta) = \left[\prod_{i=1}^{n} f(y_i, x_i \mid \theta)\right] \left[\prod_{i=n+1}^{N} u(z_i \mid \theta)\right], \tag{1}$$

where $u(z \mid \theta)$ is the density function of $z$, $i = 1, \cdots, n$ enumerates the second-phase complete data, and $i = n+1, \cdots, N$ enumerates the first-phase incomplete data.

If the response $y$ is not involved in the stratification, namely, the vector $z$ contains no elements of the vector $y$, then $u(z \mid \theta) = u(z)$ is independent of the parameters $\theta$, and the full likelihood $L_F(\theta)$ reduces to $\prod_{i=1}^{n} g(y_i \mid x_i; \theta)$, which is trivial since neither the sampling scheme nor the covariate distribution $h(x)$ needs to be taken into account. In this paper we consider only the SRS where the response $y$ is involved in stratification, which is often referred to as response-selective stratified sampling (RSSS).

In fisheries surveys, length-stratified age sampling (LSAS) is one of the most popular strategies for sampling the age distribution of a fish population. In the first phase of LSAS a large number of caught fish of a certain species are measured for length and classified into length strata (e.g. two centimeters, five centimeters). In the second phase a pre-specified small number of fish are randomly selected from each stratum for age measurement. LSAS is BSS, and since growth models generally describe how length increases as a function of age (i.e. length is the response and age is the covariate), it is response-selective. LSAS has been conducted worldwide for several decades. For example, the Canadian Department of Fisheries and Oceans (DFO) has conducted annual surveys since the 1970s and uses LSAS for age sampling for many species such as cod and American plaice. Millions of length-at-age data have been accumulated for each species, which are invaluable for fisheries stock assessment and ocean ecosystem studies. In this paper we focus on BSS, with some of the methods and conclusions also applicable to VPS.

[

This is the motivation of this paper. [

The outline of this paper is as follows. In Section 2 we define notation and review the likelihood and pseudolikelihood approaches relevant to this study. In Section 3 we derive the complete-data density function, complete-data likelihood and full-data likelihood for BSS. Application of an empirical proportion approach, which is an improved version of the pseudoconditional likelihood approach, to BSS is explored in Section 4. Results from simulation studies based on a linear model with between-individual (BI) variation and a Von Bertalanffy (VonB) growth model with BI variation are presented in Section 5 to compare the performance of all the new and existing estimators discussed in this paper. The most promising estimators are then further illustrated in Section 6 by fitting the VonB model with BI variation to growth data for American plaice (Hippoglossoides platessoides) collected by DFO. Some further discussion is provided in Section 7.

Suppose that $N$ units $(y_i, x_i)$, $i = 1, 2, \cdots, N$, are generated from the joint distribution $f(y, x \mid \theta)$. As mentioned previously, we always assume that an appropriate parametric covariate distribution is available, so $\theta$ here includes not only the parameters describing the conditional distribution of the response $y$ given the covariate $x$, but also the parameters defining the covariate distribution. The range of $(y, x)$ is divided into $H$ exhaustive and mutually exclusive strata $S_1, S_2, \cdots, S_H$. Denote the probability for $(y, x)$ to fall into the $h$th stratum as $Q_h(\theta)$, namely,

$$Q_h(\theta) = \Pr\{(y, x) \in S_h\}. \tag{2}$$

Define the indicator variable

$$R_i = \begin{cases} 1, & \text{if } (y_i, x_i) \text{ is fully observed}, \\ 0, & \text{if some information on } (y_i, x_i) \text{ is missing}. \end{cases} \tag{3}$$

Because BSS2 can be classified as VPS [

$$n_h = \begin{cases} N_h, & \text{if } N_h < m_h, \\ m_h, & \text{if } N_h \geq m_h. \end{cases} \tag{4}$$
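The selection rule (4) can be illustrated with a short sketch of the second-phase subsampling. This is only a minimal illustration, not the authors' implementation: the function name `bss_select`, the equal-width strata aligned at multiples of `bin_width`, and the use of Python's `random` module are all assumptions made for the example.

```python
import random
from collections import defaultdict


def bss_select(values, bin_width, max_per_stratum, rng):
    """Return a 0/1 list R marking the units selected for full observation.

    values          : stratifying variable for all N first-phase units
    bin_width       : width of each stratum (assumed equal-width bins)
    max_per_stratum : m_h, the fixed maximum subsample size (same for all h here)
    rng             : a random.Random instance
    """
    strata = defaultdict(list)
    for i, v in enumerate(values):
        strata[int(v // bin_width)].append(i)  # hypothetical stratum index

    selected = [0] * len(values)
    for members in strata.values():
        n_h = min(len(members), max_per_stratum)  # rule (4)
        for i in rng.sample(members, n_h):        # simple random sample within stratum
            selected[i] = 1
    return selected
```

With this rule every stratum holding fewer than `max_per_stratum` units is fully observed, and every larger stratum contributes exactly `max_per_stratum` units.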

Although the likelihood for BSS (4) is given by (1), several published studies use other likelihoods, and some of these are described as follows.

[

$$f_c(y_i, x_i \mid R_i = 1; \theta) \propto \frac{f(y_i, x_i \mid \theta)}{Q_{h_i}(\theta)} \quad \text{if } (y_i, x_i) \in S_{h_i}, \tag{5}$$

and the constant of proportionality does not depend on $\theta$. The conditional likelihood then becomes

$$L_c(\theta) = \prod_{i: R_i = 1} \frac{f(y_i, x_i \mid \theta)}{Q_{h_i}(\theta)}, \tag{6}$$

which is adopted in [

Weighted pseudo-likelihood estimators have been studied extensively since the 1980’s for problems involving response-selective sampling. For this topic we refer to [

$$l_w(\theta) = \sum_{h=1}^{H} \frac{N_h}{n_h} \sum_{i=1}^{n_h} \ln f(y_i, x_i \mid \theta). \tag{7}$$
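As a rough sketch of how (7) might be evaluated in practice, the function below sums the log-densities of the fully observed units with weights $N_h / n_h$. The names are hypothetical: `log_density` stands for any user-supplied callable returning $\ln f(y, x \mid \theta)$ at the current parameter value, and the data layout (tuples of response, covariate and stratum label) is an assumption made for the example.

```python
def weighted_log_pseudolikelihood(log_density, complete_units, stratum_sizes):
    """Evaluate the weighted log-pseudo-likelihood (7).

    log_density    : callable (y, x) -> ln f(y, x | theta), theta fixed by the caller
    complete_units : list of (y, x, h) for the fully observed units
    stratum_sizes  : dict mapping stratum label h to the first-phase count N_h
    """
    # n_h: number of fully observed units in each stratum
    n_h = {}
    for _, _, h in complete_units:
        n_h[h] = n_h.get(h, 0) + 1

    total = 0.0
    for y, x, h in complete_units:
        weight = stratum_sizes[h] / n_h[h]  # inverse empirical selection proportion
        total += weight * log_density(y, x)
    return total
```

Maximizing this function over $\theta$ (for example with a general-purpose optimizer) would give the weighted pseudo-likelihood estimate under these assumptions.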

Although this weighted log-pseudo-likelihood (7) may provide an unbiased parameter estimating equation, the HT approach is known to be inefficient, and can be seriously so in some situations such as when the sample unit values are not approximately proportional to the inclusion probabilities ([

An approach for addressing this inefficiency issue is to adjust the standard HT weights by using the whole set of incomplete data, namely, those with only a subset of ( y , x ) measured but available for all the N sample units (see e.g. [

$$\sum_{i=1}^{N} R_i w_i y_i = \sum_{i=1}^{N} y_i, \tag{8}$$

where $w_i$ are the modified weights. Similarly one can also calibrate up to higher order moments or calibrate the empirical distributions by imposing the constraints

$$\sum_{j=1}^{N} R_j w_j \mathbf{1}_{y_j \leq y_i} = \sum_{j=1}^{N} \mathbf{1}_{y_j \leq y_i}, \tag{9}$$

where $i$ enumerates all the subjects selected for full observation, and $\mathbf{1}_{y_j \leq y_i} = 1$ if $y_j \leq y_i$ and 0 otherwise. Nevertheless, these calibration strategies may not produce better estimates than (8) does, according to our simulation studies. Hence, in this paper we only report results with constraint (8). The calibrated weighted likelihood approach under all these constraints can be conveniently implemented with Equation (9) in [

In some applications (e.g. [

$$f(y, x \mid \mathrm{BSS}; \theta) \approx f(y, x \mid \mathrm{VPS}; \theta) = \frac{\dfrac{n_h}{N_h} f(y, x \mid \theta)}{\sum_{h'=1}^{H} \dfrac{n_{h'}}{N_{h'}} Q_{h'}}. \tag{10}$$

Parameters are estimated based on the likelihood function defined from (10). Note that with the availability of a valid covariate distribution, a density function similar to (10) can also be constructed for the $N - n$ incomplete observations (i.e. those partially observed units). In LSAS there are always some empty strata with $N_h = 0$ but non-negligible occupation probability $Q_h$, which are missing in the denominator of (10). We will address these issues in Section 4 and call the improved likelihood the “empirical proportion (EP) likelihood”.

As mentioned previously, the methods for VPS are applicable to BSS2, and the complete-data likelihood for VPS is given in [

We denote by $\operatorname{dbin}(x, N, p)$ and $\operatorname{pbin}(x, N, p)$ respectively the binomial probability mass function and cumulative distribution function, with number of successes $x$, total number of trials $N$ and success probability $p$. The density function for a unit selected for full observation in BSS is denoted by $f_{BC}(\cdot)$, with “BC” indicating “BSS complete data”.

Theorem 1. In BSS the density function of a unit $(y, x)$ selected for full observation is given by

$$f_{BC}(y, x \mid R = 1; \theta) = \frac{f(y, x \mid \theta)}{Q_h} \times \frac{\left[\sum_{N_h=1}^{m_h - 1} N_h \operatorname{dbin}(N_h, N, Q_h)\right] + m_h \left[1 - \operatorname{pbin}(m_h - 1, N, Q_h)\right]}{\sum_{h'=1}^{H} \left\{\left[\sum_{N_{h'}=1}^{m_{h'} - 1} N_{h'} \operatorname{dbin}(N_{h'}, N, Q_{h'})\right] + m_{h'} \left[1 - \operatorname{pbin}(m_{h'} - 1, N, Q_{h'})\right]\right\}} \tag{11}$$

if $(y, x) \in S_h$.
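The bracketed term in (11) is the expected second-phase sample size in stratum $h$ when $N_h$ is binomial with $N$ trials and success probability $Q_h$. The sketch below evaluates that term and the resulting stratum factor multiplying $f(y, x \mid \theta)$. It is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, the binomial functions are written with the standard library only (in log space, so that large $N$ does not overflow), and $0 < Q_h < 1$ is assumed.

```python
import math


def dbin(k, n, p):
    """Binomial pmf P(K = k) for K ~ Bin(n, p), assuming 0 < p < 1."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def pbin(k, n, p):
    """Binomial cdf P(K <= k)."""
    return sum(dbin(j, n, p) for j in range(0, min(k, n) + 1))


def expected_stratum_sample_size(m_h, n_total, q_h):
    """Numerator term of (11): sum_{k=1}^{m_h-1} k dbin(k) + m_h [1 - pbin(m_h-1)]."""
    partial = sum(k * dbin(k, n_total, q_h) for k in range(1, m_h))
    return partial + m_h * (1.0 - pbin(m_h - 1, n_total, q_h))


def bss_selection_factors(m, n_total, q):
    """Stratum factors c_h such that f_BC(y, x) = c_h * f(y, x) for (y, x) in S_h."""
    sizes = [expected_stratum_sample_size(m_h, n_total, q_h)
             for m_h, q_h in zip(m, q)]
    total = sum(sizes)
    return [s / (total * q_h) for s, q_h in zip(sizes, q)]
```

A quick sanity check on such a sketch is that $\sum_h Q_h c_h = 1$, which is what makes (11) integrate to one over all strata.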

The proof of Theorem 1 is given in the Appendix. As suggested in [

$$L_{BC} = \prod_{i: R_i = 1} f_{BC}(y_i, x_i \mid R_i = 1; \theta). \tag{12}$$

With the same arguments for deriving (11), the density function for the partially observed units is

$$f_{BI}(z \mid R = 0; \theta) = \frac{f(z \mid \theta)}{Q_h} \times \frac{\sum_{N_h = m_h + 1}^{N} (N_h - m_h) \operatorname{dbin}(N_h, N, Q_h)}{\sum_{h'=1}^{H} \left\{\sum_{N_{h'} = m_{h'} + 1}^{N} (N_{h'} - m_{h'}) \operatorname{dbin}(N_{h'}, N, Q_{h'})\right\}}, \tag{13}$$

where the subscript “BI” denotes “BSS incomplete data”. The summations in (13) may be calculated more efficiently using

$$\sum_{N_h = m_h + 1}^{N} (N_h - m_h) \operatorname{dbin}(N_h, N, Q_h) = N Q_h - \sum_{N_h = 1}^{m_h} N_h \operatorname{dbin}(N_h, N, Q_h) - m_h \left[1 - \operatorname{pbin}(m_h, N, Q_h)\right]. \tag{14}$$
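Identity (14) replaces a sum over up to $N - m_h$ terms with a sum over $m_h$ terms, which matters when $N$ is in the thousands and $m_h$ is around 15. The sketch below computes both sides so that they can be compared numerically. The function names are hypothetical and the binomial helpers are a standard-library stand-in (assuming $0 < Q_h < 1$), not code from the paper.

```python
import math


def dbin(k, n, p):
    """Binomial pmf computed in log space, assuming 0 < p < 1."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def pbin(k, n, p):
    """Binomial cdf P(K <= k)."""
    return sum(dbin(j, n, p) for j in range(0, min(k, n) + 1))


def expected_excess_direct(m_h, n_total, q_h):
    """Left-hand side of (14): sum over N_h = m_h+1..N of (N_h - m_h) dbin."""
    return sum((k - m_h) * dbin(k, n_total, q_h)
               for k in range(m_h + 1, n_total + 1))


def expected_excess_fast(m_h, n_total, q_h):
    """Right-hand side of (14): only m_h terms are summed."""
    head = sum(k * dbin(k, n_total, q_h) for k in range(1, m_h + 1))
    return n_total * q_h - head - m_h * (1.0 - pbin(m_h, n_total, q_h))
```

The two functions should agree up to floating-point rounding; the fast version relies on the binomial mean $N Q_h$ to avoid the long tail sum.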

Densities (11) and (13) incorporate respectively the information of complete data and incomplete data. We anticipate that they together can lead to better inference than using only complete data. The BSS full-data (BF) likelihood is

$$L_{BF}(\theta) = \prod_{h=1}^{H} \left[\prod_{i=1}^{n_h} f_{BC}(y_i, x_i \mid R_i = 1; \theta)\right] \left[\prod_{i=n_h+1}^{N_h} f_{BI}(z_i \mid R_i = 0; \theta)\right]. \tag{15}$$

Here and in the remainder of this paper, we enumerate the fully observed units in the $h$th stratum as $1, \cdots, n_h$, and the partially observed units in the same stratum as $n_h + 1, \cdots, N_h$.

In some cases only the number of incomplete measurements, $N_h - n_h$, in each stratum is known, rather than the measured values of all the $z_i$'s. In this situation we need to integrate out $z$ in (13) and rewrite the likelihood function (15) as

$$L_{BF}(\theta) = \prod_{h=1}^{H} \left[\prod_{i=1}^{n_h} f_{BC}(y_i, x_i \mid R_i = 1; \theta)\right] \times \left[\frac{\sum_{N_h = m_h + 1}^{N} (N_h - m_h) \operatorname{dbin}(N_h, N, Q_h)}{\sum_{h'=1}^{H} \left\{\sum_{N_{h'} = m_{h'} + 1}^{N} (N_{h'} - m_{h'}) \operatorname{dbin}(N_{h'}, N, Q_{h'})\right\}}\right]^{N_h - n_h}. \tag{16}$$

In real data analysis it is important to examine residuals for the fitted model to assess the validity of assumptions. Equation (11) gives the density function for BSS complete data, and can be used to calculate residuals. For simplicity we assume that the response $y$ is univariate. Define the density function of $x$ conditional on $R = 1$ as

$$h_{BC}(x \mid R = 1; \theta) = \int f_{BC}(y, x \mid R = 1; \theta)\, dy.$$

The conditional moments of $y$ given $x$ and $R = 1$ are then

$$E(y \mid x, R = 1) = \frac{\int y\, f_{BC}(y, x \mid R = 1; \theta)\, dy}{h_{BC}(x \mid R = 1; \theta)},$$

$$E(y^2 \mid x, R = 1) = \frac{\int y^2 f_{BC}(y, x \mid R = 1; \theta)\, dy}{h_{BC}(x \mid R = 1; \theta)},$$

$$\operatorname{Var}(y \mid x, R = 1) = E(y^2 \mid x, R = 1) - \left[E(y \mid x, R = 1)\right]^2.$$

The standardized residual for the $i$th observation $(y_i, x_i)$ is

$$\frac{y_i - E(y \mid x_i, R = 1)}{\sqrt{\operatorname{Var}(y \mid x_i, R = 1)}}. \tag{17}$$

The measured data such as length and age are usually discrete, and the above integrations become summations, which are easier to evaluate.
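For discrete measurements the residual (17) reduces to a few weighted sums over the possible response values. The following sketch shows one way this could be coded. It assumes the caller has already evaluated the complete-data density (11) on a grid of response values at the observed covariate; the function and argument names are hypothetical.

```python
import math


def standardized_residual(y_obs, y_grid, density_at_x):
    """Standardized residual (17) for a discrete response.

    y_obs        : observed response y_i
    y_grid       : all possible response values (e.g. length classes)
    density_at_x : f_BC(y, x_i | R = 1; theta) evaluated at each y in y_grid,
                   for the observed covariate x_i (supplied by the caller)
    """
    h_x = sum(density_at_x)  # h_BC(x_i | R = 1): the integral becomes a sum
    mean = sum(y * f for y, f in zip(y_grid, density_at_x)) / h_x
    second_moment = sum(y * y * f for y, f in zip(y_grid, density_at_x)) / h_x
    variance = second_moment - mean ** 2
    return (y_obs - mean) / math.sqrt(variance)
```

Because only the ratio of sums enters, the density values need to be correct only up to a factor that does not depend on $y$.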

In this section we expand density (10) for application in BSS and especially in LSAS.

Empty strata ($N_h = 0$) always occur with LSAS. For the empty strata in (10), the empirical selection proportions $n_h / N_h$ ($= 0/0$) are not defined. We need to assign selection probabilities for full and incomplete observations to those unobserved strata. In VPS these selection probabilities may be determined by the maximum likelihood method [

$$f_{EP}(y, x \mid R = 1; \theta) = \frac{\dfrac{n_h}{N_h} f(y, x \mid \theta)}{\sum_{h'=1}^{H_{obs}} \dfrac{n_{h'}}{N_{h'}} Q_{h'} + \sum_{h'=H_{obs}+1}^{H_{total}} Q_{h'}}. \tag{18}$$

Here $h = 1, \cdots, H_{obs}$ enumerates the strata with data observed, and $h = H_{obs} + 1, \cdots, H_{total}$ enumerates the strata without data. $H_{total}$ is the total number of strata with non-negligible occupation probabilities $Q_h$ (see Equation (2)).
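A minimal sketch of the normalization in (18) is given below. Empty strata are identified by $N_h = 0$ and enter the denominator with their full occupation probability $Q_h$, as in the second sum of (18). The function names and the list-based data layout are assumptions made for illustration, and the stratum probabilities `q` are taken as already computed from the model at the current $\theta$.

```python
def ep_complete_denominator(n, big_n, q):
    """Denominator of (18).

    n     : list of n_h, fully observed counts per stratum
    big_n : list of N_h, first-phase counts per stratum (0 for empty strata)
    q     : list of Q_h(theta), stratum occupation probabilities
    """
    total = 0.0
    for n_h, big_n_h, q_h in zip(n, big_n, q):
        if big_n_h == 0:
            total += q_h                  # empty stratum: contributes Q_h
        else:
            total += n_h / big_n_h * q_h  # empirical selection proportion
    return total


def ep_complete_density(f_value, h, n, big_n, q):
    """EP density (18) for a fully observed unit in an observed stratum h.

    f_value : f(y, x | theta) evaluated at the unit
    """
    return n[h] / big_n[h] * f_value / ep_complete_denominator(n, big_n, q)
```

Leaving the empty strata out of the denominator, as in (10), would make the density integrate to more than one whenever those strata have non-negligible $Q_h$.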

Similarly, we can include information from the incomplete observations using their EP density,

$$f_{EP}(z \mid R = 0; \theta) = \frac{\dfrac{N_h - n_h}{N_h} f(z \mid \theta)}{\sum_{h'=1}^{H_{obs}} \dfrac{N_{h'} - n_{h'}}{N_{h'}} Q_{h'}}. \tag{19}$$

Here, without loss of generality, we assume that $z$ falls in the $h$th stratum. For an unobserved stratum $h$, since we have defined its proportion for full observation as $n_h / N_h = 1$, its proportion for partial observation is $(N_h - n_h) / N_h = 0$. The EP likelihood function then has the form

$$L_{EP}(\theta) = \prod_{h=1}^{H} \left[\prod_{i=1}^{n_h} f_{EP}(y_i, x_i \mid R_i = 1; \theta)\right] \left[\prod_{i=n_h+1}^{N_h} f_{EP}(z_i \mid R_i = 0; \theta)\right]. \tag{20}$$

If only the number of incomplete observations in each stratum is reported without knowing the $z$ values, $z$ in (19) needs to be integrated out and the likelihood (20) becomes

$$L_{EP}(\theta) = \prod_{h=1}^{H} \left[\prod_{i=1}^{n_h} f_{EP}(y_i, x_i \mid R_i = 1; \theta)\right] \left[\frac{Q_h}{\sum_{h'=1}^{H_{obs}} \dfrac{N_{h'} - n_{h'}}{N_{h'}} Q_{h'}}\right]^{N_h - n_h}. \tag{21}$$

In this section we examine the performance of the inference approaches for BSS described in the previous sections. We use two simple examples: a linear model with between-individual (BI) variation, and a nonlinear Von Bertalanffy (VonB) growth model with BI variation. The simulation setup is as follows.

The linear model with BI variation is

$$Y = a + B X + \varepsilon, \tag{22}$$

where $B \sim N(\mu_b, \sigma_b^2)$, $X \sim N(\mu_x, \sigma_x^2)$ and $\varepsilon \sim N(0, \sigma_\varepsilon^2)$. The capital letter $B$ denotes the random effect of BI variation. We randomly generated $N = 5000$ $(x_i, y_i)$ pairs, $i = 1, \cdots, N$, from model (22). The parameters of the model were chosen as $a = 0.5$, $\mu_b = 0.2$, 0.5 and 1.0, $\sigma_b = 1.0$, $\mu_x = 1.0$, $\sigma_x = 5.0$, and $\sigma_\varepsilon = 0.7$. We selected a small intercept $a$ so that issues with the relative performance of its estimation, as defined by (25), can be clearly seen. The slope is an important parameter in a linear model. Hence we selected small, moderate and large values for its mean $\mu_b$ and a relatively large standard deviation (SD) $\sigma_b$ to test how well the different approaches identify the slope under various situations. The mean $\mu_x$ and SD $\sigma_x$ of the covariate $X$ were chosen so that the spread of the covariate allows reasonable estimates of the model parameters. We adopted a moderate error SD ($\sigma_\varepsilon$) relative to the other parameters. We stratified the data by length ($Y$) bins of size 2 and randomly selected a maximum of 15 units per length stratum to keep their $X$ values, and dropped the $X$ values of the units not selected. This sampling design is close to the LSAS of fishery surveys that we address in this study.
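The data-generating and subsampling steps just described can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the simulation code used for the tables: the function name is hypothetical, the length bins are assumed to be aligned at multiples of the bin width, and Python's standard `random` module stands in for whatever software was actually used.

```python
import random
from collections import defaultdict


def simulate_linear_bss(n_units=5000, a=0.5, mu_b=0.2, sigma_b=1.0,
                        mu_x=1.0, sigma_x=5.0, sigma_eps=0.7,
                        bin_width=2.0, max_per_stratum=15, seed=0):
    """Simulate model (22), then keep X for at most 15 units per Y-stratum.

    Returns a list of (y, x) pairs where x is None for units whose
    covariate was dropped (the incomplete data).
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n_units):
        x = rng.gauss(mu_x, sigma_x)
        b = rng.gauss(mu_b, sigma_b)          # between-individual random slope
        y = a + b * x + rng.gauss(0.0, sigma_eps)
        units.append((y, x))

    # First phase: stratify all units by the response Y
    strata = defaultdict(list)
    for i, (y, _) in enumerate(units):
        strata[int(y // bin_width)].append(i)

    # Second phase: simple random sample of at most max_per_stratum per stratum
    keep = set()
    for members in strata.values():
        keep.update(rng.sample(members, min(len(members), max_per_stratum)))

    return [(y, x if i in keep else None) for i, (y, x) in enumerate(units)]
```

Calling `simulate_linear_bss(mu_b=0.5)` or `simulate_linear_bss(mu_b=1.0)` would give data sets for the other two slope settings.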

The VonB model is a commonly used growth model in fisheries science (e.g. [

$$y(a) = l_\infty \left(1 - e^{-k(a - a_0)}\right), \tag{23}$$

where $y(a)$ denotes length at age $a$, $l_\infty$ is the maximum possible size (as $a \to \infty$), $k$ is the growth rate parameter, and $a_0$ ($< 0$) is the theoretical age at which the fish would have had zero length. Variation in growth is also important for population and community dynamics (e.g. [

$$Y = \mu(A) + \varepsilon, \tag{24}$$

where $Y$ is the measured length, $\mu(A) = l_\infty \left(1 - e^{-k(A - a_0)}\right)$, $A \sim \mathrm{Gamma}(\alpha, \beta)$ and $\varepsilon \sim N\left(0, [\mathrm{CV} \times \mu(A)]^2\right)$. The error $\varepsilon$ here in fact includes both BI variation and $Y$ observation error.

We randomly generated $N = 5000$ ages from a gamma distribution with Case 1: $(\alpha, \beta) = (3.643, 1.225)$, and Case 2: $(\alpha, \beta) = (11.227, 0.641)$. The values of $\alpha$ and $\beta$ were determined by matching the mean $\alpha\beta$ and variance $\alpha\beta^2$ with those of the age data for American plaice that we have been investigating. Case 1 represents a younger population with mean age 4.46 and variance 5.47, while Case 2 represents an older population with mean age 7.20 and variance 4.61. Case 1 has a broad age distribution close to the origin, and Case 2 has a narrower distribution of ages. Lengths were then generated from model (24) with $l_\infty = 70$, $k = 0.2$, $a_0 = -0.07$ and $\mathrm{CV} = 0.2$. We stratified the data by length classes of size 2 and randomly sampled a maximum of 15 units per length stratum to keep their ages, and dropped the ages of the units not selected.
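The generation of (length, age) pairs from model (24) can be sketched as below. The sketch covers only the data-generating step, not the length-stratified subsampling. It assumes that $\beta$ is a scale parameter, which is consistent with the stated mean $\alpha\beta$ and variance $\alpha\beta^2$ and matches the parameterization of `random.gammavariate`; the function names are hypothetical.

```python
import math
import random


def von_bertalanffy(age, l_inf=70.0, k=0.2, a0=-0.07):
    """Mean length at age, Equation (23)."""
    return l_inf * (1.0 - math.exp(-k * (age - a0)))


def simulate_lengths(n_units, alpha, beta, cv=0.2, seed=0):
    """Draw (length, age) pairs from model (24).

    Ages are Gamma(shape=alpha, scale=beta); the error SD is cv * mu(age).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_units):
        age = rng.gammavariate(alpha, beta)  # mean alpha*beta, variance alpha*beta**2
        mu = von_bertalanffy(age)
        length = rng.gauss(mu, cv * mu)      # multiplicative-style error
        data.append((length, age))
    return data


# Case 1 (younger population) and Case 2 (older population)
case1 = simulate_lengths(5000, 3.643, 1.225, seed=1)
case2 = simulate_lengths(5000, 11.227, 0.641, seed=2)
```

Note that ages are continuous in this sketch, whereas survey ages are typically recorded as whole years.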

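Relative biases (RBias), relative standard errors (RSE), and relative root mean squared errors (RRMSE) are defined as

$$\mathrm{RBias} = 100 \times \frac{\text{Estimate} - \text{True value}}{|\text{True value}|}, \quad \mathrm{RSE} = 100 \times \frac{\text{Standard error}}{|\text{True value}|}, \quad \text{and} \quad \mathrm{RRMSE} = 100 \times \frac{\sqrt{\mathrm{MSE}}}{|\text{True value}|}. \tag{25}$$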
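A small sketch of how the three summaries in (25) could be computed from a set of simulated estimates is given below. Two points are assumptions rather than statements from the text: "Estimate" is taken to be the mean of the estimates over simulations, and "Standard error" is taken to be the sample standard deviation of the estimates across simulations. The function name is hypothetical.

```python
import math
import statistics


def relative_metrics(estimates, true_value):
    """Return RBias, RSE and RRMSE (all in percent) for one parameter.

    estimates  : list of parameter estimates, one per simulation run
    true_value : the value used to generate the data (must be nonzero)
    """
    scale = 100.0 / abs(true_value)
    mean_estimate = statistics.fmean(estimates)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return {
        "RBias": scale * (mean_estimate - true_value),
        "RSE": scale * statistics.stdev(estimates),  # assumed: SD across runs
        "RRMSE": scale * math.sqrt(mse),
    }
```

Dividing by $|\text{True value}|$ makes the summaries comparable across parameters, but it also inflates them for parameters whose true value is close to zero.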

We derived these values using 500 simulations for the full information likelihood (1), conditional likelihood (6), weighted likelihood (7), calibrated weighted likelihood, complete-data likelihood (12), full-data likelihood (15), and EP likelihood (20) (see Tables 1-4). We also include the “random approach” based on maximizing the likelihood

$$L_R = \prod_{i=1}^{n} f(y_i, x_i \mid \theta) \tag{26}$$

as a reference point to see the difference between considering BSS and totally ignoring BSS.

Method | Value | $a$ | $\mu_x$ | $\sigma_x$ | $\mu_b$ | $\sigma_b$ | $\sigma_\varepsilon$
---|---|---|---|---|---|---|---
True value | | 0.5 | 1.0 | 5.0 | 0.2 | 1.0 | 0.7
Random | RBias | −21.09 | 151.91 | 59.57 | 166.77 | 62.92 | 1.12
Random | RSE | 57.27 | 46.18 | 4.36 | 114.88 | 12.22 | 22.28
Random | RRMSE | 60.98 | 158.76 | 59.73 | 202.44 | 64.09 | 22.29
Weighted likelihood | RBias | 2.62 | −0.37 | −0.92 | 1.05 | 1.28 | 1.70
Weighted likelihood | RSE | 35.26 | 40.68 | 4.70 | 42.28 | 5.57 | 34.06
Weighted likelihood | RRMSE | 35.33 | 40.64 | 4.79 | 42.25 | 5.71 | 34.06
Calibrated weighted likelihood | RBias | 2.64 | −0.40 | −0.94 | 1.37 | 1.12 | 2.10
Calibrated weighted likelihood | RSE | 34.69 | 40.76 | 4.71 | 40.99 | 5.40 | 34.09
Calibrated weighted likelihood | RRMSE | 34.75 | 40.72 | 4.80 | 40.97 | 5.51 | 34.12
Complete-data likelihood | RBias | −0.63 | 0.81 | 0.21 | 1.86 | 0.97 | 8.15
Complete-data likelihood | RSE | 42.85 | 19.48 | 2.43 | 25.45 | 5.11 | 36.34
Complete-data likelihood | RRMSE | 42.81 | 19.48 | 2.44 | 25.49 | 5.20 | 37.20
Conditional likelihood | RBias | −4.23 | 11.83 | 4.83 | 15.94 | 7.46 | 1.77
Conditional likelihood | RSE | 50.38 | 32.66 | 10.30 | 56.97 | 16.11 | 35.53
Conditional likelihood | RRMSE | 50.51 | 34.70 | 11.37 | 59.10 | 17.74 | 35.54
Full information likelihood | RBias | 0.39 | −0.59 | −0.28 | 2.01 | 0.48 | 5.38
Full information likelihood | RSE | 8.19 | 17.51 | 2.18 | 18.55 | 2.42 | 15.17
Full information likelihood | RRMSE | 8.19 | 17.51 | 2.20 | 18.64 | 2.47 | 16.09
Full-data likelihood | RBias | 0.45 | −0.80 | −0.31 | 0.53 | 0.38 | 6.95
Full-data likelihood | RSE | 8.18 | 18.46 | 2.23 | 17.85 | 2.38 | 12.39
Full-data likelihood | RRMSE | 8.18 | 18.46 | 2.25 | 17.84 | 2.41 | 14.20
Empirical proportion likelihood | RBias | 0.40 | −0.17 | −0.07 | 1.49 | 0.63 | 5.89
Empirical proportion likelihood | RSE | 8.16 | 17.60 | 2.14 | 17.56 | 2.36 | 13.54
Empirical proportion likelihood | RRMSE | 8.16 | 17.58 | 2.14 | 17.61 | 2.44 | 14.75

Method | Value | $a$ | $\mu_x$ | $\sigma_x$ | $\mu_b$ | $\sigma_b$ | $\sigma_\varepsilon$
---|---|---|---|---|---|---|---
True value | | 0.5 | 1.0 | 5.0 | 0.5 | 1.0 | 0.7
Random | RBias | −41.24 | 107.64 | 60.66 | 129.62 | 41.67 | 14.14
Random | RSE | 51.92 | 38.29 | 3.85 | 33.70 | 9.09 | 40.43
Random | RRMSE | 66.26 | 114.24 | 60.78 | 133.92 | 42.65 | 42.79
Weighted likelihood | RBias | −0.72 | −1.06 | −0.33 | 1.04 | 0.53 | 6.20
Weighted likelihood | RSE | 36.76 | 36.52 | 4.76 | 16.49 | 5.85 | 35.00
Weighted likelihood | RRMSE | 36.73 | 36.50 | 4.76 | 16.51 | 5.86 | 35.51
Calibrated weighted likelihood | RBias | −0.88 | −1.21 | −0.33 | 0.78 | 0.62 | 6.30
Calibrated weighted likelihood | RSE | 36.57 | 36.50 | 4.76 | 16.61 | 5.92 | 34.92
Calibrated weighted likelihood | RRMSE | 36.54 | 36.48 | 4.77 | 16.61 | 5.95 | 35.45
Complete-data likelihood | RBias | −4.57 | 0.78 | 0.05 | 1.44 | 0.32 | 13.51
Complete-data likelihood | RSE | 43.66 | 20.01 | 2.24 | 10.91 | 3.54 | 36.62
Complete-data likelihood | RRMSE | 43.86 | 20.01 | 2.24 | 11.00 | 3.55 | 39.00
Conditional likelihood | RBias | −7.44 | 7.24 | 3.61 | 9.55 | 4.38 | 9.46
Conditional likelihood | RSE | 51.36 | 31.37 | 8.80 | 25.79 | 11.16 | 35.94
Conditional likelihood | RRMSE | 51.85 | 32.16 | 9.51 | 27.48 | 11.97 | 37.13
Full information likelihood | RBias | 0.21 | −0.28 | −0.36 | 0.60 | 0.28 | 6.46
Full information likelihood | RSE | 8.64 | 13.99 | 2.02 | 7.77 | 2.48 | 17.49
Full information likelihood | RRMSE | 8.63 | 13.98 | 2.05 | 7.78 | 2.49 | 18.63
Full-data likelihood | RBias | 0.14 | 0.35 | −0.35 | 0.09 | 0.28 | 8.40
Full-data likelihood | RSE | 8.77 | 14.70 | 2.07 | 7.26 | 2.52 | 13.75
Full-data likelihood | RRMSE | 8.77 | 14.69 | 2.09 | 7.25 | 2.53 | 16.10
Empirical proportion likelihood | RBias | 0.15 | −0.04 | −0.21 | 0.56 | 0.56 | 7.13
Empirical proportion likelihood | RSE | 8.67 | 13.93 | 2.00 | 7.43 | 2.58 | 15.72
Empirical proportion likelihood | RRMSE | 8.66 | 13.92 | 2.01 | 7.45 | 2.64 | 17.25

Method | Value | $a$ | $\mu_x$ | $\sigma_x$ | $\mu_b$ | $\sigma_b$ | $\sigma_\varepsilon$
---|---|---|---|---|---|---|---
True value | | 0.5 | 1.0 | 5.0 | 1.0 | 1.0 | 0.7
Random | RBias | −31.90 | 43.42 | 61.46 | 82.91 | 7.43 | 55.62
Random | RSE | 45.04 | 29.58 | 3.26 | 7.43 | 7.33 | 59.65
Random | RRMSE | 55.16 | 52.52 | 61.54 | 83.24 | 10.43 | 81.51
Weighted likelihood | RBias | 1.98 | −1.41 | −0.22 | −0.04 | 0.43 | 11.26
Weighted likelihood | RSE | 36.03 | 29.21 | 3.95 | 7.36 | 5.68 | 32.18
Weighted likelihood | RRMSE | 36.05 | 29.21 | 3.95 | 7.35 | 5.69 | 34.06
Calibrated weighted likelihood | RBias | 1.80 | −1.34 | −0.21 | −0.19 | 0.41 | 11.47
Calibrated weighted likelihood | RSE | 36.36 | 29.04 | 3.95 | 7.48 | 5.80 | 32.11
Calibrated weighted likelihood | RRMSE | 36.37 | 29.04 | 3.95 | 7.47 | 5.81 | 34.06
Complete-data likelihood | RBias | −1.18 | −0.90 | 0.24 | −0.05 | 0.45 | 16.00
Complete-data likelihood | RSE | 40.32 | 21.51 | 2.11 | 6.21 | 3.38 | 34.70
Complete-data likelihood | RRMSE | 40.30 | 21.50 | 2.12 | 6.21 | 3.40 | 38.18
Conditional likelihood | RBias | −0.57 | 3.15 | 2.35 | 3.87 | 1.44 | 13.30
Conditional likelihood | RSE | 45.82 | 41.49 | 5.92 | 11.87 | 5.37 | 34.77
Conditional likelihood | RRMSE | 45.78 | 41.57 | 6.37 | 12.47 | 5.55 | 37.20
Full information likelihood | RBias | 1.10 | −0.60 | 0.05 | −0.70 | 0.70 | 9.32
Full information likelihood | RSE | 11.84 | 11.51 | 2.07 | 4.40 | 2.80 | 22.27
Full information likelihood | RRMSE | 11.88 | 11.52 | 2.07 | 4.45 | 2.89 | 24.13
Full-data likelihood | RBias | 0.98 | −0.41 | 0.02 | −0.84 | 0.64 | 13.05
Full-data likelihood | RSE | 12.32 | 12.70 | 2.02 | 3.88 | 2.76 | 16.76
Full-data likelihood | RRMSE | 12.35 | 12.70 | 2.02 | 3.97 | 2.83 | 21.23
Empirical proportion likelihood | RBias | 1.17 | −0.55 | 0.25 | −0.52 | 0.87 | 10.02
Empirical proportion likelihood | RSE | 11.93 | 11.61 | 1.96 | 4.10 | 2.83 | 20.90
Empirical proportion likelihood | RRMSE | 11.97 | 11.61 | 1.97 | 4.13 | 2.95 | 23.16

For the linear model with BI variation (22), Tables 1-3 indicate that the full information, full-data and EP likelihood approaches have quite close performance, and in general they perform substantially better than all the other approaches in terms of RBias, RSE and RRMSE for all estimated parameters. The weighted likelihood (WL) and calibrated WL approaches have close performance, and there is no evidence that calibration improves the estimation; that is, in some cases the calibrated WL has slightly smaller RRMSEs than WL, and in other cases the reverse happens, but the differences have no clear pattern and are too small to draw reliable conclusions. Similarly, even though there is some difference in performance between the complete-data likelihood approach and the two WL approaches, it is not clear which method performs better. The two WL approaches have smaller RRMSEs for the estimation of $a$ and $\sigma_\varepsilon$, while the complete-data likelihood approach has smaller RRMSEs for the other parameters. The conditional likelihood approach based on (6) performs the worst among all the approaches in this study except the random approach. In particular, for the estimation of $\mu_x$, $\sigma_x$, $\mu_b$ and $\sigma_b$, its RRMSEs are more than twice those from the complete-data likelihood approach. Nevertheless, the conditional likelihood approach performs substantially better than the random approach.

Method | Value | Case 1: $l_\infty$ | Case 1: $k$ | Case 1: $a_0$ | Case 1: CV | Case 2: $l_\infty$ | Case 2: $k$ | Case 2: $a_0$ | Case 2: CV
---|---|---|---|---|---|---|---|---|---
True value | | 70 | 0.2 | −0.07 | 0.2 | 70 | 0.2 | −0.07 | 0.2
Random | RBias | 86.06 | −60.83 | −614.61 | 20.35 | 88.00 | −61.04 | −23.60 | 35.79
Random | RSE | 15.45 | 4.31 | 77.73 | 3.40 | 34.09 | 9.72 | 495.95 | 4.13
Random | RRMSE | 87.44 | 60.98 | 619.49 | 0.21 | 94.36 | 61.81 | 496.01 | 0.36
Weighted likelihood | RBias | 9.27 | −21.41 | −564.69 | 6.38 | 2.69 | −6.53 | −525.27 | 0.64
Weighted likelihood | RSE | 5.26 | 6.52 | 96.69 | 3.15 | 7.47 | 20.66 | 991.37 | 2.35
Weighted likelihood | RRMSE | 10.66 | 22.38 | 572.89 | 0.07 | 7.93 | 21.64 | 1121.06 | 0.02
Calibrated weighted likelihood | RBias | 9.27 | −21.41 | −564.69 | 6.38 | 2.69 | −6.53 | −525.27 | 0.64
Calibrated weighted likelihood | RSE | 5.26 | 6.52 | 96.69 | 3.15 | 7.47 | 20.66 | 991.37 | 2.35
Calibrated weighted likelihood | RRMSE | 10.66 | 22.38 | 572.89 | 0.07 | 7.93 | 21.64 | 1121.05 | 0.02
Complete-data likelihood | RBias | 10.41 | −26.13 | −640.04 | 9.48 | 1.91 | −5.97 | −469.93 | 0.35
Complete-data likelihood | RSE | 5.47 | 6.47 | 88.99 | 2.55 | 5.36 | 16.71 | 754.74 | 2.23
Complete-data likelihood | RRMSE | 11.76 | 26.92 | 646.19 | 0.10 | 5.68 | 17.73 | 888.44 | 0.02
Conditional likelihood | RBias | 12.82 | −31.22 | −619.49 | 28.21 | 30.96 | 54.24 | −659.63 | 31,525.17
Conditional likelihood | RSE | 31.78 | 17.67 | 106.90 | 14.50 | 116.24 | 1536.62 | 1001.56 | 703,564.57
Conditional likelihood | RRMSE | 34.24 | 35.86 | 628.63 | 0.32 | 120.18 | 1536.04 | 1198.43 | 7035.67
Full information likelihood | RBias | 1.59 | −12.76 | −523.58 | 11.26 | 1.34 | −5.51 | −399.94 | 0.60
Full information likelihood | RSE | 2.92 | 4.33 | 64.60 | 2.44 | 3.93 | 11.79 | 543.65 | 1.75
Full information likelihood | RRMSE | 3.32 | 13.47 | 527.54 | 0.12 | 4.14 | 13.01 | 674.48 | 0.02
Full-data likelihood | RBias | −2.07 | −8.44 | −513.17 | 16.92 | 1.30 | −5.20 | −396.57 | 0.56
Full-data likelihood | RSE | 3.14 | 4.72 | 98.26 | 3.60 | 4.40 | 12.60 | 572.79 | 1.81
Full-data likelihood | RRMSE | 3.76 | 9.67 | 522.47 | 0.17 | 4.59 | 13.62 | 696.20 | 0.02
Empirical proportion likelihood | RBias | 1.68 | −12.89 | −524.59 | 11.30 | 1.43 | −5.64 | −396.78 | 0.71
Empirical proportion likelihood | RSE | 2.92 | 4.33 | 65.55 | 2.45 | 3.95 | 11.82 | 544.41 | 1.76
Empirical proportion likelihood | RRMSE | 3.37 | 13.59 | 528.66 | 0.12 | 4.20 | 13.08 | 673.22 | 0.02

Simulation results presented in

The simulation study indicates that the full information likelihood (1), full-data likelihood (15) and EP likelihood (20) approaches perform better than the other estimation methods. In this section we apply these three approaches to fit the VonB model (24) using a dataset collected by DFO in NAFO Division 3N during the spring of 2011. Here we consider only female American plaice because males and females follow different growth models.

The LSAS within each Division involved measuring the length of all fish caught in research trawl tows, classifying them into 2 cm length strata, and subsampling a few or no otoliths from each length stratum. The sampling goal in each Division was to obtain about 25 age measurements per 2 cm length stratum by sex if length $\geq 10$ cm, and about 15 age measurements per stratum without distinguishing sex if length $< 10$ cm.

Parameter estimates (ESTs) and the corresponding standard errors (SEs) are provided in

Applying (17), we obtained the standardized residuals of the second phase complete data for all approaches, whose box-and-whisker plots by age are shown in

Method | Value | $l_\infty$ | $k$ | $a_0$ | CV
---|---|---|---|---|---
Full information likelihood | EST | 61.86 | 0.10 | −0.51 | 0.11
Full information likelihood | SE | 1.74 | 0.0056 | 0.14 | 0.0037
EP likelihood | EST | 62.21 | 0.10 | −0.49 | 0.11
EP likelihood | SE | 1.77 | 0.0056 | 0.13 | 0.0037
Full-data likelihood | EST | 65.05 | 0.093 | −0.75 | 0.11
Full-data likelihood | SE | 1.78 | 0.0048 | 0.13 | 0.0037
Random | EST | 84.20 | 0.065 | −0.82 | 0.11
Random | SE | 5.53 | 0.0072 | 0.17 | 0.0039

likelihood and full-data likelihood approaches do not indicate bias in fitted mean length at age from the data mean along the full range of age. The standardized residuals from the random approach (26) exhibit clear bias to negative values at ages larger than about 12, indicating over-estimation of $l_\infty$. In

We derived the density function (11) for BSS (basic stratified sampling) complete data, and constructed the complete-data likelihood (12), which allows statistical inference when the incomplete data are not well retained. The complete-data density can also be used for standardized residual calculation as discussed in Section 3. Residuals are important for validation of fitted models.

Both the complete-data likelihood approach and the random approach make use of only the complete data. The complete-data likelihood approach performs substantially better than the random approach in the simulation studies, indicating the importance of correctly incorporating the sampling scheme in the inference methods. The conditional likelihood (6) accounts for the sampling scheme approximately by ignoring the randomness in $n_h$ in all the strata. Therefore its performance lies between the random and the complete-data likelihood approaches in almost all the cases in the simulation study. However, in BSS sampling projects where the number of strata is small and the maximum subsample size $m_h$ for each stratum can usually be obtained, the conditional likelihood (6) is appropriate.

Another method to incorporate the sampling scheme is to use the count information of the incomplete data in each stratum, as in the weighted likelihood (WL) and calibrated WL approaches. Even though in the simulation study the two methods of accounting for the sampling scheme, namely the complete-data likelihood and the (calibrated) WL approaches, have comparable performance, the complete-data likelihood requires an appropriate distribution model for covariates, which can limit its application. The WL and calibrated WL approaches are not subject to this restriction, and hence can be more practical.

A full utilization of the information in incomplete data is to incorporate the density function of the incomplete data in the likelihood. In this regard, we proposed two new likelihoods for BSS, namely, the full-data likelihood and the empirical proportion (EP) likelihood. If the covariate distribution can be properly modeled, the two new approaches perform as well as the standard full information likelihood approach, and they all perform substantially better than the other methods covered in this study. This result suggests the significance of the information in the incomplete data.

On the whole this study indicates that the complete data, the incomplete data, and the sampling scheme are all important for a consistent and efficient statistical inference from BSS data.

In this work we found that the EP likelihood approach, which was originally proposed for variable probability sampling (VPS), works well for BSS data (it is among the best, together with the full-data and full information likelihood approaches). Its merits will further show up when covariates cannot be modeled effectively. This work assumes that a valid covariate distribution model is available, which may be a strong assumption in practice. We will explore the case when no appropriate covariate distribution model is available in another paper.

Research funding to Nan Zheng was provided by the Ocean Frontier Institute, through an award from the Canada First Research Excellence Fund. Funding was also provided by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grant to NC and NC’s Ocean Choice International Industry Research Chair program at the Marine Institute of Memorial University of Newfoundland.

The authors declare no conflicts of interest regarding the publication of this paper.

Zheng, N. and Cadigan, N. (2019) Likelihood Methods for Basic Stratified Sampling, with Application to Von Bertalanffy Growth Model Estimation. Open Journal of Statistics, 9, 623-642. https://doi.org/10.4236/ojs.2019.96040

Without loss of generality, we assume that $(y, x) \in S_h$; then

$$f(y, x \mid R = 1; \theta) = \Pr\left((y, x) \in S_h \mid R = 1; \theta\right) \Pr\left(y, x \mid (y, x) \in S_h, R = 1; \theta\right).$$

Since the selection for full observation is random given $(y, x) \in S_h$,

$$\Pr\left(y, x \mid (y, x) \in S_h, R = 1; \theta\right) = \Pr\left(y, x \mid (y, x) \in S_h; \theta\right) = \frac{f(y, x \mid \theta)}{Q_h},$$

and we have

$$f(y, x \mid R = 1; \theta) = \Pr\left((y, x) \in S_h \mid R = 1; \theta\right) \frac{f(y, x \mid \theta)}{Q_h} = \sum_{n_h = 0}^{m_h} \Pr(n_h \mid R = 1; \theta) \Pr\left((y, x) \in S_h \mid n_h, R = 1; \theta\right) \frac{f(y, x \mid \theta)}{Q_h}, \tag{27}$$

where $n_h$ is the sample size in the $h$th stratum as defined by (4).

Here $\Pr\left((y, x) \in S_h \mid n_h, R = 1; \theta\right) \propto n_h$; that is, the probability for a selected unit to be in stratum $h$ is proportional to the number of vacancies in stratum $h$. Also, $\Pr(n_h \mid R = 1; \theta) = \Pr(n_h \mid \theta)$; namely, the event {a unit is selected, without any further information about its $(y, x)$} is independent of the event {$n_h$ units are selected in stratum $h$}. Moreover,

$$\Pr(n_h \mid \theta) = \begin{cases} \operatorname{dbin}(N_h, N, Q_h), & \text{if } N_h < m_h \text{ and hence } n_h = N_h, \\ 1 - \operatorname{pbin}(m_h - 1, N, Q_h), & \text{if } N_h \geq m_h \text{ and hence } n_h = m_h. \end{cases}$$

Hence, when $(y, x) \in S_h$,

$$f(y, x \mid R = 1; \theta) \propto \frac{f(y, x \mid \theta)}{Q_h} \times \left\{\left[\sum_{N_h = 1}^{m_h - 1} N_h \operatorname{dbin}(N_h, N, Q_h)\right] + m_h \left[1 - \operatorname{pbin}(m_h - 1, N, Q_h)\right]\right\},$$

which can be normalized into (11).

Note that in the case $\Pr(n_h = m_h \mid \theta) = 1$ for all the strata $h = 1, \cdots, H$, we have $\Pr\left((y, x) \in S_h \mid m_h, R = 1; \theta\right) = m_h / \sum_{h'=1}^{H} m_{h'}$, which is a constant independent of $\theta$. Then (27) leads to $f(y, x \mid R = 1; \theta) \propto f(y, x \mid \theta) / Q_h$, which proves (5).