
Quadratic distance methods based on a special distance that makes use of survival functions are developed for inference on bivariate continuous models using selected points on the nonnegative quadrant. A related version, which can be viewed as a simulated version, is also developed and appears suitable for bivariate distributions which have no closed-form expression and are not numerically tractable, but from which it is easy to simulate. The notion of an adaptive basis is introduced: the estimators can be viewed as quasilikelihood estimators using score functions projected on an adaptive basis, and they are closely related to minimum chi-square estimators with random cells, which can also be viewed as quasilikelihood estimators using score functions projected on a special adaptive basis whose elements are linearly dependent. A rule for selecting points on the nonnegative quadrant, making use of quasi-Monte Carlo (QMC) numbers and two sample quantiles of the two marginal distributions, is proposed when complete data are available. Like minimum chi-square methods, the quadratic distance methods also offer chi-square statistics which appear useful in practice for model testing.

In actuarial science and biostatistics, we often encounter bivariate data already grouped into cells forming a contingency table, for which we would like to make inferences about a continuous bivariate model for the complete data; see Partrat [

If the bivariate distributions have closed-form expressions, there is in general no difficulty in fitting them using maximum likelihood or minimum chi-square methods based on grouped data, for example. Many useful distributions, however, might only be computable numerically, as they are expressible only through an integral representation, and when numerical quadrature methods fail it appears natural to develop simulated methods of inference for these distributions. We would like methods which offer a unified approach to estimation and model testing, and which can also handle the situation where the lack of closed-form expressions for the model survival functions creates numerical difficulties. We shall see subsequently that new distributions created using the bivariate survival mixture operator (BSPM) introduced by Marshall and Olkin [

Marshall and Olkin [

$S(x,y)=\int_0^{\infty}\left(\bar F_1(x)\right)^{\theta}\left(\bar F_2(y)\right)^{\theta}dG(\theta).$

Since there is an integral representation, the new survival function might still be computable numerically, depending on the expressions for $\bar F_1(x)$, $\bar F_2(y)$ and $G(\theta)$. An algorithm to simulate a sample from $S(x,y)$ has also been given by Marshall and Olkin [

Later, in section (1.2), we shall examine another way to create new survival functions. Unlike new distributions created using the BSPM operator, new distributions created by means of trivariate reduction techniques often do not even have an integral representation, even though the functions used are simple, for example linear functions. We shall discuss trivariate reduction techniques further in section (1.2), and first consider a few examples of new distributions created using the BSPM operator.

Example 1

We let $\bar F_1(x;\alpha_1,\lambda_1)=e^{-\lambda_1 x^{\alpha_1}}$, $x>0$, $\lambda_1,\alpha_1>0$, which is the survival function of a Weibull distribution, and similarly let $\bar F_2(y;\alpha_2,\lambda_2)=e^{-\lambda_2 y^{\alpha_2}}$, $y>0$, $\lambda_2,\alpha_2>0$. For the mixing random variable θ, let θ follow a Pareto type II distribution, also called the Lomax distribution, with density function given by

$f(\theta;\tau,\delta)=\dfrac{\delta\tau^{\delta}}{(\theta+\tau)^{\delta+1}}$ with domain $\theta>0$; the parameters τ and δ are positive. Note that θ has no closed-form Laplace transform (LT).

The new bivariate distribution created using the BSPM operator is

$S_{\beta}(x,y)=\int_0^{\infty}e^{-\theta\lambda_1 x^{\alpha_1}}e^{-\theta\lambda_2 y^{\alpha_2}}\dfrac{\delta\tau^{\delta}}{(\theta+\tau)^{\delta+1}}d\theta,\quad \beta=(\alpha_1,\alpha_2,\lambda_1,\lambda_2,\delta,\tau)'.$
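Although $S_\beta(x,y)$ has no closed form here, simulating from it is straightforward by conditioning on θ, in the spirit of the Marshall and Olkin algorithm mentioned above: draw θ from the mixing distribution, then draw X and Y conditionally independent given θ with survival functions $(\bar F_1)^{\theta}$ and $(\bar F_2)^{\theta}$. A minimal sketch in Python (function and variable names are ours, for illustration only):

```python
import math
import random

def simulate_bspm_weibull_lomax(n, a1, l1, a2, l2, delta, tau, seed=123):
    """Draw n pairs (X, Y) from the Weibull/Lomax BSPM mixture by first
    drawing the frailty theta from a Lomax(delta, tau) distribution and
    then drawing X, Y conditionally independent given theta."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        # Inverse CDF of the Lomax: F(theta) = 1 - (tau / (theta + tau))**delta
        u = 1.0 - rng.random()                      # u in (0, 1]
        theta = tau * (u ** (-1.0 / delta) - 1.0)
        # Given theta, P(X > x) = exp(-theta * l1 * x**a1): a Weibull draw
        x = (-math.log(1.0 - rng.random()) / (theta * l1)) ** (1.0 / a1)
        y = (-math.log(1.0 - rng.random()) / (theta * l2)) ** (1.0 / a2)
        sample.append((x, y))
    return sample
```

Since the conditional survival functions are Weibull with rates $\theta\lambda_1$ and $\theta\lambda_2$, the inverse-transform step has a closed form even though $S_\beta(x,y)$ itself does not.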

For most of the univariate distributions used in the paper, see Appendix A given by Klugman et al. [

Observe that if we specify a Gamma distribution for θ instead of the Lomax distribution discussed earlier, with the density function of θ given by

$f(\theta;\delta)=\dfrac{1}{\Gamma(\delta)}\theta^{\delta-1}e^{-\theta},\quad\theta>0$

and its Laplace transform given by $\phi(s)=(1+s)^{-\delta}$, the newly created bivariate survival distribution can be expressed as

$S_{\beta}(x,y)=\int_0^{\infty}e^{-\theta\lambda_1 x^{\alpha_1}}e^{-\theta\lambda_2 y^{\alpha_2}}\dfrac{1}{\Gamma(\delta)}\theta^{\delta-1}e^{-\theta}d\theta$

and since $\phi(s)$ has a closed-form expression, $S_{\beta}(x,y)$ has a closed-form expression given by

$S_{\beta}(x,y)=\phi\left(H_1(x;\alpha_1,\lambda_1)+H_2(y;\alpha_2,\lambda_2)\right)$

where $H_1(x;\alpha_1,\lambda_1)$ and $H_2(y;\alpha_2,\lambda_2)$ are respectively the cumulative hazard functions of $\bar F_1(x;\alpha_1,\lambda_1)$ and $\bar F_2(y;\alpha_2,\lambda_2)$, with

$H_1(x;\alpha_1,\lambda_1)=-\ln\bar F_1(x;\alpha_1,\lambda_1)=\lambda_1 x^{\alpha_1},$

$H_2(y;\alpha_2,\lambda_2)=-\ln\bar F_2(y;\alpha_2,\lambda_2)=\lambda_2 y^{\alpha_2}.$
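For the gamma mixing case, the closed form $S_\beta(x,y)=\phi(H_1+H_2)=(1+\lambda_1 x^{\alpha_1}+\lambda_2 y^{\alpha_2})^{-\delta}$ can be checked directly against the integral representation. The sketch below is our own illustrative code (not part of the paper's implementation); it compares the closed form with a midpoint-rule evaluation of the mixture integral:

```python
import math

def closed_form_S(x, y, a1, l1, a2, l2, delta):
    """Closed form S_beta(x, y) = phi(H1 + H2), phi(s) = (1 + s)**(-delta),
    the Laplace transform of the unit-scale gamma mixing distribution."""
    H = l1 * x ** a1 + l2 * y ** a2
    return (1.0 + H) ** (-delta)

def integral_S(x, y, a1, l1, a2, l2, delta, upper=60.0, steps=200000):
    """Midpoint-rule evaluation of the mixture integral
    int_0^inf exp(-t*(H1+H2)) * t**(delta-1) * exp(-t) / Gamma(delta) dt,
    truncated at `upper` (the tail is negligible for moderate delta)."""
    H = l1 * x ** a1 + l2 * y ** a2
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.exp(-t * H) * t ** (delta - 1.0) * math.exp(-t)
    return total * h / math.gamma(delta)
```

With $\alpha_1=\alpha_2=\lambda_1=\lambda_2=1$ and $\delta=2$, for example, both evaluations agree with $(1+x+y)^{-2}$.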

By the usual conditioning argument, conditioning on θ, we can often obtain the first two moments of the vector $Z=(X,Y)'$, and even higher positive integer moments, without having a closed form for $S_\beta(x,y)$; see the conditioning argument for the univariate case given by Klugman et al. [

In this paper we emphasize grouped data. In general, with grouped data MM estimators cannot be obtained; furthermore, with four or five parameters in the bivariate model, high-order moments must be used, and as a result the MM estimators are not robust in general. We consider the situation where the data have been grouped into a contingency table, so that we must analyse the data in this form, or where the complete data are available but must be grouped to perform chi-square tests for model testing. If complete data are available, then we have choices in grouping the data; in this situation we hope to propose a way to group them so that inference methods based on such a grouping rule will have high efficiencies, see the discussion in section (5) by Klugman and Parsa [

$Z_i=(X_i,Y_i)'$, $i=1,\cdots,n$, which are independent and identically distributed (iid) from a bivariate distribution specified by a bivariate survival function $S_\beta(x,y)$; this includes the situation where the original observations have been left truncated by $d_1$ and $d_2$, where the values $d_1$ and $d_2$ are known; $d_1$ and $d_2$ are the amounts of deductibles in actuarial science, for example. We can view

$S_\beta(x,y)=\dfrac{S^o_\beta(x,y)}{S^o_\beta(d_1,d_2)}$ and we only need $S_\beta(x,y)$ for fitting, with $S^o_\beta(x,y)$

being specified as well; see Klugman and Parsa [

$S_\beta(x,y)=1-F_\beta(x)-G_\beta(y)+K_\beta(x,y),$

where $F_\beta(x)$ and $G_\beta(y)$ are the two marginal distributions of $K_\beta(x,y)$. For nonnegative parametric families where the bivariate distribution functions are commonly used for specifying the families, it is not difficult to convert them to bivariate survival functions; consequently, MQD methods are still applicable, with only minor modifications needed.

Example 2

In this example, we let $\bar F_1$ and $\bar F_2$ be Burr survival functions, see Hogg et al. [

$\bar F_1(x;\delta_1,\tau_1,\gamma_1)=(1+\delta_1 x^{\tau_1})^{-\gamma_1},\quad x>0,$

where the parameters $\delta_1,\tau_1,\gamma_1$ are positive; similarly, let

$\bar F_2(y;\delta_2,\tau_2,\gamma_2)=(1+\delta_2 y^{\tau_2})^{-\gamma_2}.$

For the distribution of θ, we specify a Weibull distribution with density function given by

$f(\theta;\lambda,\alpha)=\alpha\lambda\theta^{\alpha-1}e^{-\lambda\theta^{\alpha}},\quad\theta>0.$

The new distribution created using the BSPM operator will have bivariate survival function given by

$S_\beta(x,y)=\int_0^{\infty}(1+\delta_1 x^{\tau_1})^{-\theta\gamma_1}(1+\delta_2 y^{\tau_2})^{-\theta\gamma_2}\,\alpha\lambda\theta^{\alpha-1}e^{-\lambda\theta^{\alpha}}d\theta.$

The bivariate survival function has no closed-form expression. For bivariate distributions whose survival functions lack a closed form and are representable only through an integral over an unbounded domain, whether $S_\beta(x,y)$ can be evaluated numerically depends on the integrand and on the numerical quadrature method used. In the same vein, we can mention the class of bivariate contingency tables studied by Mardia [

For $S_\beta(x,y)$ which is not numerically tractable, we propose simulated minimum quadratic distance (SMQD) methods: provided that we can draw simulated samples from $S_\beta(x,y)$, estimation of β is still possible, and minimum quadratic distance methods offer a unified approach for $S_\beta(x,y)$ with or without a closed-form expression, for grouped data with or without choice of grouping. MQD methods, version D, are suitable for numerically tractable $S_\beta(x,y)$, and a corresponding simulated version, version S, is suitable when $S_\beta(x,y)$ is not numerically tractable. For version S, $S_\beta(x,y)$ is replaced by a sample bivariate survival distribution $S^s_\beta(x,y)$ based on a simulated sample of size $U=\tau n$, $\tau\ge 10$, drawn from $S_\beta(x,y)$, with $S^s_\beta(x,y)$ converging in probability to $S_\beta(x,y)$ for each fixed point $z=(x,y)'$,

i.e., $S^s_\beta(x,y)\to_p S_\beta(x,y)$;

$S^s_\beta(x,y)$ is defined similarly to $S_n(x,y)$, see expression (1).

Data in the form of contingency tables are often encountered in actuarial science and biostatistics, where bivariate observations are grouped into a two-dimensional array or matrix and only the numbers of observations, or the proportions of the original sample, belonging to the elements of such a matrix are recorded; the original data set is lost. Obviously, when the original complete data set is available, we can always group it into a contingency table, but once grouped it is impossible to convert the grouped data back to complete data. We focus on situations where the complete data are observations $Z_1,\cdots,Z_n$, independent and identically distributed as $Z=(X,Y)'$, which follows an absolutely continuous bivariate distribution with domain the nonnegative quadrant; subsequently they are grouped, and we develop statistical inference techniques using the grouped data.

A version of the bivariate gamma distribution as introduced by Mathai and Moschopoulos [

Let V 0 , V 1 , V 2 be gamma random variables with their respective density functions given by

$f(v_i;\alpha_i,\beta_i)=\dfrac{1}{\Gamma(\alpha_i)\beta_i^{\alpha_i}}v_i^{\alpha_i-1}e^{-v_i/\beta_i},\quad v_i\ge 0,\ \alpha_i,\beta_i>0,$

and

$X=\phi_1(V_0,V_1,V_2)=\dfrac{\beta_1}{\beta_0}V_0+V_1,$

$Y=\phi_2(V_0,V_1,V_2)=\dfrac{\beta_2}{\beta_0}V_0+V_2.$

The bivariate density function $f(x,y;\beta)$ has no closed-form expression and is very complicated. The model has five parameters, which can be represented by the vector $\beta=(\alpha_0,\alpha_1,\alpha_2,\beta_1,\beta_2)'$, see section (5) as given by Mathai and Moschopoulos [
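Although the joint density has no closed form, the trivariate reduction is immediate to simulate: draw the three independent gamma variables and combine them linearly. A minimal sketch (our own illustrative code; Python's `random.gammavariate` uses the shape/scale parameterization):

```python
import random

def simulate_bivariate_gamma(n, a0, a1, a2, b0, b1, b2, seed=7):
    """Trivariate reduction: X = (b1/b0) V0 + V1, Y = (b2/b0) V0 + V2,
    with V_i ~ Gamma(shape a_i, scale b_i) independent. The shared V0
    term induces positive dependence: Cov(X, Y) = a0 * b1 * b2."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v0 = rng.gammavariate(a0, b0)
        x = (b1 / b0) * v0 + rng.gammavariate(a1, b1)
        y = (b2 / b0) * v0 + rng.gammavariate(a2, b2)
        out.append((x, y))
    return out
```

A quick sanity check on the marginal moments, e.g. $E(X)=\beta_1(\alpha_0+\alpha_1)$, can be used to validate the simulator before using it inside version S.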

Contingency table data can be viewed as a special form of two-dimensional grouped data; we now give some more details about this form of grouped data. Assume that we have a sample $Z_i=(X_i,Y_i)'$, $i=1,\cdots,n$, independent and identically distributed as $Z=(X,Y)'$, which follows a nonnegative continuous bivariate distribution with model survival function $S_\beta(x,y)$.

The vector of parameters is $\beta=(\beta_1,\cdots,\beta_m)'$; the true vector of parameters is denoted by $\beta_0$. We do not observe the original sample: the observations are grouped into a contingency table, and only the numbers falling into each cell of the contingency table are recorded, or equivalently the sample proportions falling into these cells. Grouping into two-dimensional cells generalizes the grouping of univariate data into disjoint intervals of the nonnegative real line. Contingency table data are often encountered in actuarial science and biostatistics, see Partrat [

Let the nonnegative axis X be partitioned into disjoint intervals $\cup_{i=1}^{I}[s_{i-1},s_i)$ with $s_0=0$, $s_I=\infty$, and similarly let the axis Y be partitioned into disjoint intervals $\cup_{j=1}^{J}[t_{j-1},t_j)$ with $t_0=0$, $t_J=\infty$.

The nonnegative quadrant can be partitioned into nonoverlapping cells of the form

$C_{ij}=[s_{i-1},s_i)\times[t_{j-1},t_j),\quad i=1,\cdots,I,\ j=1,\cdots,J.$

The contingency table $T=(C_{ij})$ is formed, which can be viewed as a matrix with elements $C_{ij}$, $i=1,\cdots,I$, $j=1,\cdots,J$.

We can define the empirical bivariate survival function as

$S_n(x,y)=\dfrac{1}{n}\sum_{i=1}^{n}I[X_i>x,Y_i>y]$ (1)

and we have $S_n(x,y)\to_p S_{\beta_0}(x,y)$.
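Expression (1) is a one-line computation from the complete data; a minimal sketch (illustrative names):

```python
def S_n(sample, x, y):
    """Empirical bivariate survival function of expression (1):
    S_n(x, y) = (1/n) * sum of I[X_i > x, Y_i > y]."""
    return sum(1 for xi, yi in sample if xi > x and yi > y) / len(sample)
```

Note the strict inequalities, matching the indicator $I[X_i>x, Y_i>y]$, so that $S_n(0,0)=1$ for strictly positive data.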

The sample proportion, or empirical probability, that one observation falls into cell $C_{ij}$ can be obtained using $S_n(x,y)$:

$p_n(C_{ij})=S_n(s_{i-1},t_{j-1})-S_n(s_{i-1},t_j)-S_n(s_i,t_{j-1})+S_n(s_i,t_j)$ (2)

and the corresponding model probability is

$p_\beta(C_{ij})=S_\beta(s_{i-1},t_{j-1})-S_\beta(s_{i-1},t_j)-S_\beta(s_i,t_{j-1})+S_\beta(s_i,t_j).$ (3)
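The inclusion-exclusion identity behind expressions (2) and (3) works for any bivariate survival function, empirical or model; a minimal sketch (illustrative names):

```python
import math

def cell_prob(S, s_lo, s_hi, t_lo, t_hi):
    """Probability assigned to the cell [s_lo, s_hi) x [t_lo, t_hi) by a
    bivariate survival function S, via the identity of expressions (2)-(3)."""
    return S(s_lo, t_lo) - S(s_lo, t_hi) - S(s_hi, t_lo) + S(s_hi, t_hi)

# Example with independent unit exponentials: S(x, y) = exp(-x - y)
S = lambda x, y: math.exp(-x - y)
```

Passing `math.inf` for the upper endpoints handles the unbounded boundary cells, since the survival function vanishes at infinity.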

Note that

$S_\beta(s_i,t_J)=0,\ S_n(s_i,t_J)=0,\quad i=1,\cdots,I,$ (4)

and similarly,

$S_\beta(s_I,t_j)=0,\ S_n(s_I,t_j)=0,\quad j=1,\cdots,J,$ (5)

so they can be discarded without affecting the efficiency of the inference methods. This is precisely the approach of quadratic distance methods: redundant elements are discarded and a basis with only linearly independent elements is created, spanning the same linear space. Consequently, we gain in numerical efficiency while retaining the same statistical efficiency.

Using the contingency table data, the modified minimum chi-square estimators, which are as efficient as the likelihood estimators based on the grouped data, are obtained by minimizing the objective function

$\sum_{i,j}\dfrac{\left(p_n(C_{ij})-p_\beta(C_{ij})\right)^2}{p_n(C_{ij})}$

and since $p_n(C_{ij})\to_p p_{\beta_0}(C_{ij})$, the minimum chi-square estimators, given by the vector $\tilde\beta$ which minimizes the expression above, have the same asymptotic efficiency as the vector $\beta^*$ which minimizes $\sum_{i,j}\dfrac{\left(p_n(C_{ij})-p_\beta(C_{ij})\right)^2}{p_{\beta_0}(C_{ij})}$ and which, under differentiability assumptions, is given by the roots of the system of equations

$\sum_{i,j}\dfrac{p_n(C_{ij})-p_\beta(C_{ij})}{p_{\beta_0}(C_{ij})}\dfrac{\partial p_\beta(C_{ij})}{\partial\beta}=0.$
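In code, the modified minimum chi-square objective above is just a weighted sum of squared differences between the empirical and model cell probabilities; a minimal sketch (illustrative names, cells flattened into lists):

```python
def chi_square_objective(p_n, p_model):
    """Modified minimum chi-square objective:
    sum over cells of (p_n(C) - p_beta(C))**2 / p_n(C).
    Cells with zero empirical probability are skipped to avoid division by zero."""
    return sum((pn - pm) ** 2 / pn for pn, pm in zip(p_n, p_model) if pn > 0)
```

The objective vanishes exactly when the model probabilities match the empirical ones, and is positive otherwise.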

Note that the quasi-score functions generated belong to the linear space spanned by

$\{p_n(C_{ij})-p_\beta(C_{ij}),\ i=1,\cdots,I,\ j=1,\cdots,J\}.$ (6)

But since we have the property given by expressions (4-5), the same linear space is spanned by

$\{S_n(s_{ij},t_{ij})-S_\beta(s_{ij},t_{ij}),\ i=1,\cdots,I-1,\ j=1,\cdots,J-1\}.$ (7)

Therefore, an equivalent but possibly numerically more efficient method (recall that $p_\beta(C_{ij})$ must be evaluated numerically or by simulation) is to minimize a quadratic form using the elements of the basis given by expression (7), with an optimum matrix which is no longer diagonal as in the minimum chi-square objective function but which turns out to be quite simple and can be estimated empirically, since the $S_n(s_{ij},t_{ij})$ are relatively simple and well defined for $i=1,\cdots,I-1$, $j=1,\cdots,J-1$. Furthermore, to perform minimum chi-square methods in practice, cells with fewer than 5 observations need to be regrouped into larger cells, which reduces the efficiency of the minimum chi-square methods. We do not need as many regrouping operations with the proposed quadratic distance methods. For the equivalent efficiency of the newly proposed methods, see the projection argument in Luong [

Also, using expression (7) is equivalent to using overlapping cells of the form

$O_{ij}=I[x>s_{ij},\ y>t_{ij}],\quad i=1,\cdots,I-1,\ j=1,\cdots,J-1.$ (8)

The objective function of the proposed quadratic form is given below; it is a natural extension of the objective function used in the univariate case. Define a vector whose components are the elements of the basis, so that only one subscript is needed, by collapsing the matrix given by expression (7) into a vector: the first row of the matrix forms the first batch of elements of the vector, the second row the second batch, and so forth; i.e., let

$z_n=(S_n(s_1,t_1),\cdots,S_n(s_M,t_M))',\quad M=(I-1)(J-1),$ (9)

and its model counterpart is

$z_\beta=(S_\beta(s_1,t_1),\cdots,S_\beta(s_M,t_M))'.$ (10)

The number of components of z n is M with the assumption M > m .

Efficient quadratic distance methods can be constructed using the inverse of the covariance matrix of $z_n$ as the weight matrix of the quadratic form; but such an optimum matrix is only defined up to a constant, so the inverse of the covariance matrix of the following vector

$h(x,y)=\left(I[x>s_1,y>t_1]-S_\beta(s_1,t_1),\cdots,I[x>s_M,y>t_M]-S_\beta(s_M,t_M)\right)'$ (11)

can also be considered optimum. It can also be replaced by a consistent estimate, and since the elements of the basis can be identified with the corresponding elements of the vector $\frac{1}{n}\sum_{i=1}^{n}h(x_i,y_i)$, with the observations being independent and identically distributed, choosing a basis is equivalent to choosing the vector $h(x,y)$.

It is not difficult to see that the infinite basis of the form

$\{I[x>s_l,y>t_l]-S_\beta(s_l,t_l),\ l=1,2,\cdots\}$

is complete, and the projected score functions will give estimators with the same efficiency as the maximum likelihood estimators, since the score functions belong to the space spanned by the infinite basis; see Carrasco and Florens [

Besides the predetermined grouping scenario, we shall also examine the following question: if we have complete data and would like to choose a finite basis with M elements, then, since these elements are identified with points, the question can be phrased as how we should choose M points, or equivalently, how we should group the data into cells.

The question of how to choose cells appears to be difficult already for minimum chi-square methods with univariate data, see Greenwood and Nikulin [

We shall discuss how to construct such an adaptive basis when complete data are available in section (3), where we would like to shed some light on the question of how to form a finite basis so that inference methods using the basis will have high efficiencies over a restricted parameter space which appears reasonable for the applications. Such an adaptive basis, formed from a complete basis, appears very practical for applications. Using an infinite basis, as in the case of the generalized method of moments (GMM) with univariate observations based on a continuum of moment conditions as introduced by Carrasco and Florens [

Using the remarks given by Luong ( [

In general, it is not obvious how to construct an adaptive basis from a complete basis; for example, it is a difficult task to construct an adaptive basis from the complete basis of polynomials $\{x^j y^k-E_\beta(x^j y^k),\ j=0,1,\cdots,\ k=0,1,\cdots\}$ so that MQD methods using such a basis have good efficiencies. However, it is natural to construct an adaptive basis with only a finite number of elements extracted from a complete basis as given by expression (11). This will be developed further in section (3) of the paper. The notion of an adaptive basis constructed from a complete basis and used to project the score functions appears to be relatively new, despite having been used implicitly in minimum chi-square methods without this notion being made explicit; see Moore and Spruill [

In the literature, attention seems to have been given mainly to complete bases. Adaptive bases will be further developed and discussed in section (3) and used to develop MQD methods, with a deterministic version (version D) and a simulated version (version S), or SMQD methods.

For the deterministic version (version D), with $S_\beta(u,v)$ considered fixed, the asymptotic properties of the methods are similar to the univariate case as given by Luong and Thompson [

It might be worth mentioning that, in practice, without an infinite and complete basis and using only a finite basis which is a subset of an infinite basis, high efficiency for the procedures can in general only be attained on some restricted parameter space, unless the score functions belong to the span of the finite basis. One viable strategy is to identify a restricted parameter space for the type of applications being considered; often it suffices to use a restricted parameter space and then try to identify elements to form a finite basis so that the procedures retain high efficiency overall for $\beta_0$ belonging to the restricted parameter space, see Luong [

For MQD or SMQD methods, the matrix $\Omega_0$, the covariance matrix of the vector $h(x,y)$ under $\beta_0$, plays an important role, as estimators with good efficiencies can be obtained using $\Omega_0$. Although $\Omega_0$ is unknown, its elements are not complicated and, moreover, it can be replaced by a consistent estimate constructed empirically without affecting the asymptotic efficiency of the procedures. It is also needed for constructing chi-square test statistics. We shall give more details about this matrix and construct an empirical, data-dependent estimate $\hat\Omega_0$ of $\Omega_0$.

Let $\Omega_0$ be the covariance matrix of the vector $h(x,y)$ under $\beta_0$; its elements are given by

$\Omega_0(i,j)=\mathrm{cov}\left(I[x>s_i,y>t_i],I[x>s_j,y>t_j]\right)=E\left(I[x>s_i,y>t_i]\,I[x>s_j,y>t_j]\right)-E\left(I[x>s_i,y>t_i]\right)E\left(I[x>s_j,y>t_j]\right)=S_{\beta_0}(\max(s_i,s_j),\max(t_i,t_j))-S_{\beta_0}(s_i,t_i)\,S_{\beta_0}(s_j,t_j),\quad i=1,\cdots,M,\ j=1,\cdots,M.$ (12)

Clearly, these elements can be estimated empirically with the bivariate empirical survival function using grouped data provided by the contingency table; we then have the corresponding estimates given by

$\hat\Omega_0(i,j)=S_n(\max(s_i,s_j),\max(t_i,t_j))-S_n(s_i,t_i)\,S_n(s_j,t_j),\quad i=1,\cdots,M,\ j=1,\cdots,M.$ (13)
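Expression (13) is straightforward to compute from the data; a minimal sketch in pure Python (illustrative names, with the selected points given as pairs $(s_l,t_l)$):

```python
def omega_hat(sample, points):
    """Empirical estimate (13) of the covariance matrix Omega_0:
    Omega(i, j) = S_n(max(s_i, s_j), max(t_i, t_j)) - S_n(s_i, t_i) * S_n(s_j, t_j),
    using the identity I[x>s_i]I[x>s_j] = I[x>max(s_i, s_j)]."""
    n = len(sample)
    def S_n(x, y):
        return sum(1 for xi, yi in sample if xi > x and yi > y) / n
    M = len(points)
    return [[S_n(max(points[i][0], points[j][0]),
                 max(points[i][1], points[j][1]))
             - S_n(*points[i]) * S_n(*points[j])
             for j in range(M)] for i in range(M)]
```

The diagonal entries reduce to $S_n(s_i,t_i)(1-S_n(s_i,t_i))$, the usual binomial variance of an indicator, and the matrix is symmetric by construction.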

Therefore, we can define the matrix $\hat\Omega_0$; its inverse is denoted by $\hat W_0$, and similarly let $W_0$ be the inverse of $\Omega_0$. Clearly, $\hat W_0\to_p W_0$, since $S_n(x,y)\to_p S_{\beta_0}(x,y)$. Now we can define the objective functions to be minimized for the implementation of MQD and SMQD methods.

For version D, let

$G_n(\beta)=(z_n-z_\beta)'$ (14)

and let S β s ( x , y ) be an estimate of S β ( x , y ) using a simulated sample of size U = τ n so that we can define

$z^s_\beta=(S^s_\beta(s_1,t_1),\cdots,S^s_\beta(s_M,t_M))'$

and let

$G_n(\beta)=(z_n-z^s_\beta)'$ (15)

for version S.

We can define the length of the random function G n ( β ) as ‖ G n ( β ) ‖ with the norm ‖ . ‖ defined as

$\|G_n(\beta)\|^2=(z_n-z_\beta)'\hat W_0(z_n-z_\beta)$ (16)

or equivalently

$\|G_n(\beta)\|^2=(z_n-z_\beta)'W_0(z_n-z_\beta)$ (17)

as they give asymptotically equivalent estimators and goodness-of-fit test statistics; for finding estimators numerically we need to minimize expression (16), while for asymptotic properties it might be slightly simpler to work with expression (17), as less notation is involved. Similarly, for version S let

$Q_n(\beta)=\|G_n(\beta)\|^2=(z_n-z^s_\beta)'\hat W_0(z_n-z^s_\beta)$ (18)

or equivalently

$Q_n(\beta)=\|G_n(\beta)\|^2=(z_n-z^s_\beta)'W_0(z_n-z^s_\beta).$ (19)

Note that the weight matrix $\hat W_0$ is the same for both versions, and the norm $\|.\|$ is a weighted Euclidean norm which obeys the triangle inequality, so the results of the theorems given by Pakes and Pollard, although stated with the Euclidean norm, remain valid if the Euclidean norm is replaced by a weighted Euclidean norm. The minimum QD estimators are obtained as the vector $\hat\beta$ which minimizes the objective function $Q_n(\beta)$, or equivalently $\|G_n(\beta)\|$ as defined by expression (16), for version D, or as the vector $\hat\beta_S$ which minimizes $Q_n(\beta)$, or equivalently $\|G_n(\beta)\|$ as defined by expression (18), for version S.
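The quadratic form defining $Q_n(\beta)$ in expressions (16) and (18) can be coded in a few lines; a minimal sketch with illustrative names, taking $z_n$, $z_\beta$ (or $z^s_\beta$) as lists and the weight matrix as a list of rows:

```python
def quadratic_distance(z_n, z_beta, W):
    """MQD objective Q_n(beta) = (z_n - z_beta)' W (z_n - z_beta),
    as in expressions (16) and (18); W is the estimated weight matrix."""
    d = [a - b for a, b in zip(z_n, z_beta)]
    M = len(d)
    return sum(d[i] * W[i][j] * d[j] for i in range(M) for j in range(M))
```

For version S, the same function is called with the simulated vector $z^s_\beta$ in place of $z_\beta$; the minimization over β would then be handled by any derivative-free optimizer, since $z^s_\beta$ is not differentiable in β.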

The paper is organized as follows.

In Section 2, MQD and SMQD methods are developed using predetermined grouped data. Asymptotic properties of the estimators are studied, and the asymptotic distributions of the model testing statistics are derived. The methods can be extended to the situation where complete data are available but will be grouped, by defining a rule for choosing points on the nonnegative quadrant to group the data. An artificial sample constructed using two sample quantiles and QMC numbers is proposed in Section 3 to select points, and the methods developed with preselected points or cells in section (2) are shown to remain applicable; the methods can be seen as equivalent to minimum chi-square methods with random cells but with a rule to define these cells, and using random cells is equivalent to using an adaptive basis. Both QD estimation and minimum chi-square estimation can be unified under the approach of quasilikelihood estimation using an adaptive basis, and implementing MQD or SMQD methods requires less computing time than the related minimum chi-square versions, since the adaptive basis used by MQD and SMQD methods has only linearly independent elements and fewer numerical evaluations or simulations are needed to compute probabilities assigned to points than to cells. Section 4 illustrates the implementation of SMQD methods by comparing method of moments (MM) estimators with SMQD estimators. The SMQD estimators appear to be much more efficient and robust than MM estimators in a limited study for a bivariate gamma model, with a range of parameters often encountered in actuarial science chosen for the study.

Consistency for both versions of quadratic distance estimators using predetermined grouped data can be treated in a unified way using the following Theorem 1 which is essentially Theorem 3.1 of Pakes and Pollard [

Theorem 1 (Consistency)

Under the following conditions β ˜ converges in probability to β 0 :

1) $\|G_n(\tilde\beta)\|\le o_p(1)+\inf_{\beta\in\Omega}\|G_n(\beta)\|$, the parameter space Ω being compact,

2) $\|G_n(\beta_0)\|=o_p(1)$,

3) $\sup_{\|\beta-\beta_0\|>\delta}\dfrac{1}{\|G_n(\beta)\|}=O_p(1)$ for each $\delta>0$.

Theorem 3.1 states condition 2) as $G_n(\beta_0)=o_p(1)$, but in the proof the authors only use $\|G_n(\beta_0)\|=o_p(1)$, so we state condition 2) as $\|G_n(\beta_0)\|=o_p(1)$.

An expression is $o_p(1)$ if it converges to 0 in probability, $O_p(1)$ if it is bounded in probability, and $o_p(n^{-1/2})$ if it converges to 0 in probability faster than $n^{-1/2}\to 0$. For version D and version S, $\inf_{\beta\in\Omega}\|G_n(\beta)\|$ occurs at the values of the vector of MQD estimators, so conditions 1) and 2) are satisfied for both versions, with compactness of the parameter space Ω assumed. Also, for both versions, $\|G_n(\beta)\|\to_p 0$ only at $\beta=\beta_0$ in general if the number of components of $G_n(\beta)$ is greater than the number of parameters of the model, i.e., $M>m$. For $\beta\ne\beta_0$ we have $0<Q_n(\beta)\le B$ for some $B>0$, since survival functions evaluated at points are the components of $G_n(\beta)$ and these functions are bounded.

This implies that there exist real numbers u and v with 0 < u < v < ∞ such that

$P\left(u\le\sup_{\|\beta-\beta_0\|>\delta}\dfrac{1}{\|G_n(\beta)\|}\le v\right)\to 1$ as $n\to\infty$.

Therefore, for both versions of $Q_n(\beta)$, whether deterministic or simulated, the minimum quadratic distance estimators, MQD and SMQD, are consistent by Theorem 1; i.e., the vector of MQD estimators and the vector of SMQD estimators converge in probability to $\beta_0$. Theorem 3.1 of Pakes and Pollard [

For both versions we can express $Q_n(\beta)=\|G_n(\beta)\|^2$, where $G_n(\beta)$ is as given by expression (14) for version D and by expression (15) for version S; since $G_n(\beta)$ is not differentiable for version S, the traditional Taylor expansion argument cannot be used to establish asymptotic normality of estimators obtained by minimizing $\|G_n(\beta)\|^2$.

For both versions, G n ( β ) → p G ( β ) with

$G(\beta)=(z_{\beta_0}-z_\beta)'.$ (20)

Explicitly,

$G(\beta)=\left(S_{\beta_0}(s_1,t_1)-S_\beta(s_1,t_1),\cdots,S_{\beta_0}(s_M,t_M)-S_\beta(s_M,t_M)\right)'.$ (21)

The points ( s 1 , t 1 ) ′ , ⋯ , ( s M , t M ) ′ are predetermined by a contingency table we are given and we have no choice but to analyze the grouped data as they are presented.

Note that G ( β ) is non-random and if we assume G ( β ) is differentiable with derivative matrix Γ ( β ) , then we can define the random function Q n a ( β ) to approximate Q n ( β ) for both versions in a unified way with

$Q^a_n(\beta)=\|L_n(\beta)\|^2,\quad L_n(\beta)=G_n(\beta_0)+\Gamma(\beta_0)(\beta-\beta_0).$ (22)

The matrix Γ ( β ) can be displayed explicitly as

$\Gamma(\beta)=-\begin{bmatrix}\dfrac{\partial S_\beta(s_1,t_1)}{\partial\beta_1}&\cdots&\dfrac{\partial S_\beta(s_1,t_1)}{\partial\beta_m}\\ \vdots&\ddots&\vdots\\ \dfrac{\partial S_\beta(s_M,t_M)}{\partial\beta_1}&\cdots&\dfrac{\partial S_\beta(s_M,t_M)}{\partial\beta_m}\end{bmatrix}.$ (23)

Note that Q n a ( β ) is differentiable for both versions. Since Q n a ( β ) is a quadratic function of β , the vector β * which minimizes Q n a ( β ) can be obtained explicitly and

$\beta^*-\beta_0=-(\Gamma'\hat W_0\Gamma)^{-1}\Gamma'\hat W_0\,G_n(\beta_0)$

and since $\hat W_0\to_p W_0$, with $W_0$ assumed to be a positive definite matrix, we have

$\sqrt{n}(\beta^*-\beta_0)=-(\Gamma'\hat W_0\Gamma)^{-1}\Gamma'\hat W_0\sqrt{n}\,G_n(\beta_0)=-(\Gamma'W_0\Gamma)^{-1}\Gamma'W_0\sqrt{n}\,G_n(\beta_0)+o_p(1).$

Let $\tilde\beta$ and $\beta^*$ be the vectors which minimize $Q_n(\beta)$ and $Q^a_n(\beta)$, respectively. If the approximation is of the right order, then $\tilde\beta$ and $\beta^*$ are asymptotically equivalent. This setup allows a unified approach for establishing asymptotic normality for both versions. For version D, it suffices to let $\tilde\beta=\hat\beta$, and for version S, let $\tilde\beta=\hat\beta_S$.

Clearly, the setup fits within the scope of their Theorem (3.3); we shall rearrange its results before applying them to version D and version S of the MQD methods and verify that the regularity conditions can be satisfied. We shall state Theorem 2 and Corollary 1, which are essentially their Theorem (3.3); the proofs have been given by Pakes and Pollard [

Also, for version S, the simulated samples are assumed to have size $U=\tau n$, and the same seed is used across different values of β to draw samples of size U. We make these assumptions, which are standard for simulated methods of inference; see section 9.6 on the method of simulated moments (MSM) given by Davidson and MacKinnon [

Theorem 2.

Let $\tilde\beta$ be a vector of consistent estimators for $\beta_0$, the unique vector which satisfies $G(\beta_0)=0$.

Under the following conditions:

1) The parameter space Ω is compact, β ˜ is an interior point of Ω.

2) $\|G_n(\tilde\beta)\|\le o_p(n^{-1/2})+\inf_{\beta\in\Omega}\|G_n(\beta)\|$,

3) G ( . ) is differentiable at β 0 with a derivative matrix Γ = Γ ( β 0 ) of full rank

4) $\sup_{\|\beta-\beta_0\|\le\delta_n}\sqrt{n}\,\|G_n(\beta)-G(\beta)-G_n(\beta_0)\|=o_p(1)$ for every sequence $\{\delta_n\}$ of positive numbers converging to zero,

5) $\|G_n(\beta_0)\|=o_p(1)$,

6) β 0 is an interior point of the parameter space Ω.

Then, we have the following representation which will give the asymptotic distribution of β ˜ in Corollary 1, i.e.,

$\sqrt{n}(\tilde\beta-\beta_0)=-(\Gamma'\hat W_0\Gamma)^{-1}\Gamma'\hat W_0\sqrt{n}\,G_n(\beta_0)+o_p(1),$ (24)

or equivalently, using equality in distribution,

$\sqrt{n}(\tilde\beta-\beta_0)=_d-(\Gamma'W_0\Gamma)^{-1}\sqrt{n}\,\Gamma'W_0 G_n(\beta_0)$ (25)

or equivalently,

$\sqrt{n}(\tilde\beta-\beta_0)=_d-(\Gamma'\hat W_0\Gamma)^{-1}\sqrt{n}\,\Gamma'\hat W_0 G_n(\beta_0).$ (26)

The proofs of these results follow from the arguments used to prove Theorem 3.3 given by Pakes and Pollard [

Corollary 1.

Let $Y_n=\sqrt{n}\,\Gamma'W_0 G_n(\beta_0)$; if $Y_n\to_L N(0,V)$, then $\sqrt{n}(\tilde\beta-\beta_0)\to_L N(0,T)$ with

$T=(\Gamma'W_0\Gamma)^{-1}V(\Gamma'W_0\Gamma)^{-1}.$ (27)

The matrices T and V depend on $\beta_0$; we also adopt the notations $T=T(\beta_0)$, $V=V(\beta_0)$.

We observe that condition 4) of Theorem 2, when applied to SMQD methods, in general involves technicalities. Condition 4) holds for version D, so we only need to verify it for version S. Note that verifying condition 4) is equivalent to verifying

$\sup_{\|\beta-\beta_0\|\le\delta_n}n\,\|G_n(\beta)-G(\beta)-G_n(\beta_0)\|^2=o_p(1),$

a regularity condition ensuring that the approximation is of the right order; it implies condition (3) of their Theorem 3.3, which might be the most difficult one to check. The rest of the conditions of Theorem 2 are satisfied in general.

Let

g_n(β) = n (‖G_n(β) − G(β) − G_n(β₀)‖)² (28)

and, for version S, define

u_n(β) = ( (S^s_{β₀}(s_1, t_1) − S_{β₀}(s_1, t_1)) − (S^s_β(s_1, t_1) − S_β(s_1, t_1)), ⋯, (S^s_{β₀}(s_M, t_M) − S_{β₀}(s_M, t_M)) − (S^s_β(s_M, t_M) − S_β(s_M, t_M)) )′. (29)

Consequently, g n ( β ) can also be expressed as

g_n(β) = n u_n′(β) Ŵ₀ u_n(β).

Since the elements of √n u_n(β) are bounded in probability, it is not difficult to see that the sequence {g_n(β)} is bounded in probability and continuous in probability, with g_n(β) →_p g_n(β′) as β → β′, using the assumption that the same seed is used across different values of β and assuming that S_β(x, y) is differentiable with respect to β; note also that g_n(β₀) = 0. Therefore, results given in the section of Luong et al. [

C_n = {β | ‖β − β₀‖ ≤ δ_n} in probability, and hence sup_{‖β−β₀‖≤δ_n} g_n(β) →_p 0 as n → ∞ and β → β₀.

For version D, we can see that the asymptotic covariance of √n G_n(β₀) is W₀⁻¹, i.e.,

√n G_n(β₀) →_L N(0, W₀⁻¹). (30)

For version S, note that

(z_n − z^s_{β₀})′ = ((z_n − z_{β₀}) − (z^s_{β₀} − z_{β₀}))′ = (z_n − z_{β₀})′ − (z^s_{β₀} − z_{β₀})′,

we can see that

√n G_n(β₀) →_L N(0, (1 + 1/τ) W₀⁻¹) (31)

as the simulated sample size is U = τn and the simulated samples are independent of the original sample given by the data. Implicitly, we assume that the same seed is used across different values of β to obtain the simulated samples.

Using the results of Corollary 1, we have asymptotic normality for the MQD estimators for version D, given by

√n (β̂ − β₀) →_L N(0, (Γ′ W₀ Γ)⁻¹), (32)

where Γ is as given by expression (23) and can be estimated easily.

For version S, the SMQD estimators also follow an asymptotic normal distribution with

√n (β̂_S − β₀) →_L N(0, (1 + 1/τ) (Γ′ W₀ Γ)⁻¹), (33)

where an estimate of Γ can be obtained using the technique given by Pakes and Pollard [

In this section, the quadratic distance Q_n(β) will be used to construct goodness-of-fit test statistics for the simple hypothesis

H_{0}: the data come from a specified distribution with distribution function F_{β₀}, where β₀ is specified. The chi-square test statistics, their asymptotic distributions and their degrees of freedom r are given below, with

n Q_n(β₀) →_L χ²(r = M) for version D, and (34)

n (τ/(τ + 1)) Q_n(β₀) →_L χ²(r = M) for version S. (35)

Version S is of interest since it allows testing goodness of fit for continuous distributions without closed-form bivariate survival functions, as we only need to be able to simulate from these distributions. We shall justify the asymptotic chi-square distributions given by expressions (34) and (35) below.

Note that

n Q_n(β₀) = √n G_n′(β₀) Ŵ₀ √n G_n(β₀), and for version D

√n G_n(β₀) →_L N(0, W₀⁻¹), W₀⁻¹ = Ω₀.

For version S,

√n G_n(β₀) →_L N(0, (1 + 1/τ) W₀⁻¹).

We have the asymptotic chi-square distributions as given, using standard results for distributions of quadratic forms and Ŵ₀ →_p W₀; see Luong and Thompson [
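For version D, the simple-hypothesis statistic in expression (34) is just the quadratic form n G_n′(β₀) Ŵ₀ G_n(β₀). A minimal sketch, assuming the vector G_n(β₀) and the matrix Ŵ₀ have already been computed from the data (the function name is ours):

```python
import numpy as np

def chi_square_statistic(n, G_n, W0_hat):
    """Version D statistic n * Q_n(beta_0) = n * G_n' W0_hat G_n; compare to chi2(M)."""
    G_n = np.asarray(G_n, dtype=float)
    return float(n * G_n @ W0_hat @ G_n)
```

For version S, the same quadratic form is multiplied by the factor τ/(τ + 1), as in expression (35).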

The quadratic distances Q_n(β) can also be used for the construction of test statistics for the composite hypothesis, where Q_n(β) is as defined by expression (18) for version D and by expression (19) for version S. The null hypothesis can be stated as

H_{0}: the data come from a parametric model {F_β}. The chi-square test statistics are given by

n Q_n(β̂) →_L χ²(r = M − m), (36)

for version D and for version S,

n (τ/(τ + 1)) Q_n(β̂_S) →_L χ²(r = M − m) (37)

where β̂ and β̂_S are the vectors of MQD and SMQD estimators which minimize Q_n(β) for version D and version S respectively, assuming M > m. To justify these asymptotic chi-square distributions, note that for version D we have

n Q_n(β̂) = n Q_n^a(β̂) + o_p(1). It suffices to consider the asymptotic distribution of

n Q_n^a(β̂), as we have the following equalities in distribution,

n Q_n(β̂) =_d n Q_n^a(β̂) = n ‖L_n(β̂)‖² = √n L_n′(β̂) Ŵ₀ √n L_n(β̂), with L_n(β) as given by

expression (22). Also, using expressions (24)-(26), √n L_n(β̂) =_d √n G_n(β₀) + Γ √n (β̂ − β₀), which can be re-expressed, using expressions (25)-(26), as √n L_n(β̂) =_d √n G_n(β₀) − Γ (Γ′ W₀ Γ)⁻¹ Γ′ W₀ √n G_n(β₀),

or equivalently, √n L_n(β̂) =_d (I − Γ (Γ′ W₀ Γ)⁻¹ Γ′ W₀) √n G_n(β₀) with

√n G_n(β₀) →_L N(0, W₀⁻¹).

We have √n L_n(β̂) →_L N(0, Σ), with

Σ = (I − Γ (Γ′ W₀ Γ)⁻¹ Γ′ W₀) W₀⁻¹ (I − W₀ Γ (Γ′ W₀ Γ)⁻¹ Γ′), and note that Σ W₀ = B

with B = I − Γ (Γ′ W₀ Γ)⁻¹ Γ′ W₀; since B is idempotent, its rank equals its trace, and trace(B) = M − m. The argument used is very similar to the one used for Pearson's statistics; see Luong and Thompson [

Similarly, for version S, n Q_n(β̂_S) =_d n Q_n^a(β̂_S) = n ‖L_n(β̂_S)‖² and √n L_n(β̂_S) =_d (I − Γ (Γ′ W₀ Γ)⁻¹ Γ′ W₀) √n G_n(β₀) with √n G_n(β₀) →_L N(0, (1 + 1/τ) W₀⁻¹). This justifies the asymptotic chi-square distributions given by expressions (36) and (37).

In section (3.1) and section (3.2), we shall define a rule for selecting the points (s_l, t_l), l = 1, ⋯, M when complete data are available. Equivalently, we would like to define the cells used to group the data, and we shall see that random cells will be used: the points (s_l, t_l), l = 1, ⋯, M are constructed using quasi-Monte Carlo (QMC) numbers on the unit square, multiplied by two chosen sample quantiles of the two marginal distributions. For minimum chi-square methods it appears to be difficult to have a rule for choosing the cells used to group the data; see the discussions by Greenwood and Nikulin [

We shall define the pth sample quantile of a distribution as we shall need two sample quantiles from the marginal distributions together with QMC numbers to construct an approximation of an integral. Our quadratic distance based on selected points can be viewed as an approximation of a continuous version given by an integral.

From a bivariate distribution we have two marginal distributions F(x) and G(y). The univariate sample pth quantile of the distribution F(x), assumed to be continuous, is based on the sample distribution function F_n(x) = (1/n) Σ_{i=1}^n I[x_i ≤ x] and is defined as α_p(n) = inf{x | F_n(x) ≥ p}; its model counterpart is given by α_p = inf{x | F(x) ≥ p}. We also use the notations α_p(n) = F_n⁻¹(p) and α_p = F⁻¹(p). We define similarly the qth sample quantile of the distribution G(y) as β_q(n) = G_n⁻¹(q) and its model counterpart β_q = G⁻¹(q), with 0 < p, q < 1.

The sample quantile functions α_p(n) or β_q(n) can be viewed as statistical functionals of the form T(H_n) with H_n = F_n or H_n = G_n. The influence function of T(H_n) is a valuable tool for studying the asymptotic properties of the statistical functional and will be introduced below. Let H be the true distribution and H_n the usual empirical distribution which estimates H; also let δ_x be the degenerate distribution at x, i.e., δ_x(u) = 1 if u ≥ x and δ_x(u) = 0 otherwise. The influence function of T, viewed as a function of x, IC_{T,H}(x), is defined as a functional directional derivative at H in the direction of (δ_x − H): letting

H_ε = H + ε (δ_x − H), we have IC_{T,H}(x) = lim_{ε→0} (T(H_ε) − T(H))/ε = T′_H(δ_x − H), where

T ′ H is a linear functional.

Alternatively, it is easy to see that IC_{T,H}(x) = ∂T(H_ε)/∂ε |_{ε=0}, and this gives a convenient way to compute the influence function. It can be shown that the influence function of the pth sample quantile T(H_n) is given by

IC_{T,H}(x) = (p − 1)/h(H⁻¹(p)) for x < H⁻¹(p), and IC_{T,H}(x) = p/h(H⁻¹(p)) for x > H⁻¹(p),

with h being the density function of the distribution H which is assumed to be absolutely continuous, see Huber [

Furthermore, as IC_{T,H}(x) is based on a linear functional, the asymptotic variance of T(H_n) is simply (1/n) V(IC_{T,H}(X)), with V(·) denoting the variance of the expression inside the brackets, since in general we have E(IC_{T,H}(X)) = 0, and we have the following representation when IC_{T,H}(x) is bounded as a function of x:

T(H_n) = T(H) + T′_H(H_n − H) + o_p(n^(−1/2))

and

T′_H(H_n − H) = (1/n) Σ_{i=1}^n T′_H(δ_{x_i} − H),

T′_H(δ_{x_i} − H) = IC_{T,H}(x_i); see Hogg et al. [

√n (T(H_n) − T(H)) →_L N(0, σ²_IC), σ²_IC = V(IC_{T,H}(X)).
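As a quick numerical illustration (not part of the paper's derivation), the resulting quantile variance V(IC_{T,H}(X)) = p(1 − p)/h(H⁻¹(p))² can be checked by Monte Carlo for the median (p = 1/2) of a standard normal, where it equals 1/(4φ(0)²) = π/2 ≈ 1.571; the function name and simulation settings below are ours:

```python
import numpy as np

def median_scaled_variance(n=400, reps=4000, seed=1):
    """Estimate n * Var(sample median) over reps N(0,1) samples; should be near pi/2."""
    rng = np.random.default_rng(seed)
    medians = np.median(rng.standard_normal((reps, n)), axis=1)
    return n * medians.var()
```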

The influence function representation of a functional which depends on only one function, such as H_n, is the analogue of a Taylor expansion of a univariate function, and the influence function representation of a functional which depends on several functions is the analogue of a Taylor expansion of a multivariate function with domain in a Euclidean space and range on the real line. We will encounter an example of functionals which depend on three functions, S_n, F_n, G_n, in section (3.2).

Subsequently, we shall introduce the Halton sequences with bases b₁ = 2 and b₂ = 3; the first M terms are denoted by (u_l, v_l) = (φ_{b₁}(l), φ_{b₂}(l)), l = 1, 2, ⋯, M, and we use H_M to denote the set of points {(u_l, v_l), l = 1, 2, ⋯, M}. This sequence of points, which belongs to the unit square (0, 1) × (0, 1), can be obtained as follows.

For b₁ = 2, we divide the interval (0, 1) into halves (b₁ = 2), then into fourths (b₁² = 2²), and so forth, to obtain the sequence 1/2, 1/4, 3/4, ⋯.

For b₂ = 3, we divide the interval (0, 1) into thirds (b₂ = 3), then into ninths (b₂² = 3²), and so forth, to obtain the sequence 1/3, 2/3, 1/9, ⋯. Pairing them up, we obtain the Halton sequence (1/2, 1/3), (1/4, 2/3), (3/4, 1/9), ⋯. Matlab and R

have functions to generate the sequences; see Glasserman [

Note that the Halton sequences are deterministic, but if we are used to integration by simulation we might want to think of the M terms as a quasi-random sample of size M from a bivariate uniform distribution, which can be useful

for integrating a function of the form A = ∫₀¹ ∫₀¹ ψ(x, y) dx dy. Using the M terms of the Halton sequence, it can be approximated as A ≈ (1/M) Σ_{l=1}^M ψ(u_l, v_l), which is similar to a sample mean from a random sample of size M.
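The base-b digit-reversal construction described above can be sketched as follows (a minimal implementation; the function names are ours):

```python
def radical_inverse(l, b):
    """Reverse the base-b digits of the integer l across the radix point (van der Corput)."""
    x, f = 0.0, 1.0 / b
    while l > 0:
        x += (l % b) * f
        l //= b
        f /= b
    return x

def halton_pairs(M, b1=2, b2=3):
    """First M terms (u_l, v_l) of the Halton sequence in bases b1 and b2."""
    return [(radical_inverse(l, b1), radical_inverse(l, b2)) for l in range(1, M + 1)]
```

Here halton_pairs(3) reproduces (1/2, 1/3), (1/4, 2/3), (3/4, 1/9) as in the text.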

Suppose the observations Z_i = (X_i, Y_i)′, i = 1, ⋯, n are iid with common bivariate distribution function K_β(x, y) and survival function S_β(x, y). Let the two marginal distributions of K_β(x, y) be denoted by F_β(x) and G_β(y), and also let F = F_{β₀}, G = G_{β₀}, K = K_{β₀}, S = S_{β₀}.

Define the bivariate empirical distribution function, the analogue of the bivariate empirical survival function, as

K n ( x , y ) = 1 n ∑ i = 1 n I [ x i ≤ x , y i ≤ y ] .
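The bivariate empirical survival function S_n is the analogous count over the upper-right quadrant; a minimal sketch (the helper name is ours):

```python
import numpy as np

def empirical_survival(data, s, t):
    """S_n(s, t) = (1/n) * #{i : x_i > s and y_i > t}, for data of shape (n, 2)."""
    x, y = data[:, 0], data[:, 1]
    return float(np.mean((x > s) & (y > t)))
```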

We might want to think that it admits a bivariate empirical density estimate k_n(x, y), so that the following Cramér-von Mises distance expressions are equivalent:

∫ 0 ∞ ∫ 0 ∞ ( S n ( x , y ) − S β ( x , y ) ) 2 d K n ( x , y )

is equivalent to

∫ 0 ∞ ∫ 0 ∞ ( S n ( x , y ) − S β ( x , y ) ) 2 k n ( x , y ) d x d y .

For univariate Cramér-Von Mises methods, see Luong and Blier-Wong [

In the next section we shall give details on how to form a type of quasi-sample, or artificial sample, of size M from k_n(x, y), using the Halton sequence of M terms and sample quantiles of the marginal distributions F and G; this allows us to define the sequence (s_l, t_l), l = 1, ⋯, M, so that the above integrals can be approximated by the following finite sum, of the form of an average of M terms,

1 M ∑ l = 1 M ( S n ( s l , t l ) − S β ( s l , t l ) ) 2 . (38)

We can see that expression (38) is an unweighted quadratic distance, using the identity matrix I as the weight matrix instead of Ŵ₀. The unweighted quadratic distance still produces consistent estimators, although possibly less efficient for large samples than those based on Ŵ₀; for finite samples, the estimators obtained using I might still have reasonable performance while being simple to obtain.
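Expression (38) can be sketched directly; here survival_beta stands for the model survival function S_β, and the function name is illustrative:

```python
import numpy as np

def unweighted_qd(data, survival_beta, points):
    """(1/M) * sum over (s_l, t_l) of (S_n(s_l, t_l) - S_beta(s_l, t_l))^2, as in (38)."""
    x, y = data[:, 0], data[:, 1]
    diffs = [np.mean((x > s) & (y > t)) - survival_beta(s, t) for (s, t) in points]
    return float(np.mean(np.square(diffs)))
```

Minimizing this function over β (e.g., with a generic numerical optimizer) gives the unweighted estimators discussed above.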

The set of points (s_l, t_l), l = 1, ⋯, M is the set of points proposed for forming optimum quadratic distances in case complete data are available. We shall see that the points depend on two quantiles chosen from the two marginal distributions, so they are random; consequently, we might want to think of them as defining random overlapping cells.

As for minimum chi-square methods, if the random cells stabilize into fixed cells, the methods in general have the same efficiency as those based on the stabilized fixed cells; see Pollard [

From the M terms of the Halton sequences, we have ( u l , v l ) , l = 1 , ⋯ , M .

Let η = 1/max(u_l, l = 1, ⋯, M) and ϱ = 1/max(v_l, l = 1, ⋯, M); we can form the

artificial sample with elements (s_l, t_l), l = 1, ⋯, M, with s_l = η u_l F_n⁻¹(p) and t_l = ϱ v_l G_n⁻¹(p), 0.90 ≤ p ≤ 0.99. We can view (s_l, t_l), l = 1, ⋯, M as a form of quasi-random sample on the nonnegative quadrant, and these are the points proposed for use in case complete data are available. In general, we might want to choose 20 ≤ M ≤ 30 if M² ≤ n, and if n is small we try to ensure M ≤ n. Consequently, as n → ∞, M remains bounded.
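A sketch of this construction, taking the Halton pairs as input (the function name is ours; note that np.quantile interpolates by default, which differs slightly from the inf definition of F_n⁻¹(p)):

```python
import numpy as np

def select_points(x, y, halton, p=0.95):
    """Form s_l = eta*u_l*F_n^{-1}(p), t_l = rho*v_l*G_n^{-1}(p) from Halton pairs."""
    u = np.array([h[0] for h in halton])
    v = np.array([h[1] for h in halton])
    eta, rho = 1.0 / u.max(), 1.0 / v.max()
    qx, qy = np.quantile(x, p), np.quantile(y, p)  # sample p-quantiles of the marginals
    return eta * u * qx, rho * v * qy
```

By construction, max_l s_l = F_n⁻¹(p) and max_l t_l = G_n⁻¹(p), so the points fill the rectangle up to the two chosen sample quantiles.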

Since F_n⁻¹(p) →_p F⁻¹(p) and G_n⁻¹(p) →_p G⁻¹(p), we have (s_l, t_l) →_p (s_l⁰, t_l⁰) with s_l⁰ = η u_l F⁻¹(p) and t_l⁰ = ϱ v_l G⁻¹(p) for l = 1, ⋯, M, and the points (s_l⁰, t_l⁰), l = 1, ⋯, M are non-random, or fixed.

It turns out that the quadratic distances for both versions constructed with the points (s_l, t_l), l = 1, ⋯, M are asymptotically equivalent to the quadratic distances using the points (s_l⁰, t_l⁰), l = 1, ⋯, M, so that the asymptotic theory developed with the points (s_l, t_l), l = 1, ⋯, M treated as fixed continues to be valid; we shall show that this is indeed the case. Similar conclusions have been established for minimum chi-square methods with random cells, provided that these cells stabilize to fixed cells; see Theorem 2 given by Pollard [

Define { ( s , t ) } = { ( s l , t l ) , l = 1 , ⋯ , M } and similarly let

{ ( s 0 , t 0 ) } = { ( s l 0 , t l 0 ) , l = 1 , ⋯ , M } .

We work with the quadratic distance defined using {(s, t)}, which leads us to consider quadratic forms of the form ‖G_n(β)‖² as defined by expression (16) for version D and expression (18) for version S. To emphasize that z_n and z_β depend on {(s, t)}, we also use the notations z_n({(s, t)}) and z_β({(s, t)}), respectively, and let z_n⁰ = z_n({(s⁰, t⁰)}), z_β⁰ = z_β({(s⁰, t⁰)}).

It suffices to verify that results of Theorem 1, Theorem 2 and its corollary in section (2) continue to hold.

Now observe that for both versions D and S we have

(z_n − z_β) →_p (z⁰_{β₀} − z⁰_β) (39)

and

(z_n − z^s_β) →_p (z⁰_{β₀} − z⁰_β) (40)

so that G(β) remains the same for both versions, since S_β(x, y) is continuous with respect to (x, y), S_n(x, y) →_p S_β(x, y) and {(s, t)} →_p {(s⁰, t⁰)}. Clearly, Ŵ₀({(s, t)}) →_p W₀({(s⁰, t⁰)}). It remains to establish √n (z_n − z_β) = √n (z_n⁰ − z_β⁰) + o_p(1); note that z_β is random here instead

of being fixed as in section (2), as we are using an adaptive basis here. Using results on influence function representations for functionals as discussed, it suffices to show that the vector (z_n − z_β) has the same influence function representation as the vector (z_n⁰ − z_β⁰) to conclude that all the asymptotic results remain valid even when {(s, t)} is random.

We shall derive the influence functions for the elements of the vector of functionals (z_n − z_β) and show that they are the same as for the corresponding elements of the vector of functionals (z_n⁰ − z_β⁰).

Let δ^S_{x,y}(u, v) be the degenerate bivariate survival function at the point (x, y), i.e., δ^S_{x,y}(u, v) = 1 if u < x and v < y, and δ^S_{x,y}(u, v) = 0 otherwise.

Let the degenerate distribution function at x be defined as δ_x(u) = 1 if x ≤ u and δ_x(u) = 0 if x > u. Similarly, let the degenerate distribution function at y be defined as δ_y(v) = 1 if y ≤ v and δ_y(v) = 0 if y > v. Now we can define the following contaminated bivariate survival and marginal distribution functions,

S ε ( u , v ) = S ( u , v ) + ε ( δ x , y S ( u , v ) − S ( u , v ) ) , 0 ≤ ε ≤ 1

which is a contaminated bivariate survival function and

F ε 1 ( u ) = F ( u ) + ε 1 ( δ x ( u ) − F ( u ) ) , 0 ≤ ε 1 ≤ 1.

Similarly,

G ε 2 ( v ) = G ( v ) + ε 2 ( δ y ( v ) − G ( v ) ) , 0 ≤ ε 2 ≤ 1.

Now we consider (z_{jn} − z_{jβ₀}), the jth element of (z_n − z_{β₀}),

(z_{jn} − z_{jβ₀}) = S_n(s_j(F_n), t_j(G_n)) − S(s_j(F_n), t_j(G_n)) = T_j(S_n, F_n, G_n), j = 1, ⋯, M.

Clearly, T_j(S_n, F_n, G_n) depends on S_n, F_n and G_n, but we can use the influence function representation as given by Reid [

∂T_j(S_ε, F_{ε₁}, G_{ε₂})/∂ε |_{ε=ε₁=ε₂=0} = I[x > s_j⁰, y > t_j⁰] − S(s_j⁰, t_j⁰), which is bounded with respect to (x, y), together with ∂T_j(S_ε, F_{ε₁}, G_{ε₂})/∂ε₁ |_{ε=ε₁=ε₂=0} and ∂T_j(S_ε, F_{ε₁}, G_{ε₂})/∂ε₂ |_{ε=ε₁=ε₂=0}.

It is interesting to note that ∂T_j(S_ε, F_{ε₁}, G_{ε₂})/∂ε₁ |_{ε=ε₁=ε₂=0} = 0 and ∂T_j(S_ε, F_{ε₁}, G_{ε₂})/∂ε₂ |_{ε=ε₁=ε₂=0} = 0, so they do not contribute to the influence function representation of T_j(S_n, F_n, G_n).

If we consider the jth term of (z_n⁰ − z⁰_{β₀}), given by the functional G_j(S_n) = S_n(s_j⁰, t_j⁰) − S(s_j⁰, t_j⁰), which does not depend on F_n and G_n, we find that

∂G_j(S_ε)/∂ε |_{ε=ε₁=ε₂=0} = ∂T_j(S_ε, F_{ε₁}, G_{ε₂})/∂ε |_{ε=ε₁=ε₂=0}.

Therefore, all the asymptotic results of section (2) remain valid, and since all these influence functions are bounded, inference methods making use of these functionals are robust in general. Furthermore, we can develop the inference procedures based on quadratic distances as if we had the non-random points {(s⁰, t⁰)}; if needed, they can be replaced by {(s, t)} without affecting the asymptotic results already established in section (2).

For illustration, we consider the bivariate gamma model, as discussed in section (1.2), introduced by Mathai and Moschopoulos [

The moments being considered are given by

E ( X ) = ( α 0 + α 1 ) β 1 , V ( X ) = ( α 0 + α 1 ) β 1 2 , E ( X − E ( X ) ) 3 = 2 ( α 0 + α 1 ) β 1 3 ,

they are the first three cumulants of the marginal distribution of X; the first three cumulants of the marginal distribution of Y are given by

E ( Y ) = ( α 0 + α 2 ) β 2 , V ( Y ) = ( α 0 + α 2 ) β 2 2 , E ( Y − E ( Y ) ) 3 = 2 ( α 0 + α 2 ) β 2 3 ,

and the covariance between X and Y, c o v ( X , Y ) = α 0 β 1 β 2 .

Let X̄, s_X², m₃⁽¹⁾ be the first three sample cumulants of the marginal distribution of X; similarly, let Ȳ, s_Y², m₃⁽²⁾ be the first three sample cumulants of the marginal distribution of Y; finally, let s_XY be the sample covariance between X and Y. The MM estimators for β, as given by Mathai and Moschopoulos [

β̃₁ = m₃⁽¹⁾/(2 s_X²), β̃₂ = m₃⁽²⁾/(2 s_Y²), α̃₀ = s_XY/(β̃₁ β̃₂)

and

α̃₁ = s_X²/β̃₁² − α̃₀, α̃₂ = s_Y²/β̃₂² − α̃₀.
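These closed-form MM estimators can be sketched as follows; for the test below we assume the stochastic representation X = β₁(Z₀ + Z₁), Y = β₂(Z₀ + Z₂) with independent Z_i ~ Gamma(α_i, 1), which reproduces the cumulants listed above (a sketch; the function name is ours):

```python
import numpy as np

def mm_estimators(x, y):
    """Method-of-moments estimators for the bivariate gamma, from sample cumulants."""
    sx2, sy2 = x.var(), y.var()                      # second sample cumulants
    m3x = np.mean((x - x.mean()) ** 3)               # third sample cumulant of X
    m3y = np.mean((y - y.mean()) ** 3)               # third sample cumulant of Y
    sxy = np.mean((x - x.mean()) * (y - y.mean()))   # sample covariance
    b1 = m3x / (2.0 * sx2)
    b2 = m3y / (2.0 * sy2)
    a0 = sxy / (b1 * b2)
    a1 = sx2 / b1 ** 2 - a0
    a2 = sy2 / b2 ** 2 - a0
    return a0, a1, a2, b1, b2
```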

As the MM estimators depend on statistics involving polynomials of degree 3, they will not be robust and are very sensitive to outliers.

As the bivariate density function of the model is complicated, we consider version S of the quadratic distance with M = 25, and we compare the performance of the MM estimators versus the SMQD estimators using the overall asymptotic relative efficiency (ARE) criterion

ARE = [MSE(α̂₀ˢ) + MSE(α̂₁ˢ) + MSE(α̂₂ˢ) + MSE(β̂₁ˢ) + MSE(β̂₂ˢ)] / [MSE(α̃₀) + MSE(α̃₁) + MSE(α̃₂) + MSE(β̃₁) + MSE(β̃₂)].

The mean square errors (MSE) and the ARE are estimated using simulated samples. For references on simulations, see Ross [

M S E ( π ^ ) = E ( π ^ − π 0 ) 2 .
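In the simulation study, each MSE is estimated by averaging squared errors over the N replications; a one-line sketch (the function name is ours):

```python
import numpy as np

def estimated_mse(estimates, true_value):
    """Monte Carlo estimate of MSE(pi_hat) = E(pi_hat - pi_0)^2 over replications."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - true_value) ** 2))
```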

We fix M = 25; the two sample quantiles are the 0.99 quantiles, and we have no problem obtaining the inverse matrix used as the optimum weight matrix estimated from data. If we fix M = 30, occasionally the matrix Ω̂₀ is nearly singular and R gives us a warning. It takes around one to two minutes to complete one run on a laptop computer; as we do not have large computers, we fix N = 50 replications, each with a sample of size n = 500. We observe that, in general, each individual MSE is smaller for the SMQD estimators than for the MM estimators, and overall they are more efficient and more robust than the MM estimators, confirming the asymptotic theory; note also that the MM estimators require complete data. The unweighted simulated quadratic distance estimators using I perform almost as well as the estimators obtained using an optimum weight matrix. The ranges of the parameters are fixed as follows: we let α₀ = 2, α₁ = α₂, β₁ = β₂, with α₁ = 2, 3, 4, 6, 8, 9, 10 and β₁ = 2, 3, 4, 6, 8, 9, 10.

The chosen ranges are often encountered in actuarial science. We observe that, in general, when the samples are not contaminated, the SMQD estimators are at least five times more efficient than their counterparts; the results are summarized in the upper table of

For the robustness study, we contaminate the distributions used above with 5% of observations coming from a bivariate gamma with parameters outside the range being considered, used to generate outliers. The bivariate gamma distribution used to generate outliers has parameters given by the vector β_c = (α₀c = 2, α₁c = 2, α₂c = 2, β₁c = 50, β₂c = 50)′. We observe that the SMQD estimators are at least 1000 times more efficient than the MM estimators; we display just one row to illustrate the order of efficiency gained by using SMQD methods, and it is given at the bottom of

a⋱b | 2 | 3 | 4 | 6 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|
2 | 0.0372 | 0.0264 | 0.0446 | 0.0981 | 0.1595 | 0.1450 | 0.1807 |
3 | 0.0311 | 0.0039 | 0.0036 | 0.0028 | 0.0027 | 0.0288 | 0.0512 |
4 | 0.0448 | 0.0621 | 0.0376 | 0.0444 | 0.0790 | 0.0392 | 0.0565 |
6 | 0.0110 | 0.0528 | 0.0064 | 0.0031 | 0.0022 | 0.0158 | 0.0948 |
8 | 0.0063 | 0.0084 | 0.0046 | 0.0222 | 0.0039 | 0.0152 | 0.0949 |
9 | 0.0027 | 0.0039 | 0.0035 | 0.0072 | 0.0028 | 0.0058 | 0.0024 |
10 | 0.0055 | 0.0044 | 0.0649 | 0.0105 | 0.0195 | 0.0145 | 0.0180 |

ARE = [MSE(α̂₀ˢ) + MSE(α̂₁ˢ) + MSE(α̂₂ˢ) + MSE(β̂₁ˢ) + MSE(β̂₂ˢ)] / [MSE(α̃₀) + MSE(α̃₁) + MSE(α̃₂) + MSE(β̃₁) + MSE(β̃₂)]

Legend: Tabulated values are estimates of the overall ARE (SMQD vs MM) based on simulated samples, with α₀ = 2, β₁ = β₂ = b, α₁ = α₂ = a.

Legend (contaminated samples): Overall asymptotic relative efficiency comparisons between SMQD estimators and MM estimators using contaminated samples of size n = 500 with 5% contamination. Samples are drawn from a contaminated model of the form p·bivariate-Gamma(α₀ = 2, β₁ = β₂ = b, α₁ = α₂ = a) + q·bivariate-Gamma(α₀ = 2, β₁ = β₂ = 50, α₁ = α₂ = 2), p = 0.95, q = 0.05.

Minimum quadratic distance methods offer a unified and numerically efficient approach to estimation and model testing using grouped data, and they unify minimum chi-square methods with fixed or random cells through the notion of projected score functions on finite bases and adaptive bases. The simulated version is especially suitable for handling bivariate distributions where numerical complications might arise. The rule for selecting points on the nonnegative quadrant for minimum quadratic distance methods is also clearly defined, whereas it is not clear how to form random cells for minimum chi-square methods. The methods appear to be relatively simple to implement, and yet they are more efficient and more robust than methods of moments in general, especially when the model has more than three parameters.

The helpful and constructive comments of a referee, which led to an improvement of the presentation of the paper, and the support of the editorial staff of Open Journal of Statistics in processing the paper are all gratefully acknowledged.

Luong, A. (2018) Simulated Minimum Quadratic Distance Methods Using Grouped Data for Some Bivariate Continuous Models. Open Journal of Statistics, 8, 362-389. https://doi.org/10.4236/ojs.2018.82024