Simulated Minimum Quadratic Distance Methods Using Grouped Data for Some Bivariate Continuous Models

Quadratic distance methods based on a special distance which makes use of survival functions are developed for inference on bivariate continuous models using selected points on the nonnegative quadrant. A related simulated version is also developed; it appears to be suitable for bivariate distributions which have no closed form expressions and are numerically intractable, but from which it is easy to simulate. The notion of an adaptive basis is introduced, and the estimators can be viewed as quasilikelihood estimators using the score functions projected on an adaptive basis. They are closely related to minimum chi-square estimators with random cells, which can also be viewed as quasilikelihood estimators using score functions projected on a special adaptive basis whose elements are linearly dependent. A rule for selecting points on the nonnegative quadrant, which makes use of quasi-Monte Carlo (QMC) numbers and two sample quantiles of the two marginal distributions, is proposed when complete data are available. Like minimum chi-square methods, the quadratic distance methods also offer chi-square statistics which appear to be useful in practice for model testing.


Introduction
In actuarial science or biostatistics, we often encounter bivariate data which are already grouped into cells forming a contingency table, and we would like to make inferences for a continuous bivariate model used to model the complete data; see Partrat [1] (p. 225) and Gibbons and Chakraborti [2] (pp. 511-512) for examples.
If the bivariate distributions have closed form expressions, then there is in general no difficulty in fitting them using, for example, maximum likelihood or minimum chi-square methods based on grouped data. Many useful distributions, however, are only computable numerically as they are expressible only through an integral representation, and when numerical quadrature methods fail it appears natural to develop simulated methods of inference for these distributions. We would like to have methods which offer a unified approach to estimation and model testing, and which can also handle the situation where the lack of closed form expressions for the model survival functions creates numerical difficulties. We shall see subsequently that new distributions created using the bivariate survival power mixture (BSPM) operator introduced by Marshall and Olkin [3] (pp. 834-836), or by trivariate reduction techniques, often have no closed form expressions for their survival functions, yet it is easy to draw samples from them. Since we focus on nonnegative bivariate distributions in actuarial science, it is natural to use survival functions instead of distribution functions alone. The BSPM operator will be introduced, and a few examples will illustrate the numerical difficulties we might encounter when fitting these distributions.

Bivariate Survival Power Mixture Operator
Marshall and Olkin [3] in their seminal paper introduced the following operator: given two univariate survival functions S_1(x) and S_2(y) and a mixing distribution G(θ) on the positive axis, a new bivariate survival function is defined by S(x, y) = ∫ [S_1(x) S_2(y)]^θ dG(θ), the integral being over θ > 0. Since there is an integral representation, the new survival function might still be computable numerically, depending on the expressions for S_1(x), S_2(y) and G(θ). An algorithm to simulate a sample from S(x, y) is straightforward: draw θ from G, then draw X and Y independently from the conditional survival functions S_1(x)^θ and S_2(y)^θ. For most of the univariate distributions used in the paper, see Appendix A given by Klugman et al. [4].
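The simulation algorithm above can be sketched in code. The choices below (exponential marginal survival functions and a Gamma mixing distribution) are illustrative assumptions, not taken from the paper; they are convenient because the conditionals remain exponential and a closed form for S(x, y) happens to exist, which lets us check the simulation.

```python
import numpy as np

# Hypothetical illustration of the BSPM operator: draw theta from G, then
# draw X and Y independently with conditional survival functions
# S1(x)^theta and S2(y)^theta.  With exponential S1, S2 the conditionals
# stay exponential, so simulation is immediate even when only the
# integral representation of S(x, y) is available.

def simulate_bspm(n, a, b, lam1, lam2, rng):
    """S(x,y) = integral of [S1(x) S2(y)]^theta dG(theta),
    with G = Gamma(shape=a, rate=b) and S_i(x) = exp(-lam_i * x)."""
    theta = rng.gamma(shape=a, scale=1.0 / b, size=n)
    return (rng.exponential(1.0 / (lam1 * theta)),
            rng.exponential(1.0 / (lam2 * theta)))

rng = np.random.default_rng(0)
x, y = simulate_bspm(200000, a=3.0, b=2.0, lam1=1.0, lam2=1.0, rng=rng)
# closed form via the Gamma Laplace transform: (1 + t/b)^(-a) with t = x + y
emp = np.mean((x > 0.5) & (y > 0.5))
exact = (1.0 + 1.0 / 2.0) ** (-3.0)
```

The empirical survival probability `emp` at the point (0.5, 0.5) should agree with the closed form `exact` up to Monte Carlo error.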

Some Examples of New Bivariate Distributions Created
Observe that we may specify a Gamma distribution for θ instead of a Lomax distribution as discussed earlier, where the density function of θ is given by g(θ) = λ^α θ^(α-1) e^(-λθ) / Γ(α), θ > 0. The integral representation then reduces to the Laplace transform of the Gamma distribution evaluated at -ln(S_1(x)S_2(y)), which yields the closed form S(x, y) = [λ / (λ - ln(S_1(x)S_2(y)))]^α.
By the usual conditioning argument, conditioning on θ, we can often obtain the first two moments of the vector Z = (X, Y)′, and even higher positive integer moments, without having a closed form for S_β(x, y); see the conditioning argument for the univariate case given by Klugman et al. [4] (pp. 62-65). If complete data are available, the parameters of some bivariate distributions created using the BSPM operator can be estimated using the method of moments (MM).
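The conditioning argument can be made concrete in code, under illustrative assumptions that are our own (conditionally exponential components with a Gamma frailty, not a model specified at this point in the paper): conditionally on θ, X is exponential with rate λ_1 θ, so E[X | θ] = 1/(λ_1 θ) and E[X] = E[1/θ]/λ_1, which is available even though S_β(x, y) has no closed form in general.

```python
import numpy as np

# Moments by conditioning, under hypothetical assumptions: conditionally
# on theta, X ~ Exponential(rate = lam1 * theta), so E[X | theta] =
# 1/(lam1 * theta) and E[X] = E[1/theta] / lam1.  For a Gamma(shape=a,
# rate=b) frailty with a > 1, E[1/theta] = b / (a - 1).

def ex_by_conditioning(a, b, lam1):
    return b / ((a - 1.0) * lam1)

# Monte Carlo check of the conditioning formula
rng = np.random.default_rng(1)
theta = rng.gamma(shape=3.0, scale=0.5, size=400000)  # Gamma(a=3, rate b=2)
x = rng.exponential(1.0 / theta)                      # conditional draws
mc_mean = x.mean()
```

With a = 3, b = 2 and λ_1 = 1 the formula gives E[X] = 1, which the simulated mean should reproduce.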
In this paper we emphasize grouped data. In general, MM estimators cannot be obtained with grouped data; furthermore, with four or five parameters in the bivariate model, high order moments must be used, and as a result the MM estimators are not robust.

Whether S_β(x, y) can be evaluated numerically or not depends on the integrand and the numerical quadrature method used. In the same vein, we can mention the class of bivariate distributions for contingency tables studied by Mardia [7], Mardia [8] and Plackett [9], as these distributions have numerically tractable bivariate distribution functions but no closed form expressions for the bivariate densities. In this paper, we emphasize statistical aspects and do not go into details on the question of dependence between the two components of the new bivariate survival function. Marshall and Olkin [3], Marshall and Olkin [10] have discussed some of the issues of dependence and infinite divisibility for distributions created using mixture procedures. Data in the form of contingency tables are often encountered in actuarial science and biostatistics, where bivariate observations are grouped into a two-dimensional array or matrix, and only the numbers of observations, or proportions of the original sample, which belong to the elements of such a matrix are recorded. The original data set is lost. Obviously, when the original complete data set is available, we can always group it into a contingency table, but once grouped it is impossible to convert grouped data back to complete data. We focus on the situation where the complete data are observations Z_1, ..., Z_n which are independent and identically distributed following a bivariate absolutely continuous distribution with domain given by the nonnegative quadrant; they are subsequently grouped, and we develop statistical inference techniques using the grouped data.
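When the mixing density g(θ) is known, evaluating S_β(x, y) by numerical quadrature of its integral representation can be sketched as follows. The Gamma mixing density with exponential marginals is an illustrative assumption chosen because a closed form happens to exist here, so the quadrature can be checked; in general only the integral is available.

```python
import math
import numpy as np
from scipy.integrate import quad

# Evaluate S_beta(x, y) = integral of [S1(x) S2(y)]^theta g(theta) dtheta
# by numerical quadrature, for a hypothetical Gamma(shape=a, rate=b)
# mixing density and exponential marginals S_i(x) = exp(-lam_i * x).

def survival_by_quadrature(x, y, a, b, lam1, lam2):
    t = lam1 * x + lam2 * y
    def integrand(theta):
        g = b ** a * theta ** (a - 1.0) * math.exp(-b * theta) / math.gamma(a)
        return math.exp(-theta * t) * g
    val, _err = quad(integrand, 0.0, np.inf)
    return val

num = survival_by_quadrature(0.5, 0.5, a=3.0, b=2.0, lam1=1.0, lam2=1.0)
exact = (1.0 + 1.0 / 2.0) ** (-3.0)   # Gamma Laplace transform closed form
```

Whether such a quadrature succeeds in general depends, as noted above, on the integrand; here it agrees with the closed form to high accuracy.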

New Bivariate Distributions Created by Trivariate Reductions
The bivariate density function of distributions created by trivariate reduction in general has no closed form expression, but simulation from these distributions is straightforward.

Contingency Tables
Contingency table data can be viewed as a special form of two-dimensional grouped data; see Gibbons and Chakraborti [2] (pp. 511-512). We shall give a brief description below.
Let the nonnegative X axis be partitioned into disjoint intervals, and similarly for the Y axis, so that the bivariate observations fall into the cells of a two-dimensional array, each cell having a corresponding model probability. Note that the cell probabilities sum to one, so some cells are redundant; they can be discarded without affecting the efficiency of the inference methods. This is precisely the approach quadratic distance methods use: by discarding redundant elements, we create a basis with only linearly independent elements which spans the same linear space. Consequently, we gain in numerical efficiency while retaining the same statistical efficiency.
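The grouping of a complete bivariate sample into a contingency table can be sketched as follows; the sample, the cell boundaries, and the marginal distributions are all illustrative choices.

```python
import numpy as np

# Group a complete bivariate sample into a contingency table: partition
# each nonnegative axis into disjoint intervals (the last one unbounded)
# and keep only the cell counts.  Because cell probabilities sum to one,
# one cell is redundant and may be discarded without losing information.

def to_contingency_table(x, y, x_edges, y_edges):
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    return counts.astype(int)

rng = np.random.default_rng(2)
x = rng.exponential(1.0, size=1000)
y = rng.exponential(1.0, size=1000)
edges = np.array([0.0, 0.5, 1.0, 2.0, np.inf])
table = to_contingency_table(x, y, edges, edges)
```

Once `table` is formed, the original pairs cannot be recovered, which is the situation emphasized in the text.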

Efficient Modified Minimum Chi-Square Methods
Using the contingency table data, the modified minimum chi-square estimators, which are as efficient as the likelihood estimators based on the grouped data, are obtained by minimizing a chi-square type objective function. Note that the quasi-score functions generated belong to the linear space spanned by the cell indicator functions, but since we have the property given by expressions (4)-(5), the same linear space is spanned by a smaller set of linearly independent elements. Therefore, an equivalent but possibly numerically more efficient method, keeping in mind that the model survival function needs to be evaluated numerically or by simulation, is to minimize a quadratic form using the elements of the basis given by expression (7); see Luong [14] (pp. 463-468) for quadratic distance methods.
Also, using expression (7) is equivalent to using overlapping cells. The objective function of the proposed quadratic form will be given below; it is a natural extension of the objective function used in the univariate case. Define a vector with components being elements of the basis, so that we only need one subscript, by collapsing the matrix given by expression (7) into a vector: the first row of the matrix becomes the first batch of elements of the vector, the second row becomes the second batch, and so forth. Let z_n denote this vector and let z(β) be its model counterpart. The number of components of z_n is M, with the assumption M > m. Efficient quadratic distance methods can be constructed using the inverse of the covariance matrix of z_n as the weight matrix for the quadratic form, but such an optimum matrix is only defined up to a constant. We introduce the notion of an adaptive basis, used to achieve high efficiency, which will also unify quadratic distance and minimum chi-square methods. An adaptive basis is data dependent and therefore carries information about the true vector of parameters β_0 even though β_0 is unknown; consequently, the score functions projected on such a basis will, in general, lead to inference with better efficiency than without using the information obtainable from the data concerning β_0.
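A minimal sketch of the deterministic quadratic distance just described: z_n stacks the bivariate empirical survival function at selected points and is matched against its model counterpart z(β). The independent-exponential model, the points, the grid-search minimizer, and the identity weight are all illustrative simplifications of our own; the optimum weight is the inverse of the covariance matrix of z_n.

```python
import numpy as np

# Version-D quadratic distance, sketched with an identity weight matrix:
# Q_n(beta) = (z_n - z(beta))' (z_n - z(beta)), where z_n holds the
# empirical survival values at selected points (s_l, t_l).

def empirical_survival(x, y, pts):
    return np.array([np.mean((x > s) & (y > t)) for s, t in pts])

def model_survival(beta, pts):
    l1, l2 = beta
    return np.array([np.exp(-l1 * s - l2 * t) for s, t in pts])

def q_n(beta, z_n, pts):
    d = z_n - model_survival(beta, pts)
    return d @ d                      # identity weight for simplicity

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=50000)  # true lambda1 = 1
y = rng.exponential(0.5, size=50000)  # true lambda2 = 2
pts = [(0.3, 0.2), (0.7, 0.4), (1.2, 0.6), (0.5, 1.0), (1.5, 1.2)]
z_n = empirical_survival(x, y, pts)

# crude grid-search minimization of Q_n(beta)
grid = np.linspace(0.5, 2.5, 81)
beta_hat = min(((a, b) for a in grid for b in grid),
               key=lambda beta: q_n(beta, z_n, pts))
```

Even with this unweighted sketch, the minimizer recovers the true parameter vector (1, 2) closely for a large sample.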
We shall discuss how to construct such an adaptive basis when complete data are available in Section 3, where we would like to shed some light on the question of how to form a finite basis so that inference methods using it will have high efficiencies over a restricted parameter space, which appears reasonable for the applications. An adaptive basis has only a finite number of elements, so that numerically it is not complicated to use it to construct MQD or SMQD methods. The elements of the basis will be adapted to β_0, and the efficiencies of the SMQD methods using such a basis come from the fact that the elements of the basis are chosen accordingly, adjustable depending on the value of the vector of true parameters. In general, it is not obvious how to construct an adaptive basis from a complete basis; for example, it is a difficult task to construct an adaptive basis from the complete basis of polynomials so that the MQD methods using it have good efficiencies. However, it is natural to construct an adaptive basis with only a finite number of elements extracted from a complete basis as given by expression (11). This will be further developed in Section 3 of the paper. The notion of an adaptive basis constructed from a complete basis and used to project the score functions appears to be relatively new; implicitly it has been used in minimum chi-square methods without the notion being made explicit, see Moore and Spruill [17] and Pollard [18] (pp. 317-318).
In the literature, attention seems to have been given mainly to complete bases. Adaptive bases will be further developed and discussed in Section 3; see Luong and Thompson [19] and Duchesne et al. [20] for related results. It is worth mentioning that, in practice, without an infinite and complete basis, and using only a finite basis which is a subset of an infinite basis, high efficiency for the procedures can only be attained on some restricted parameter space in general, unless the score functions belong to the span of the finite basis. One viable strategy is to identify a restricted parameter space for the type of applications being considered, and then to identify a suitable finite basis; often it suffices to use a restricted parameter space. Clearly, the elements needed can be estimated empirically with the bivariate empirical survival function using the grouped data provided by the contingency table. Therefore, we can define the matrix Ω_0, with inverse denoted by W_0, and similarly their empirical counterparts estimated from data. The two versions of the objective function give asymptotically equivalent estimators and goodness-of-fit test statistics; for finding estimators numerically we need to minimize expression (16), while for asymptotic properties it might be slightly simpler to work with the equivalent version. Note that the weight matrix W_0 is the same for both versions, and the norm used is a weighted Euclidean norm which obeys the triangle inequality. In Section 2, MQD and SMQD methods will be developed using predetermined grouped data. Asymptotic properties of the estimators are studied, and the asymptotic distributions of the model testing statistics are derived. The methods can be extended to the situation where complete data are available but will be grouped, by defining a rule to choose points on the nonnegative quadrant to group the data.
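The matrix Ω_0 can be estimated directly from data, as sketched below under illustrative choices of model and points. The key observation is that the product of the indicators 1{X > s_i, Y > t_i} and 1{X > s_j, Y > t_j} is the indicator of the componentwise maximum point, so the (i, j) element of Ω_0 (scaled by n) is S(s_i ∨ s_j, t_i ∨ t_j) − S(s_i, t_i) S(s_j, t_j), with S replaced by the empirical survival function.

```python
import numpy as np

# Estimate Omega_0, the (scaled) covariance matrix of the empirical
# survival values at the selected points, and its inverse W_0, used as the
# optimum weight matrix.  The data and points below are illustrative.

def omega_hat(x, y, pts):
    surv = lambda s, t: np.mean((x > s) & (y > t))
    m = len(pts)
    omega = np.empty((m, m))
    for i, (si, ti) in enumerate(pts):
        for j, (sj, tj) in enumerate(pts):
            omega[i, j] = (surv(max(si, sj), max(ti, tj))
                           - surv(si, ti) * surv(sj, tj))
    return omega

rng = np.random.default_rng(4)
x = rng.exponential(1.0, size=20000)
y = rng.exponential(1.0, size=20000)
pts = [(0.5, 0.5), (1.0, 1.0), (0.5, 1.5), (1.5, 0.5)]
omega = omega_hat(x, y, pts)
w = np.linalg.inv(omega)   # estimated optimum weight matrix
```

Since this construction is exactly the empirical covariance matrix of the indicator vectors, the estimate is symmetric and positive semidefinite by construction.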
An artificial sample constructed using two sample quantiles and QMC numbers is proposed in Section 3 to select the points, and the methods developed with preselected points or cells in Section 2 are shown to be still applicable. The methods can be seen as equivalent to minimum chi-square methods with random cells but with a rule to define these cells; using random cells is equivalent to using an adaptive basis.

Estimation
Consistency for both versions of the quadratic distance estimators using predetermined grouped data can be treated in a unified way using the following Theorem 1, which is essentially Theorem 3.1 of Pakes and Pollard [22] (p. 1038); the proof has been given by the authors. Therefore, for both versions of Q_n(β), whether deterministic or simulated, the minimum quadratic distance MQD and SMQD estimators are consistent by Theorem 1, i.e., the vector of MQD estimators and the vector of SMQD estimators converge in probability to β_0. Theorem 3.1 of Pakes and Pollard [22] (pp. 1038-1039) is an elegant theorem; its proof is concise, using the norm concept of functional analysis, and it allows many results to be unified. We now turn our attention to the question of asymptotic normality for the quadratic distance estimators. A unified approach is possible using their Theorem 3.3, see Pakes and Pollard [22] (pp. 1040-1043), which we restate as Theorem 2 and Corollary 1 below, after the following discussion of the ideas behind their theorem, which allows asymptotic normality results to be obtained for estimators that minimize a smooth or nonsmooth objective function.

The matrix Γ(β) can be displayed explicitly, and note that Q_n(β) is differentiable for both versions. We shall rearrange the results of Theorem 3.1 and Theorem 3.3 of Pakes and Pollard [22] before applying them to version D and version S of the MQD methods, and verify that the regularity conditions of these two theorems can be satisfied. We shall state Theorem 2 and Corollary 1, which are essentially their Theorem 3.3; the proofs have been given by Pakes and Pollard [22]. Note that our condition 4) is slightly more stringent but simpler than condition iii) of their theorem.
Also, for version S, the simulated samples are assumed to have size U = τn, and the same seed is used across different values of β to draw samples of size U. These are standard assumptions for simulated methods of inference; see Section 9.6 on the method of simulated moments (MSM) given by Davidson and MacKinnon [23] (pp. 383-394). For numerical optimization to find the minimum of the objective function, we rely on direct search simplex methods; the R package already has prewritten functions to implement direct search methods.
Theorem 2.
Let β be a vector of consistent estimators for β_0, the unique vector which minimizes the limiting objective function, under the following conditions:
1) The parameter space Ω is compact, and β_0 is an interior point of Ω.

2)
or equivalently, using equality in distribution. The proofs of these results follow from the arguments used to prove Theorem 3.3 given by Pakes and Pollard [22] (pp. 1040-1043). For expression (13) or expression (14) to hold, in general only condition 5) of Theorem 2 is needed, and there is no need to assume that G_n(β_0) has an asymptotic distribution. From the results of Theorem 2, it is easy to see that we can obtain the main result of the following Corollary 1, which gives the asymptotic covariance matrix of the quadratic distance estimators for both versions.
The matrices T and V depend on β_0; we also adopt corresponding notations for their estimates. We observe that condition 4) of Theorem 2, when applied to the SMQD methods, can be verified similarly.

Consequently, g_n(β) can also be expressed in terms of the elements of the basis, for version D as well as for version S. For version S, note that the simulated sample size is U = τn and that the simulated samples are independent of the original sample given by the data. Implicitly, we assume that the same seed is used across different values of β to obtain the simulated samples.
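The version S assumptions just stated (simulated size U = τn, common seed across values of β) can be sketched as follows. The one-parameter gamma-frailty model, the points, and the grid-search minimizer are illustrative assumptions of our own; the only requirement the method places on the model is the ability to simulate from it.

```python
import numpy as np

# Sketch of version S: the model survival values at the selected points
# are replaced by simulated counterparts based on U = tau * n draws,
# re-using the same seed for every candidate parameter value (common
# random numbers), so the objective is a deterministic function of beta.

def simulate_pairs(n, a, seed):
    rng = np.random.default_rng(seed)              # same seed for every a
    theta = rng.gamma(shape=a, scale=0.5, size=n)  # Gamma(a, rate 2) frailty
    return rng.exponential(1.0 / theta), rng.exponential(1.0 / theta)

def surv_at(x, y, pts):
    return np.array([np.mean((x > s) & (y > t)) for s, t in pts])

# "observed" data from the model with true a = 3
rng = np.random.default_rng(5)
theta0 = rng.gamma(shape=3.0, scale=0.5, size=40000)
x0, y0 = rng.exponential(1.0 / theta0), rng.exponential(1.0 / theta0)
pts = [(0.3, 0.3), (0.8, 0.5), (1.5, 1.0), (0.5, 1.5)]
z_n = surv_at(x0, y0, pts)

U = 120000                                          # simulated size, tau = 3
def q_s(a):
    xs, ys = simulate_pairs(U, a, seed=12345)
    d = z_n - surv_at(xs, ys, pts)
    return d @ d

a_hat = min(np.linspace(2.0, 4.0, 21), key=q_s)
```

No closed form for the model survival function is used anywhere; the simulated estimator nonetheless lands close to the true parameter value.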
Using the results of Corollary 1, we have asymptotic normality for the MQD estimators of version D, with asymptotic covariance matrix involving Γ as given by expression (23), which can be estimated easily.
For version S, the SMQD estimators also follow an asymptotic normal distribution; an estimate of Γ can be obtained using the technique given by Pakes and Pollard [22] (p. 1043).

Simple Hypothesis
In this section, the quadratic distance Q_n(β) will be used to construct goodness-of-fit test statistics for the simple hypothesis H_0: β = β_0. Version S is of interest since it allows testing goodness of fit for continuous distributions without closed form bivariate survival functions, as we only need to be able to simulate from these distributions. We shall justify the asymptotic chi-square distributions given by expression (34) and expression (35) below.

Note that the same argument applies to version S. We have the asymptotic chi-square distributions as given, using standard results for the distribution of quadratic forms and the fact that the estimated weight matrix converges in probability to W_0; see Luong and Thompson [19] (p. 247) for example.
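The chi-square limit under a simple hypothesis can be checked numerically, as sketched below. The independent bivariate exponential null is an illustrative assumption for which Ω_0 has a closed form; with the optimum weight W_0 = Ω_0^{-1}, the statistic n·Q_n(β_0) should behave like a chi-square variable with M degrees of freedom, so its average over replications should be about M.

```python
import numpy as np

# Monte Carlo check that n * Q_n(beta_0), with the optimum weight
# W_0 = inverse of Omega_0, averages to about M (= number of points)
# under the null, consistent with the chi-square(M) limit.

pts = [(0.5, 0.5), (1.0, 1.0), (0.5, 1.5), (1.5, 0.5)]
M = len(pts)
s0 = lambda s, t: np.exp(-s - t)          # null survival, unit exponentials
omega0 = np.array([[s0(max(si, sj), max(ti, tj)) - s0(si, ti) * s0(sj, tj)
                    for sj, tj in pts] for si, ti in pts])
w0 = np.linalg.inv(omega0)
z0 = np.array([s0(s, t) for s, t in pts])

rng = np.random.default_rng(6)
stats = []
for _ in range(200):
    x = rng.exponential(1.0, size=2000)
    y = rng.exponential(1.0, size=2000)
    z_n = np.array([np.mean((x > s) & (y > t)) for s, t in pts])
    d = z_n - z0
    stats.append(2000 * d @ w0 @ d)
stats = np.array(stats)
```

The statistics are nonnegative (W_0 is positive definite here) and their average sits near M = 4, as the asymptotic theory predicts.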

Composite Hypothesis
The quadratic distances can also be used for testing a composite hypothesis. The degrees of freedom of the asymptotic chi-square distribution are given by the trace of the matrix B; the rank of the matrix B is also equal to its trace. The argument used is very similar to the one used for Pearson's statistic; see Luong and Thompson [19] (pp. 248-249).
Similarly, the same argument applies for version S. This justifies the asymptotic chi-square distributions as given by expression (36) and expression (37).

A Rule for Selecting Points on the Nonnegative Quadrant
An artificial sample constructed using quasi-Monte Carlo (QMC) numbers on the unit square, multiplied by two chosen sample quantiles from the two marginal distributions, will be used. For minimum chi-square methods it appears to be difficult to have a rule for choosing cells to group the data; see the discussion by Greenwood and Nikulin [16] (pp. 194-208). We need a few tools to develop such a rule. We shall define sample quantiles, so that statistics can be viewed as functionals of the sample distribution; their influence functions are also needed, as they allow us to find their asymptotic variances.

Preliminaries: Statistical Functional and Its Influence Function
We shall define the p-th sample quantile of a distribution, as we shall need two sample quantiles from the marginal distributions, together with QMC numbers, to construct an approximation of an integral. Our quadratic distance based on selected points can be viewed as an approximation of a continuous version given by an integral. The influence function representation of a functional which depends on only one function, such as H_n, is the equivalent of a Taylor expansion of a univariate function, and the influence function representation of a functional which depends on several functions is the equivalent of a Taylor expansion of a multivariate function with domain in a Euclidean space and range on the real line. We will encounter an example of functionals which depend on three functions. For the related pseudo-codes for Halton sequences, see Glasserman [26] (pp. 293-297); also see the seminal paper by Halton [27]. For the general principles of QMC methods, see Glasserman [26] (pp. 281-292). The Halton sequences, together with two chosen sample quantiles from the two marginal distributions, will allow us to choose points to match the bivariate empirical survival function with its model counterpart: with the use of two empirical quantiles from the marginal distributions, we shall have an artificial sample with values on the nonnegative quadrant. These points can be viewed as sample points from an artificial sample, and since they depend on sample quantiles, which are robust statistics, the artificial sample can be viewed as free of outliers and the methods which make use of it will be robust.
Note that the Halton sequences are deterministic, but if we are used to integration by simulation we might want to think of the M terms as representing a quasi-random sample of size M from a bivariate uniform distribution, which can be useful for integrating a function on the unit square. This yields good performances while being simple to obtain.
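The construction of the selection points can be sketched in code: Halton points on the unit square (the standard radical-inverse construction in bases 2 and 3) are scaled by the 0.99 sample quantiles of the two marginal samples. The marginal samples below are illustrative; the paper fixes only the use of Halton numbers and two sample quantiles.

```python
import numpy as np

# Selection points (s_l, t_l), l = 1, ..., M: Halton points on the unit
# square scaled by the 0.99 sample quantiles of the two marginal samples.

def radical_inverse(i, base):
    """Standard radical-inverse (van der Corput) digit reversal in `base`."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def selection_points(x, y, m, p=0.99):
    qx, qy = np.quantile(x, p), np.quantile(y, p)
    return [(qx * radical_inverse(l, 2), qy * radical_inverse(l, 3))
            for l in range(1, m + 1)]

rng = np.random.default_rng(7)
x = rng.exponential(1.0, size=1000)
y = rng.exponential(1.0, size=1000)
pts = selection_points(x, y, m=25)
```

Because the Halton values lie strictly inside the unit interval, every point falls inside the rectangle determined by the two sample quantiles, and the points are deterministic given the data.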
The set of points (s_l, t_l), l = 1, ..., M, is the set of points proposed to form optimum quadratic distances when complete data are available. We shall see that the set of points depends on two quantiles chosen from the two marginal distributions, and since these are random, we might want to think of it as working with random overlapping cells.
As for minimum chi-square methods, if random cells stabilize into fixed cells, the minimum chi-square methods in general have the same efficiency as if based on the stabilized fixed cells; see Pollard [18] (pp. 324-326) and Moore and Spruill [17] for the notion of random cells. Quadratic distance methods share the same property: the chosen points are random, but it will be shown that they do stabilize, and therefore these random points can be viewed as fixed points, since they do not affect the efficiencies of the estimators or the asymptotic distributions of the goodness-of-fit test statistics which make use of them. These properties will be discussed and studied in more detail in the next section, along with the introduction of an artificial sample of size M given by these points.

An Illustration and a Simulation Study
For illustration, we consider the bivariate gamma model discussed in Section 1.2 and introduced by Mathai and Moschopoulos [11] (pp. 137-139), a model with 5 parameters given by the vector β = (α_0, α_1, α_2, β_1, β_2)′. Mathai and Moschopoulos [11] (pp. 138-140) also give the model moments using the bivariate Laplace transform of the model; replacing them with the corresponding empirical moments and solving the resulting system of equations gives the method of moments (MM) estimators.
As the MM estimators depend on statistics involving polynomials of degree 3, they are not robust and are very sensitive to outliers.
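The sensitivity of third-degree moment statistics to a single outlier is easy to demonstrate numerically; the gamma sample and the outlier value below are illustrative choices.

```python
import numpy as np

# Why MM estimators built from third-degree moment statistics are not
# robust: a single large outlier shifts the sample third moment
# drastically, because the statistic is dominated by the tail.

rng = np.random.default_rng(8)
x = rng.gamma(shape=2.0, scale=1.0, size=500)
m3_clean = np.mean(x ** 3)                   # E[X^3] = 24 for Gamma(2, 1)
m3_dirty = np.mean(np.append(x, 50.0) ** 3)  # one outlier added
```

One contaminating observation out of 501 inflates the sample third moment by an order of magnitude, which then propagates into the MM estimates.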
As the bivariate density function of the model is complicated, we consider version S of the quadratic distance with M = 25, and we compare the performance of the MM estimators versus the SMQD estimators using an overall asymptotic relative efficiency (ARE) criterion. The mean square errors (MSE) and the ARE are estimated using simulated samples.
For references on simulations, we find Ross [30] and Johnson [31] useful. The mean square error of an estimator π for π_0 is defined as MSE(π) = E[(π − π_0)²]. We fix M = 25, the two sample quantiles are the 0.99 quantiles, and we have no problem obtaining the inverse matrix used as the optimum matrix estimated from the data.
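Estimating the MSE by Monte Carlo, as done for the replications reported below, is straightforward; the sketch checks the estimator on a case with known MSE (the mean of n = 100 standard normals has MSE 1/100). The normal example is our own illustrative choice.

```python
import numpy as np

# Monte Carlo estimate of MSE(pi_hat) = E[(pi_hat - pi_0)^2]: replicate
# the experiment N times and average the squared errors.

def mse(estimates, truth):
    e = np.asarray(estimates)
    return np.mean((e - truth) ** 2)

rng = np.random.default_rng(9)
reps = [rng.normal(0.0, 1.0, size=100).mean() for _ in range(2000)]
mse_hat = mse(reps, 0.0)   # true MSE is 1/100 = 0.01
```

The same averaging over replications is what produces the MSE and ARE columns of the simulation study.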
If we fix M = 30, occasionally the matrix Ω_0 is nearly singular and the R package gives us a warning. It takes around one to two minutes to complete one run on a laptop computer; as we do not have large computers, we fix N = 50 replications, each with a sample of size n = 500. We observe that in general each individual ARE is smaller for the SMQD estimators than for the MM estimators, and that overall the SMQD estimators are more efficient and more robust than the MM estimators, confirming the asymptotic theory; also note that the MM estimators require complete data. The unweighted simulated quadratic distance estimators, using the identity matrix I as weight, perform almost as well as the optimally weighted estimators; see Table 1.
For the robustness study, we contaminate the distributions used above with 5% of observations coming from a bivariate gamma with parameters outside the range being considered, used to generate outliers. We observe that the SMQD estimators are at least 1000 times more efficient than the MM estimators; we display one row to illustrate the order of efficiency gained by using SMQD methods, given at the bottom of Table 1. The limited study seems to point to better efficiency and robustness for the SMQD estimators than for the MM estimators; the MM estimators are very vulnerable to outliers, as expected. Although the study is limited, it tends to confirm the theoretical asymptotic results; more numerical and simulation work needs to be done to further study the performance of the proposed estimators, especially in finite samples.

Conclusion
Minimum quadratic distance methods offer a unified and numerically efficient approach to estimation and model testing using grouped data, and they unify minimum chi-square methods with fixed or random cells through the notion of score functions projected on finite bases and adaptive bases. The simulated version is especially suitable for handling bivariate distributions where numerical complications might arise. The rule on how to select points on the nonnegative quadrant for the minimum quadratic distance methods is also clearly defined, whereas it is not clear how to form random cells for minimum chi-square methods. The methods appear to be relatively simple to implement, yet more efficient and more robust than methods of moments in general, especially when the model has more than three parameters.