Pseudodistance Methods Using Simultaneously Sample Observations and Nearest Neighbour Distance Observations for Continuous Multivariate Models

A multivariate random sample of n observations also generates n univariate nearest neighbour distance (NND) observations, from which a set of n auxiliary observations can be obtained. Combining these auxiliary observations with the original multivariate observations of the random sample allows a class of pseudodistances $D_h$ to be used, and inference methods can be developed from this class. The $D_h$ estimators obtained from this class can achieve high efficiencies and have robustness properties. Model testing can also be handled in a unified way by means of goodness-of-fit test statistics derived from this class, which have an asymptotic normal distribution. These properties make the developed inference methods relatively simple to implement, and they appear suitable for analyzing the multivariate data often encountered in applications.


Introduction
For statistical inference methods for continuous multivariate models, we often assume that we have a random sample of size $n$ of multivariate observations $x_1, \dots, x_n$. For the parametric set-up, we wish to test the validity of the model $\{f(x;\theta)\}$ by means of various goodness-of-fit statistics. This leads to a composite null hypothesis, and ideally we would like to use goodness-of-fit test statistics which follow a unique asymptotic distribution for all $\theta \in \Omega$, with $\Omega$ assumed to be compact. Multidimensional model testing often poses difficulties. The goodness-of-fit test statistics used often have very complicated distributions, as is the case for statistics which make use of multivariate empirical characteristic functions, see Csörgö [1]. For the classical chi-square tests, the asymptotic distributions for simple and composite hypotheses are simple, but observations must be grouped into cells and there is some arbitrariness in choosing the cells, see Moore [2] and Klugman et al. [3] (pages 208-209) on extending chi-square tests to continuous multidimensional models. Goodness-of-fit test statistics using the multivariate sample distribution function often have a very complicated null distribution, see Babu and Rao [4], and extensive simulations are needed to obtain the p-values of the tests. For applications in various fields, there appears to be a need for goodness-of-fit test statistics which are relatively simple to implement and which lead to consistent tests.
Multivariate modelling is used in many fields, including actuarial science and finance. For financial applications, Moore [2] used chi-square tests to test whether the joint weekly returns of two assets follow a bivariate normal distribution but, as mentioned earlier, chi-square tests require partitioning the sample space into cells, and the tests are not consistent even though the asymptotic null distributions of the statistics are simple.
In this paper, we shall introduce a class of pseudodistances $D_h(g, f)$, constructed from a class of convex functions $h(x)$, which measures the discrepancy between the two density functions $g$ and $f$; see details in Section 2.3.
Goodness-of-fit test statistics for model testing based on $D_h$ preserve the property of having a simple asymptotic null distribution, comparable to chi-square tests, but unlike chi-square tests, the tests based on $D_h$ are consistent for model testing.
It is also interesting to note that within this class $D_h$, the statistic based on one particular member of the class can also accommodate parameters estimated by the maximum likelihood (ML) method for the composite hypothesis. On the estimation side, estimators based on $D_h$ have the potential for good efficiency and robustness properties. Furthermore, estimation and model testing can be handled in a unified way.
The proposed inference methods extend previous methods for univariate models to multivariate continuous models. This paper can be considered a follow-up to previous papers by Luong [5] and Luong [6]. The nearest neighbour distance (NND) notion is used in this paper to replace the similar notion of spacings used in the univariate case. $D_h(g, f)$ is a discrepancy measure between density $g$ and density $f$.
When a particular function $h$, in which $\alpha$ is a known constant with $0 < \alpha < 1$ chosen near 0 in practice, is used to construct the pseudodistance $D_h$, estimation using this pseudodistance gives the maximum likelihood estimators. As a pseudodistance, this $D_h$ coincides, up to a few terms which do not depend on $\theta$, with the Kullback-Leibler (KL) distance used to generate ML estimators. These few terms, which do not involve $\theta$, do not affect estimation, but they are very significant for the construction of goodness-of-fit test statistics. Test statistics constructed using $D_h$ have an asymptotic normal distribution for model testing, whereas goodness-of-fit test statistics using the KL pseudodistance do not have a simple asymptotic distribution, especially in the composite null hypothesis case where parameters must be estimated using the ML estimators. More discussion is given in Section 2.2.
The paper is organized as follows. In Section 2, we introduce the auxiliary observations obtained from the NND observations. The class of pseudodistances $D_h$ is also introduced in this section. Asymptotic properties of estimators based on $D_h$ are considered in Section 3. Estimators obtained using one particular choice of $h$ are identical to the ML estimators, which are fully efficient. If another $h(x)$ is used for $D_h$, the corresponding estimators have the potential for good efficiency and some robustness properties. These properties allow flexibility in balancing efficiency and robustness. In Section 4, goodness-of-fit statistics based on the class $D_h$ are shown to have an asymptotic normal distribution, and in Section 5 an example is provided to illustrate the proposed techniques.

Nearest Neighbour Distance (NND) Observations
For each vector of observations $x_i$, $i = 1, \dots, n$, let $r_i$ denote its nearest neighbour distance, that is, the distance from $x_i$ to the closest of the other observations in the sample. In the literature, these $r_i$'s have been used to construct goodness-of-fit statistics, see Bickel and Breiman [8] and Zhou and Jammalamadaka [9], but these statistics for multivariate models often do not have a simple asymptotic distribution, which might create difficulties for applications. Now, we can define $y_i$ as given by Proposition 2 of Ranneby et al. [7], where $\pi$ is the usual constant pi used in formulas for volumes or areas and $\Gamma(\cdot)$ is the commonly used gamma function.
Note that we have $y_1, \dots, y_n$, which are $n$ univariate auxiliary observations obtained from the NND observations. Therefore, from the original observations of the sample $x_1, \dots, x_n$ and the $n$ auxiliary observations, we can form the $n$ observations $(y_i, x_i)$, $i = 1, \dots, n$. As $n \to \infty$, these $n$ observations are asymptotically independent and have a common density function given by the density of $(Y, X)$; see the end of Section 2 of Kuljus and Ranneby [10] (p. 1094). In fact, the situation is similar to the univariate case where spacings were used, see Luong [5] (pages 619-620).
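As a minimal sketch of how the auxiliary observations might be computed, the Python snippet below finds the nearest neighbour distance $r_i$ of each observation (Euclidean distance is assumed) and rescales it by the volume of a $d$-dimensional ball of radius $r_i$. The scaling $y_i = n \pi^{d/2} r_i^d / \Gamma(d/2 + 1)$ is an assumption made for illustration, suggested by the appearance of $\pi$ and the gamma function in the definition; the exact normalization should be taken from Proposition 2 of Ranneby et al. [7]. The function names are hypothetical.

```python
import math
import numpy as np

def nearest_neighbour_distances(x):
    """Euclidean distance from each row of x to its closest other row."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]        # pairwise differences, shape (n, n, d)
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise distances, shape (n, n)
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbour
    return dist.min(axis=1)

def auxiliary_observations(x):
    """Assumed scaling: y_i = n * (volume of the d-ball with radius r_i)."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    r = nearest_neighbour_distances(x)
    unit_ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return n * unit_ball_volume * r ** d

# Illustrative data only: 200 simulated bivariate normal observations.
rng = np.random.default_rng(0)
sample = rng.standard_normal((200, 2))
y = auxiliary_observations(sample)
pairs = np.column_stack([y, sample])            # rows are (y_i, x_i)
```

The full pairwise distance matrix costs O(n^2) memory, which is acceptable for a sketch; a tree-based nearest neighbour search would be the natural choice for large samples.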
Now we can consider the random criterion function for the class of functions $h$ defined by expressions (1) and (2). We shall see subsequently that inference methods based on the objective function (4) are pseudodistance methods based on a class of pseudodistances $D_h(g, f)$, where $g$ and $f$ are density functions.
To construct $D_h$ for robust estimation, $\alpha$ should be set near 0 but within the range $0 < \alpha < 1$; this does not rely on an explicit multivariate density estimate, which is needed for the minimum Hellinger method as proposed by Tamura and Boos [11]. Therefore, the class of pseudodistance methods being considered appears to be very useful for applications, and the methods are relatively simple to implement, so practitioners might want to use them in applied work.

Kullback-Leibler (KL) Pseudo-Distance
The negative of the log-likelihood function can be expressed as
The KL pseudo-distance is defined as
However, for testing the validity of the model with the null composite hypothesis given by $f(x;\theta)$, $\theta \in \Omega$, and since $g(x)$ appears in the LHS of expression (5), it must be estimated, and replacing $g(x)$ by, say, a multivariate density estimate $\hat{g}(x)$ will make the distribution of the LHS of expression (5)

The Class of Pseudo-Distances $D_h$
We shall focus on pseudodistance methods based on $D_h(g, f_\theta)$ for parametric models, with emphasis on continuous multivariate models, but some of the previous univariate results, which are scattered, can also be unified by viewing them as pseudodistance methods.
In general, for pseudodistances we require the property given by expression (6), where $g$ and $f$ are density functions. The property given by (6)
For this sample, the observations are asymptotically independent by Proposition 3 of Ranneby et al. [7] (p. 413) and, as $n \to \infty$, the distribution of $(y_i, x_i)$ tends to a common distribution, namely the distribution of the random vector $(Y, X)$ with joint density function given by
Therefore, the results are very similar to the univariate case, with $g(x)$ interpreted here as a multivariate density instead of a univariate density; the results given by Luong [5] (page 624) continue to hold and we also have:
2) $Z$ and $X$ are independent.
If we use Jensen's inequality it follows that

Consistency
It is not difficult to see that the $D_h$ estimators given by the vector $\hat{\theta}$ which minimizes expression (4)

Asymptotic Normality
Using the CLT and the results given in Section 2 of Luong [5] (pages 626-631), we can conclude that
where $I(\theta)$ is the commonly used information matrix, with the first and second derivatives of $h$ denoted respectively by $h'$ and $h''$. The random variable $Z$ follows a standard exponential distribution, as given by expression (25) in Luong [5] (page 631), and from the standard exponential distribution we also have

Goodness-of-Fit Test Statistics Using $D_{h_0}$
For model selection and model testing we are primarily interested in testing the null composite hypothesis
However, it might be easier to follow the procedure for constructing test statistics by first considering the test based on $D_{h_0}$, which is also implicitly based on $D_h$, for the simple hypothesis, where there is no unknown parameter.

Simple Null Hypothesis
For an $\alpha$-level test, we can reject
Equivalently, we can reject
Note that
Note that the test is consistent as
we will reject $H_0$ with probability 1 should $g \neq f$, but this property is not shared by chi-square tests. There is also the difficulty of arbitrariness in grouping observations into cells for chi-square tests, see Bickel and Breiman [8] for more discussion.
The corresponding test statistic given by expression (7) can be expressed ex-
Now, we will proceed to establish the property given by expression (10). Using the Mean Value Theorem and the following expansion around $\theta$, we have
The use of the ML estimators $\hat{\theta}_{ML}$ for chi-square distance type statistics often creates complications when it comes to deriving the asymptotic distributions of these statistics, see Chernoff and Lehmann [13] (p. 580) and Luong and Thompson [14] (pp. 249-251).
For applications, it has been recognized that the maximum value attained by the log of the likelihood function can provide information on goodness-of-fit for the model being used; the test as given by expression (8)

Illustration
For illustration of the proposed methods, we use the multivariate normal model of dimension $d$; its density function is often parameterized using the mean $\mu$ and the covariance matrix $\Sigma$ and it is given by

$$f(x; \mu, \Sigma) = (2\pi)^{-d/2} \, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\},$$

where $|\Sigma|$ is the determinant of the matrix $\Sigma$, see Anderson [16] (page 20). There is redundancy when using all the elements of the matrix $\Sigma$ as parameters since $\Sigma$, being a covariance matrix, is symmetric.
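As a minimal sketch of how the redundancy in a symmetric covariance matrix can be removed, the Python snippet below keeps only the lower-triangular elements of $\Sigma$, stacked column by column (the half-vectorization, usually written vech), and appends them to $\mu$ to form a single parameter vector. The helper names `vech`, `unvech`, `pack_theta` and `unpack_theta` are hypothetical, and the column-wise ordering is one common convention assumed here for illustration.

```python
import numpy as np

def vech(sigma):
    """Stack the lower-triangular elements of a square matrix column by column."""
    sigma = np.asarray(sigma, dtype=float)
    rows, cols = np.triu_indices(sigma.shape[0])
    # Indexing the transpose with upper-triangle indices walks the
    # lower triangle of sigma one column at a time.
    return sigma.T[rows, cols]

def unvech(v, d):
    """Rebuild the symmetric d x d matrix from its half-vectorization."""
    sigma = np.zeros((d, d))
    rows, cols = np.triu_indices(d)
    sigma[cols, rows] = v   # fill the lower triangle
    sigma[rows, cols] = v   # mirror into the upper triangle
    return sigma

def pack_theta(mu, sigma):
    """Parameter vector: mean followed by vech of the covariance matrix."""
    return np.concatenate([np.asarray(mu, dtype=float), vech(sigma)])

def unpack_theta(theta, d):
    """Recover (mu, sigma) from the packed parameter vector."""
    theta = np.asarray(theta, dtype=float)
    return theta[:d], unvech(theta[d:], d)

# Illustrative bivariate example (values chosen arbitrarily).
mu = np.array([1.0, 2.0])
sigma = np.array([[4.0, 1.0],
                  [1.0, 9.0]])
theta = pack_theta(mu, sigma)   # length d + d(d+1)/2 = 5
```

With this parameterization the parameter vector has $d + d(d+1)/2$ free components instead of $d + d^2$, so a numerical optimizer never has to enforce the symmetry of $\Sigma$ separately.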
We can eliminate the redundancy by defining the vector of parameters as $\theta$ with $\theta = (\mu', \mathrm{Vech}(\Sigma)')'$. The Vech operator, when applied to $\Sigma$, extracts the lower triangular elements of $\Sigma$ and stacks them in a vector. Equivalently, we can use the vector of parameters $\theta$ instead of $\mu$ and $\Sigma$ and express the multivariate normal density as $f(x; \theta)$ to avoid the redundancy of the previous parameterization. We assume that we have a random sample of size $n$ which allows us to obtain the auxiliary observations
In this paper, we focus on presenting the $D_h$ methodology, leaving for subsequent work simulation studies assessing the power of the tests, the use of distributions other than the normal for the null distribution of the goodness-of-fit test statistics, and the assessment of efficiency in small or finite samples. Practitioners might be encouraged to use these $D_h$ me-