
A multivariate random sample of n observations also generates n univariate nearest neighbour distance (NND) observations, and from these NND observations a set of n auxiliary observations can be obtained. Combining these auxiliary observations with the original multivariate observations of the random sample allows a class of pseudodistances D_h to be used, and inference methods can be developed from this class. The D_h estimators obtained from this class can achieve high efficiencies and have robustness properties. Model testing can also be handled in a unified way by means of goodness-of-fit test statistics derived from this class, which have an asymptotic normal distribution. These properties make the developed inference methods relatively simple to implement and suitable for analyzing multivariate data, which are often encountered in applications.

For statistical inference methods for continuous multivariate models, we often assume that we have a random sample of size n of multivariate observations x_1, ⋯, x_n which are independent and identically distributed as the d-dimensional random vector x with d-dimensional density function g(x).

For the parametric set-up, g(x) ∈ {f_θ}, θ = (θ_1, ⋯, θ_m)′, and we let θ_0 denote the true vector of parameters. We would like to have statistical methods for estimating θ_0 when the parametric model {f_θ} can be assumed, and inference methods for validating the assumption of the model {f_θ} by means of various goodness-of-fit statistics. This leads to a composite null hypothesis, and ideally we would like goodness-of-fit test statistics which follow a unique asymptotic distribution for all θ ∈ Ω, with Ω assumed to be compact.

Multidimensional model testing often poses difficulties, as the goodness-of-fit test statistics used often either have very complicated distributions, as in the case of statistics which make use of multivariate empirical characteristic functions, see Csörgö [

Multivariate modelling is used in many fields, including actuarial science and finance. For financial applications, Moore [

In this paper, we introduce a class of pseudodistances D_h(g, f), constructed from a class of convex functions h(x), which measures the discrepancy between two density functions g and f; see Section 2.3 for details. Goodness-of-fit test statistics for model testing based on D_h preserve the property of having a simple asymptotic null distribution, comparable to chi-square tests, but unlike chi-square tests, the tests based on D_h are consistent for model testing.

It is also interesting to note that, within this class, the statistic based on D_h with h(x) = −log(x) can also accommodate parameters estimated by the maximum likelihood (ML) method for the composite hypothesis. On the estimation side, estimators based on D_h have the potential of good efficiencies and robustness properties. Furthermore, estimation and model testing can be handled in a unified way.

The proposed inference methods extend previous methods for univariate models to continuous multivariate models. This paper can be considered a follow-up of previous papers by Luong [

The class of pseudodistances D_h(g, f) is constructed using a class of strictly convex functions h(x) with second derivative h″(x) > 0. For each chosen h(x) we have a corresponding pseudodistance D_h(g, f), and D_h(g, f) is a discrepancy measure between the densities g and f.

Explicitly, the function h ( x ) takes the form

h(x) = −x^α, (1)

where α is a known constant with 0 < α < 1, and in practice we choose α near 0; h(x) can also take the form

h(x) = −log(x). (2)

Note that −(x^α − 1)/α → −log(x) as α decreases to 0, and h(x) only needs to be defined up to an additive constant and a positive multiplicative constant. Provided these constants are known, inference procedures based on D_h with α decreasing to 0 have efficiencies very close to those of inference procedures using D_h with h(x) = −log(x).

Furthermore, if h(x) = −log(x) is used to construct the pseudodistance D_h, estimation based on this pseudodistance gives the maximum likelihood estimators. This D_h differs from the Kullback-Leibler (KL) pseudodistance used to generate ML estimators only by a few terms which do not depend on θ. These terms do not affect estimation, but they are very significant for the construction of goodness-of-fit test statistics: test statistics constructed using D_h have an asymptotic normal distribution for model testing, whereas goodness-of-fit test statistics using the KL pseudodistance do not have a simple asymptotic distribution, especially in the composite null hypothesis case where parameters must be estimated using the ML estimators. We give more discussion in Section 2.2.

The paper is organized as follows. In Section 2, we introduce the auxiliary observations obtained from the NND observations; the class of pseudodistances D_h is also introduced in this section. Asymptotic properties of estimators based on D_h are considered in Section 3. Estimators obtained using D_h with h(x) = −log(x) are identical to the ML estimators, which are fully efficient; if another h(x) is used for D_h, the corresponding estimators have the potential of good efficiencies and some robustness properties. These properties allow flexibility in balancing efficiency and robustness. In Section 4, goodness-of-fit statistics based on the class D_h are shown to have an asymptotic normal distribution, and in Section 5 an example is provided to illustrate the proposed techniques.

For each observation vector x_i, i = 1, ⋯, n in the random sample, we define r_i, the nearest neighbour distance (NND) to x_i, with

r_i = min_{x_j ≠ x_i} ‖x_j − x_i‖,

where ‖·‖ is the usual Euclidean norm, and r_i can clearly be computed from the sample of d-variate observations.
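As a concrete illustration of this definition, the NND observations r_i can be computed by a brute-force pairwise scan; the sketch below is illustrative code, not from the paper, and the helper name `nnd` is hypothetical.

```python
import math

def nnd(sample):
    """Return r_i = min over j != i of ||x_j - x_i|| for each observation x_i."""
    n = len(sample)
    return [min(math.dist(sample[i], sample[j]) for j in range(n) if j != i)
            for i in range(n)]

# Example: four points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
r = nnd(points)
# r[0] = 1.0: the nearest neighbour of the origin is (1, 0)
```

The O(n²) scan suffices for moderate n; for large samples a k-d tree returns the same r_i faster.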

In the literature, these r_i's have been used to construct goodness-of-fit statistics; see Bickel and Breiman [

by Proposition 2 of Ranneby et al. [

The auxiliary observation y_i is defined as n times the volume of the d-dimensional ball of radius r_i, i.e., y_i = n π^{d/2} r_i^d / Γ(d/2 + 1), obtained using the usual formulas for volume or area, and Γ(·) is the commonly used gamma function.

Note that y_1, ⋯, y_n are n univariate auxiliary observations obtained from the NND observations. Therefore, from the original sample observations x_1, ⋯, x_n together with the n auxiliary observations, we can form the following (d + 1)-dimensional observations

(x_1, y_1), ⋯, (x_n, y_n).

These n observations are asymptotically independent as n → ∞ and have a common density function given by the density of (X, Y) below,

p_0(x, y) = g²(x) e^{−y g(x)}, (3)

see the end of Section 2 of Kuljus and Ranneby [
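A minimal sketch of the construction above, assuming the auxiliary observation y_i equals n times the ball volume π^{d/2} r_i^d / Γ(d/2 + 1); the helper names are illustrative, not from the paper.

```python
import math

def ball_volume(r, d):
    """Volume of a d-dimensional ball: pi**(d/2) * r**d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

def auxiliary_observations(sample):
    """y_i = n * (volume of the d-ball whose radius is the NND r_i)."""
    n, d = len(sample), len(sample[0])
    ys = []
    for i in range(n):
        r_i = min(math.dist(sample[i], sample[j]) for j in range(n) if j != i)
        ys.append(n * ball_volume(r_i, d))
    return ys

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
ys = auxiliary_observations(points)
pairs = list(zip(points, ys))        # the (x_i, y_i) observations
```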

Now we can consider the random criterion function

Q_n^h(θ) = (1/n) ∑_{i=1}^n h(y_i f_θ(x_i)) (4)

for the class of functions h defined by expressions (1) and (2). We shall see subsequently that inference methods based on the objective function (4) are pseudodistance methods based on a class of pseudodistances D_h(g, f), where g and f are density functions.
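To see the criterion (4) in action, the sketch below fits a univariate exponential model f_θ(x) = θe^{−θx} with h(x) = −log(x); in d = 1 the ball of radius r_i is an interval, so y_i = 2n r_i. The data, model, and grid search are hypothetical choices for illustration; with h = −log the minimizer of (4) should coincide, up to grid resolution, with the ML estimator 1/x̄.

```python
import math, random

random.seed(1)
n = 400
xs = [random.expovariate(2.0) for _ in range(n)]   # data from f_theta with theta = 2

# Auxiliary observations in d = 1: the "ball" of radius r_i is an interval,
# so y_i = n * 2 * r_i with r_i the univariate NND.
ys = []
for i in range(n):
    r_i = min(abs(xs[i] - xs[j]) for j in range(n) if j != i)
    ys.append(n * 2 * r_i)

def Q(theta):
    # Criterion (4) with h(x) = -log(x) and f_theta(x) = theta * exp(-theta * x)
    return sum(-math.log(y * theta * math.exp(-theta * x))
               for x, y in zip(xs, ys)) / n

# Crude grid search over theta in [0.5, 4.0]; with h = -log the exact
# minimizer of Q is the ML estimator 1 / mean(xs).
grid = [0.01 * k for k in range(50, 401)]
theta_hat = min(grid, key=Q)
theta_ml = n / sum(xs)
```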

Minimizing Q_n^h(θ) with h(x) = −log(x), we obtain the pseudodistance D_h estimators, which are identical to the maximum likelihood (ML) estimators. ML estimators can be viewed as pseudodistance estimators based on the Kullback-Leibler (KL) distance, but we shall see that goodness-of-fit test statistics are complicated with the use of the KL distance, unlike the ones based on Q_n^h(θ) and consequently on D_h(g, f_θ). The KL pseudodistance used to derive ML estimators is discussed in Section 2.2, and the class of pseudodistances is introduced in Section 2.3.

Furthermore, if we use h(x) = −x^α to construct D_h, then α should be set near 0 but within the range 0 < α < 1 for robust estimation, without relying on an explicit multivariate density estimate, which is needed for the minimum Hellinger method as proposed by Tamura and Boos [

The negative of the log likelihood function can be expressed as

Q_n^{ML}(θ) = (1/n) ∑_{i=1}^n −log f_θ(x_i)

and the ML estimators can be viewed as the values obtained by minimizing the observed version, which can also be called the sample version, of the Kullback-Leibler (KL) pseudodistance D_{KL}, i.e., D_{KL}^o(g, f_θ), defined as

D_{KL}^o(g, f_θ) = Q_n^{ML}(θ) + (1/n) ∑_{i=1}^n log(g(x_i)) →_p D_{KL}(g, f_θ), (5)

where →_p denotes convergence in probability,

and minimizing D_{KL}^o(g, f_θ) is equivalent to maximizing the log-likelihood function.

The KL pseudodistance is defined as D_{KL}(g, f_θ) = −E_g(log(f_θ/g)).

However, for testing the validity of the model with the composite null hypothesis given by

H 0 : g ( x ) ∈ { f θ } , θ ∈ Ω

and since g(x) appears on the LHS of expression (5), it must be estimated; replacing g(x) by a multivariate density estimate ĝ(x) makes the distribution of the LHS of expression (5) complicated even when θ is replaced by θ̂_ML. This might explain the limited use of the KL pseudodistance for constructing statistics for model testing with θ̂_ML.

We shall focus on pseudodistance methods based on D_h(g, f_θ) for parametric models, with emphasis on continuous multivariate models; some previous univariate results, which are scattered in the literature, can also be unified by viewing them as pseudodistance methods.

In general for pseudodistances we require the following property:

D h ( g , f ) > 0 if g ≠ f , D h ( g , f ) = 0 if and only if g = f , (6)

where g and f are density functions. The property given by (6) is needed for establishing consistency of estimation and consistency of goodness-of-fit tests; see Broniatowski et al. [

Since D_h(g, f) is in general not observable when g is unknown, we shall see at the end of this section that we can define an observed version D_h^0(g, f) with the property D_h^0(g, f) →_p D_h(g, f). D_h^0(g, f) satisfies the property given by expression (6) in probability, similarly to D_{KL}^o(g, f_θ), which satisfies D_{KL}^o(g, f_θ) →_p D_{KL}(g, f_θ) for the KL distance.

Now we work with the following pairs of observations to develop the D_h methods,

( X 1 , Y 1 ) ′ , ⋯ , ( X n , Y n ) ′ .

For this sample, the observations are asymptotically independent by Proposition 3 of Ranneby et al. [

p ( x , y ) = g 2 ( x ) e − y g ( x ) ,

see Kuljus and Ranneby [

Therefore, the results are very similar to the univariate case, with g(x) here interpreted as a multivariate density instead of a univariate one; the results given by Luong [

1) Z = Y g ( X ) follows a standard exponential distribution with density f ( z ) = e − z , z > 0 .

2) Z and X are independent.
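Properties 1) and 2) can be checked numerically. The sketch below takes d = 1 with g a standard normal density, forms z_i = y_i g(x_i) with y_i = 2n r_i, and verifies that the sample behaves approximately like a standard exponential sample; the statement is asymptotic, so the agreement is only approximate at finite n, and the simulation set-up is illustrative, not from the paper.

```python
import math, random

random.seed(0)
n = 4000
xs = sorted(random.gauss(0.0, 1.0) for _ in range(n))

# Univariate NND: for sorted data, r_i is the gap to the closer neighbour.
r = [xs[1] - xs[0]]
for i in range(1, n - 1):
    r.append(min(xs[i] - xs[i - 1], xs[i + 1] - xs[i]))
r.append(xs[-1] - xs[-2])

g = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)   # true density
zs = [n * 2 * r_i * g(x) for r_i, x in zip(r, xs)]              # z_i = y_i g(x_i)

mean_z = sum(zs) / n                      # E(Z) = 1 for a standard exponential
tail = sum(z > 1.0 for z in zs) / n       # P(Z > 1) = exp(-1), about 0.368
```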

If we use Jensen's inequality, it follows that

E(h(Y f(X))) = E_Z[E(h(Z f(X)/g(X)) | Z)] > E_Z[h(Z E(f(X)/g(X) | Z))] = E_Z(h(Z))

for f ≠ g, since E(f(X)/g(X) | Z) = E(f(X)/g(X)) = 1,

and E(h(Y f(X))) = E_Z(h(Z)) for f = g.

Now, we can define

D_h(g, f) = E(h(Y f(X))) − E_Z(h(Z)),

where E_Z(h(Z)) is a known constant which does not depend on the parameter vector θ.

Under the parametric model, we would like to minimize D_h(g, f_θ), but g is unknown; this leads to the observed objective function D_h^o(g, f_θ), based on D_h(g, f_θ), defined as

D_h^o(g, f_θ) = (1/n) ∑_{i=1}^n h(y_i f_θ(x_i)) − E_Z(h(Z)),

note that we have

D h o ( g , f θ ) → p D h ( g , f θ ) .

The pseudodistance D_h estimators, given by the vector θ̂ based on D_h(g, f_θ), are obtained by minimizing D_h^o(g, f_θ). Equivalently, they are obtained by minimizing

Q_n^h(θ) = (1/n) ∑_{i=1}^n h(y_i f_θ(x_i)),

which is expression (4).

It is not difficult to see that the D_h estimator θ̂ which minimizes expression (4) is consistent, by defining h(z_i(θ, n)) = h(y_i f_θ(x_i)) and using the assumptions and results of Section 3.1 as given by Luong [

Limit laws such as the uniform weak law of large numbers (UWLLN) and the central limit theorem (CLT) are applicable because {z_i(θ, n)} is a mixing sequence, which follows from (X_i, Y_i), i = 1, ⋯, n being asymptotically independent with a common distribution as n → ∞. Therefore, θ̂ →_p θ_0.

Using the CLT and results given in Section 2 of Luong [

√n(θ̂ − θ_0) →_L N(0, σ_h² [I(θ_0)]^{−1}),

where →_L denotes convergence in law and I(θ_0) = −E_{θ_0}(∂² ln f(x; θ_0)/∂θ′∂θ) is the commonly used information matrix, with σ_h² = 1 if the function h(x) = −log(x) is used to define D_h and D_h^0, and

σ_h² = E_Z[(h′(Z))² Z²] / [E_Z(Z² h″(Z))]²

if the function h(x) = −x^α, 0 < α < 1 is used to define D_h and D_h^0, where the first and second derivatives of h are denoted respectively by h′ and h″. The random variable Z follows a standard exponential distribution, as given by expression (25) in Luong [
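For h(x) = −x^α the variance factor σ_h² can be evaluated in closed form, since h′(z) = −αz^{α−1}, h″(z) = α(1−α)z^{α−2} and E_Z(Z^k) = Γ(1+k) for a standard exponential Z, giving E_Z[(h′(Z))² Z²] = α²Γ(1+2α) and E_Z(Z² h″(Z)) = α(1−α)Γ(1+α). The short check below is illustrative code, not from the paper; it confirms that σ_h² decreases toward 1, the ML value, as α decreases to 0.

```python
import math

def sigma2(alpha):
    """Asymptotic variance factor sigma_h^2 for h(x) = -x**alpha, using
    E_Z[(h'(Z))^2 Z^2] = a^2 Gamma(1+2a) and E_Z[Z^2 h''(Z)] = a(1-a) Gamma(1+a)
    for Z ~ Exp(1)."""
    num = alpha ** 2 * math.gamma(1 + 2 * alpha)
    den = (alpha * (1 - alpha) * math.gamma(1 + alpha)) ** 2
    return num / den

# sigma_h^2 -> 1 (full ML efficiency) as alpha decreases to 0
vals = {a: sigma2(a) for a in (0.3, 0.1, 0.01)}
```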

The D_h estimators using h(x) = −x^α, 0 < α < 1, might have some robustness properties according to M-estimation theory; see Luong [

Since the proposed D_h methods are density based but do not require an explicit density estimate to implement, they appear simpler for practitioners and can be used as alternatives to other robust methods such as the Hellinger methods proposed by Tamura and Boos [

For model selection and model testing we are primarily interested in testing the composite null hypothesis

H 0 : g ( x ) ∈ { f θ } , θ ∈ Ω .

However, it might be easier to follow the construction of the test statistics by first considering the test based on D_h^0, which is also implicitly based on D_h, for the simple hypothesis, where there is no unknown parameter.

For the simple hypothesis H_0: g = f, a natural statistic can be based on

D_h^0(g, f) = (1/n) ∑_{i=1}^n (h(y_i f(x_i)) − E_Z(h(Z))),

and since {h(y_i f(x_i)), i = 1, 2, ⋯} forms a mixing sequence, the CLT can be applied, with each y_i f(x_i) tending in distribution to a standard exponential random variable; Slutsky's theorem can also be used if needed. Therefore, the test statistic √n D_h^0(g, f) can be used, and √n D_h^0(g, f) →_L N(0, v_Z^h), where v_Z^h is the variance of h(Z) and Z follows a standard exponential distribution.

For an α level test, we can reject H 0 if

√n D_h^0(g, f) / √(v_Z^h) > z_{1−α},

where z_{1−α} is the (1 − α) percentile of the standard normal distribution.

Equivalently, we can reject H 0 if

−√n D_h^0(g, f) / √(v_Z^h) < z_α. (7)

Note that −D_h^0(g, f) = (1/n) ∑_{i=1}^n (−h(y_i f(x_i)) − E_Z(−h(Z))), and v_Z^h is also the variance of −h(Z).

Now if we use D_h^0 with h(x) = −log(x), then E_Z(−h(Z)) = E_Z(log(Z)). Using the moment generating function of log Z, which is given by M_{log Z}(t) = E_Z(e^{t log Z}) = Γ(1 + t) with Γ(·) being the gamma function, the cumulant generating function is log M_{log Z}(t) = log Γ(1 + t), and by differentiating it we obtain the first two cumulants, E_Z(log(Z)) = ψ(1) and v_Z^h = ψ′(1), where ψ(·) and ψ′(·) are respectively the digamma function and the trigamma function; both are available in most statistical packages.
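The two cumulants can also be recovered numerically from the cumulant generating function log Γ(1 + t) by central differences, which is convenient when digamma and trigamma routines are not at hand; a sketch, using only the standard library, with ψ(1) = −γ ≈ −0.5772 and ψ′(1) = π²/6 ≈ 1.6449.

```python
import math

# Cumulants of log Z for Z ~ Exp(1), via the cumulant generating
# function K(t) = log Gamma(1 + t), recovered by central differences.
K = lambda t: math.lgamma(1 + t)
eps = 1e-5
psi1 = (K(eps) - K(-eps)) / (2 * eps)                    # E(log Z) = psi(1) = -gamma
psi1_prime = (K(eps) - 2 * K(0) + K(-eps)) / eps ** 2    # Var(log Z) = psi'(1) = pi^2/6
```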

The test statistic given by expression (7) can be expressed explicitly as

( (1/√n) ∑_{i=1}^n log y_i + (1/√n) ∑_{i=1}^n log f(x_i) − √n ψ(1) ) / √(ψ′(1)) (8)

and we reject the simple H_0 if

( (1/√n) ∑_{i=1}^n log y_i + (1/√n) ∑_{i=1}^n log f(x_i) − √n ψ(1) ) / √(ψ′(1)) < z_α

for an α level test, 0 < α < 1.

Note that the test is consistent: √n D_h^0(g, f) → ∞ as n → ∞ if g ≠ f, so we reject H_0 with probability tending to 1 should g ≠ f; this property is not shared by chi-square tests. There is also the difficulty of the arbitrariness of grouping observations into cells for chi-square tests; see Bickel and Breiman [

Furthermore, if we use D_h^0 with h(x) = −x^α, 0 < α < 1, then E_Z(Z^α) = Γ(1 + α), E_Z(Z^{2α}) = Γ(1 + 2α) and v_Z^h = Γ(1 + 2α) − [Γ(1 + α)]².

The corresponding test statistic given by expression (7) can be expressed explicitly as

( (1/√n) ∑_{i=1}^n (y_i f(x_i))^α − √n Γ(1 + α) ) / √(Γ(1 + 2α) − [Γ(1 + α)]²) (9)

and we reject the simple H_0 if

( (1/√n) ∑_{i=1}^n (y_i f(x_i))^α − √n Γ(1 + α) ) / √(Γ(1 + 2α) − [Γ(1 + α)]²) < z_α.
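The explicit forms are algebraically identical to the generic standardized statistic −√n D_h^0(g, f)/√(v_Z^h); the sketch below checks this numerically on hypothetical positive inputs f(x_i) and y_i (the names and the synthetic data are illustrative only, not from the paper).

```python
import math, random

random.seed(7)
n = 200
# Hypothetical positive inputs f(x_i) and y_i, just to check the algebra.
f_vals = [random.uniform(0.2, 2.0) for _ in range(n)]
y_vals = [random.expovariate(1.0) / f for f in f_vals]

def stat_generic(h, Eh, v):
    """-sqrt(n) D_h^0(g, f) / sqrt(v_Z^h), with D_h^0 = mean h(y_i f(x_i)) - E_Z h(Z)."""
    Dh0 = sum(h(y * f) for y, f in zip(y_vals, f_vals)) / n - Eh
    return -math.sqrt(n) * Dh0 / math.sqrt(v)

gamma_e = 0.5772156649015329               # Euler-Mascheroni constant
psi1, psi1p = -gamma_e, math.pi ** 2 / 6   # psi(1), psi'(1)

# h(x) = -log x: generic statistic vs the explicit (8)-type form
t_log = stat_generic(lambda x: -math.log(x), gamma_e, psi1p)
t8 = (sum(math.log(y) for y in y_vals) / math.sqrt(n)
      + sum(math.log(f) for f in f_vals) / math.sqrt(n)
      - math.sqrt(n) * psi1) / math.sqrt(psi1p)

# h(x) = -x**a: generic statistic vs the explicit (9)-type form
a = 0.1
v_a = math.gamma(1 + 2 * a) - math.gamma(1 + a) ** 2
t_alpha = stat_generic(lambda x: -x ** a, -math.gamma(1 + a), v_a)
t9 = (sum((y * f) ** a for y, f in zip(y_vals, f_vals)) / math.sqrt(n)
      - math.sqrt(n) * math.gamma(1 + a)) / math.sqrt(v_a)
```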

For model testing, we consider the composite H_0: g(x) ∈ {f_θ}, θ ∈ Ω. Since θ_0 is unknown, we first estimate it by θ̂, which minimizes D_h^0(g, f_θ); we can then form the statistic √n D_h^0(g, f_θ̂), and we shall show that √n D_h^0(g, f_θ̂) →_L N(0, v_Z^h), just as for the simple H_0. Unlike other statistics, whose null distributions become complicated when parameters are estimated, the statistics here behave like the one used for the simple H_0 in Section 4.1, and the equivalent rejection rules are similar to those given by expression (8) and expression (9), depending on the choice of h(x) used for D_h^0 and D_h.

In fact, these expressions remain valid for the composite H_0 provided that we replace f(x_i) by f_θ̂(x_i), i = 1, ⋯, n, wherever they appear.

As we have seen, by using a version of the CLT for mixing sequences if needed, √n D_h^0(g, f_{θ_0}) →_L N(0, v_Z^h). Now, if we can establish

√n D_h^0(g, f_θ̂) = √n D_h^0(g, f_{θ_0}) + o_p(1), (10)

where o_p(1) denotes a term which converges to 0 in probability, then by Slutsky's theorem,

√n D_h^0(g, f_θ̂) →_L N(0, v_Z^h).

Now, we proceed to establish the property given by expression (10). Using the mean value theorem and the following expansion around θ̂, we have

√n D_h^0(g, f_{θ_0}) = √n D_h^0(g, f_θ̂) + (1/2) √n (θ̂ − θ_0)′ [∂² D_h^0(g, f_θ̄)/∂θ′∂θ] (θ̂ − θ_0),

where θ̄ lies on the line segment joining θ̂ and θ_0.

Since ∂D_h^0(g, f_θ̂)/∂θ = 0, ∂²D_h^0(g, f_θ̄)/∂θ′∂θ →_p E_Z(Z² h″(Z)) I(θ_0), and √n(θ̂ − θ_0) = O_p(1), the quadratic term is O_p(n^{−1/2}) = o_p(1),

see Luong [

Furthermore, if we use h(x) = −log(x) for D_h, then θ̂ = θ̂_ML and

√n D_h^0(g, f_{θ̂_ML}) = √n D_h^0(g, f_θ̂) →_L N(0, v_Z^h).

The use of the ML estimators θ̂_ML for chi-square distance type statistics often creates complications when it comes to deriving the asymptotic distributions of these statistics; see Chernoff and Lehmann [

For applications, it has been recognized that the maximum value attained by the log-likelihood function can provide information on the goodness of fit of the model being used; the test given by expression (8), with f(x_i) replaced by f_θ̂(x_i), formalizes the informal procedures based on the maximum value of the log-likelihood function for assessing goodness of fit of the model, see Klugman et al. [

For illustration of the proposed methods, we use the d-dimensional multivariate normal model; its density function is often parameterized using the mean μ and the covariance matrix Σ, and it is given by

f(x; μ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} e^{−(1/2)(x − μ)′Σ^{−1}(x − μ)}, x ∈ R^d, (11)

where |Σ| is the determinant of the matrix Σ; see Anderson [

There is redundancy in using the elements of the matrix Σ as parameters, since Σ, being a covariance matrix, is symmetric.

We can eliminate the redundancy by defining the vector of parameters as θ with

θ = (μ′, (Vech Σ)′)′.

The Vech operator applied to Σ extracts the lower triangular elements of Σ and stacks them in a vector. Equivalently, we can use the parameter vector θ instead of μ and Σ and express the multivariate normal density as f(x; θ), avoiding the redundancy of the previous parameterization. We assume that we have a random sample of size n, which allows us to obtain the auxiliary univariate observations y_1, ⋯, y_n from the NND observations, and that there are no tied observations, so that

y_i > 0, i = 1, ⋯, n.

For illustration, say we use D_h^0 with h(x) = −log x. The vector of estimators in this case coincides with the maximum likelihood (ML) estimators, i.e., θ̂ = θ̂_ML, and for the multivariate normal model it is well known that θ̂_ML can be obtained explicitly; see Anderson [

Explicitly,

θ̂ = θ̂_ML, θ̂_ML = (x̄′, (Vech S)′)′, x̄ = (1/n) ∑_{i=1}^n x_i,

where x̄ is the sample mean and S is the sample covariance matrix, which can also be expressed as

S = (1/n) ∑_{i=1}^n (x_i − x̄)(x_i − x̄)′.

For model testing we can then use the test statistic

( (1/√n) ∑_{i=1}^n log y_i + (1/√n) ∑_{i=1}^n log f_{θ̂_ML}(x_i) − √n ψ(1) ) / √(ψ′(1))

and reject the model if the statistic gives a value smaller than z_α for an α level test.
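Putting the pieces together, the sketch below simulates a bivariate normal sample, forms the auxiliary observations from the NNDs, plugs in θ̂_ML, and evaluates an (8)-type statistic with the log y_i terms; all helper names and the simulation set-up are illustrative assumptions, not code from the paper.

```python
import math, random

random.seed(3)
n, d = 500, 2
# Simulate from a bivariate normal with independent N(0, 1) margins, so the
# fitted multivariate normal model is correct and H0 holds.
xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

# Auxiliary observations y_i = n * (volume of the d-ball of radius r_i)
ball = lambda r: math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)
ys = []
for i in range(n):
    r_i = min(math.dist(xs[i], xs[j]) for j in range(n) if j != i)
    ys.append(n * ball(r_i))

# ML estimates: sample mean and (1/n)-denominator sample covariance
mu = [sum(x[k] for x in xs) / n for k in range(d)]
S = [[sum((x[k] - mu[k]) * (x[m] - mu[m]) for x in xs) / n for m in range(d)]
     for k in range(d)]

detS = S[0][0] * S[1][1] - S[0][1] ** 2
Sinv = [[S[1][1] / detS, -S[0][1] / detS],
        [-S[0][1] / detS, S[0][0] / detS]]

def logf(x):
    # log of the bivariate normal density (11) evaluated at the ML estimates
    u = (x[0] - mu[0], x[1] - mu[1])
    q = (Sinv[0][0] * u[0] ** 2 + 2 * Sinv[0][1] * u[0] * u[1]
         + Sinv[1][1] * u[1] ** 2)
    return -0.5 * d * math.log(2 * math.pi) - 0.5 * math.log(detS) - 0.5 * q

psi1, psi1p = -0.5772156649015329, math.pi ** 2 / 6      # psi(1), psi'(1)
T = (sum(math.log(y) for y in ys) / math.sqrt(n)
     + sum(logf(x) for x in xs) / math.sqrt(n)
     - math.sqrt(n) * psi1) / math.sqrt(psi1p)
reject = T < -1.645           # alpha = 0.05 level test
```

Asymptotically T is approximately N(0, 1) under H0, so with correctly specified data the model should only rarely be rejected at the 5% level.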

As Tamura and Boos [

In this paper, we focus on presenting the D_h methodologies, leaving simulation studies for assessing the power of the tests, the use of distributions other than the normal for the null distribution of the goodness-of-fit test statistics, and the assessment of efficiency in small or finite samples for subsequent work. Practitioners might be encouraged to use these D_h methods.

The helpful and constructive comments of a referee, which led to an improvement of the presentation of the paper, and the support of the editorial staff of Open Journal of Statistics in processing the paper are gratefully acknowledged.

The author declares no conflicts of interest regarding the publication of this paper.

Luong, A. (2019) Pseudodistance Methods Using Simultaneously Sample Observations and Nearest Neighbour Distance Observations for Continuous Multivariate Models. Open Journal of Statistics, 9, 445-457. https://doi.org/10.4236/ojs.2019.94030