
Maximum likelihood (ML) estimation for the generalized asymmetric Laplace (GAL) distribution, also known as the variance gamma distribution, using simplex direct search algorithms is investigated. In this paper, we use numerical direct search techniques to maximize the log-likelihood and obtain ML estimators instead of using the traditional EM algorithm. The density function of the GAL is continuous but not differentiable with respect to the parameters, and the appearance of the Bessel function in the density makes it difficult to obtain the asymptotic covariance matrix for the entire GAL family. Using M-estimation theory, the properties of the ML estimators are investigated in this paper. The ML estimators are shown to be consistent for the GAL family, and their asymptotic normality can only be guaranteed for the asymmetric Laplace (AL) family. The asymptotic covariance matrix is obtained for the AL family, and it completes the results obtained previously in the literature. For the general GAL model, alternative methods of inference based on quadratic distances (QD) are proposed. The QD methods appear to be overall more efficient than likelihood methods in finite samples, using sample sizes *n* ≤ 5000 and the range of parameters often encountered for financial data. The proposed methods only require that the moment generating function of the parametric model exists and has a closed-form expression, and they can be used for other models.

The generalized asymmetric Laplace (GAL) distribution is an infinitely divisible continuous distribution with four parameters, given by the vector

β = ( θ , μ , σ , τ ) ′ . (1)

The parameter θ is a location parameter and σ is a scale parameter. The parameter μ can be viewed as the asymmetry parameter of the distribution and τ is the shape parameter which controls the thickness of the tail of the distribution. If μ = 0 , the distribution is symmetric around θ , see Kotz et al. ( [

$$M(s) = \frac{e^{\theta s}}{\left(1 - \mu s - \frac{1}{2}\sigma^{2} s^{2}\right)^{\tau}}, \quad \sigma, \tau \geq 0, \quad \beta = (\theta, \mu, \sigma, \tau)', \tag{2}$$

s must satisfy the inequality

$$1 - \frac{1}{2}\sigma^{2} s^{2} - \mu s > 0. \tag{3}$$

The GAL distribution is also known as the variance gamma (VG) distribution. It was introduced by Madan and Seneta [

From the moment generating function, it is easy to see that the first four cumulants of the GAL distribution are given by

$$c_1 = \theta + \tau\mu, \quad c_2 = \tau\sigma^{2} + \tau\mu^{2}, \tag{4}$$

$$c_3 = 3\tau\sigma^{2}\mu + 2\tau\mu^{3}, \quad c_4 = 6\tau\mu^{4} + 12\tau\mu^{2}\sigma^{2} + 3\tau\sigma^{4}. \tag{5}$$

Note that $c_3 = 0$ if $\mu = 0$, and $c_3$ can be positive or negative depending on the values of the parameters. Therefore, the GAL distribution can be symmetric or asymmetric. Furthermore, with $c_4 > 0$, the tail of the GAL distribution is thicker than that of the normal distribution. These characteristics make the GAL distribution useful for modelling asset returns, see Seneta [

The moments can be obtained based on cumulants and they are given below,

$$E(X) = c_1, \quad E(X - E(X))^{2} = c_2, \quad E(X - E(X))^{3} = c_3,$$
$$E(X - E(X))^{4} = 6\tau\mu^{4} + 12\tau\mu^{2}\sigma^{2} + 3\tau\sigma^{4} + 3\tau^{2}\sigma^{4} + 6\tau^{2}\sigma^{2}\mu^{2} + 3\tau^{2}\mu^{4}.$$

The GAL distribution belongs to the class of normal mean-variance mixture distributions, where the mixing variable follows a gamma distribution with shape parameter τ and scale parameter equal to 1, i.e., with density function

$$f_w(w) = \frac{1}{\Gamma(\tau)} w^{\tau-1} e^{-w}, \quad w, \tau > 0,$$

where $\Gamma(\cdot)$ is the commonly used gamma function.

This leads to the following representation in distribution using Expression (4.1.10) in Kotz et al. ( [

$$X \stackrel{d}{=} \theta + \mu Y + \sigma\sqrt{Y}\, Z, \quad \text{where} \tag{6}$$

1) $Z \sim N(0, 1)$,

2) $Y \sim G(\tau, 1)$, with density $f_w(w)$ as given above, and independent of $Z$,

3) $\theta, \mu, \sigma, \tau$ are parameters with $\sigma, \tau > 0$.

The representation given by expression (6) is useful for simulating samples from a GAL distribution. Note that despite the simple closed-form expression for the moment generating function, the density function is rather complicated, as it depends on the modified Bessel function of the third kind with real index λ, i.e., $K_\lambda(u)$; see Kotz et al. ( [
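The representation (6) can be sketched in a few lines of code. The following is a minimal illustration (parameter values are arbitrary choices, not taken from the paper) which checks the simulated sample against the first two cumulants in expressions (4)-(5):

```python
import numpy as np

def rgal(n, theta, mu, sigma, tau, rng):
    """Simulate GAL variates via the normal mean-variance mixture (6):
    X = theta + mu*Y + sigma*sqrt(Y)*Z with Y ~ Gamma(tau, 1), Z ~ N(0, 1)."""
    y = rng.gamma(shape=tau, scale=1.0, size=n)
    z = rng.standard_normal(n)
    return theta + mu * y + sigma * np.sqrt(y) * z

rng = np.random.default_rng(0)
x = rgal(200_000, theta=0.0, mu=0.05, sigma=0.1, tau=2.0, rng=rng)
# Theoretical cumulants: c1 = theta + tau*mu = 0.10,
#                        c2 = tau*sigma^2 + tau*mu^2 = 0.025.
print(x.mean(), x.var())
```

The sample mean and variance should agree with the theoretical cumulants up to Monte Carlo error.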

The GAL parametric family can be introduced as a limiting case of the generalized hyperbolic (GH) family, where the mixing random variable belongs to the generalized inverse Gaussian family, see McNeil et al. [

$$X \stackrel{d}{=} \theta + \frac{\sigma}{\sqrt{2}}\left(\frac{1}{\kappa} G_1 - \kappa G_2\right), \tag{7}$$

$G_1$ and $G_2$ are independent random variables with a common gamma distribution. The common mgf of the gamma distribution is given by

$$M_G(s) = \frac{1}{(1-s)^{\alpha}},$$

see expression (4.1.1) given by Kotz et al. ( [

If we introduce κ using $\mu = \frac{\sigma}{\sqrt{2}}\left(\frac{1}{\kappa} - \kappa\right)$, the GAL distribution can also be parameterised using the four equivalent parameters, i.e., with $\theta, \sigma, \kappa, \tau$.

Moment estimation for the GAL family has been given by Podgorski and Wegener [

variable $Y \sim \mathrm{Gamma}(\tau, \phi^{2})$, which implies the following form of the moment generating function for $X$:

$$M(s) = \frac{e^{\theta s}}{\left(1 - \phi^{2}\left(\mu s + \frac{1}{2}\sigma^{2} s^{2}\right)\right)^{\tau}}, \quad \phi > 0.$$

From the above expression, it is easy to see that the parameter ϕ > 0 is redundant, and the parameterisation using five parameters will introduce instability into the estimation process. It appears simpler to use the parameterisation given by Kotz et al. [

Hu [

In this subsection, we first review a few parameterisations which are commonly used for the GAL distribution.

Definition 1 (GAL density)

From the GH density, the density function for the GAL distribution can be obtained and it can be expressed as

$$f(x; \theta, \sigma, \mu, \tau) = \frac{\sqrt{2}\, e^{\frac{\mu}{\sigma}\left(\frac{x-\theta}{\sigma}\right)}}{\sigma\sqrt{\pi}\,\Gamma(\tau)} \left(\frac{1}{\sqrt{2 + \left(\frac{\mu}{\sigma}\right)^{2}}}\right)^{\tau-\frac{1}{2}} \left(\frac{|x-\theta|}{\sigma}\right)^{\tau-\frac{1}{2}} K_{\tau-\frac{1}{2}}\!\left(\sqrt{2 + \left(\frac{\mu}{\sigma}\right)^{2}}\,\frac{|x-\theta|}{\sigma}\right). \tag{8}$$

The vector of parameters is $\beta = (\theta, \mu, \sigma, \tau)'$, and we shall call this parameterisation 1. The density can be derived using the normal mean-variance mixture representation given by expression (6). See expression (3.30) given by McNeil et al. ( [

Alternatively, by letting $\mu = \frac{\sigma}{\sqrt{2}}\left(\frac{1}{\kappa} - \kappa\right)$ and keeping the other parameters as in parameterisation 1, we obtain the following expression for the density of a GAL distribution:

$$f(x; \theta, \sigma, \kappa, \tau) = \frac{\sqrt{2}\, e^{\frac{\sqrt{2}}{2}\left(\frac{1}{\kappa}-\kappa\right)\left(\frac{x-\theta}{\sigma}\right)}}{\sigma\sqrt{\pi}\,\Gamma(\tau)} \left(\frac{\sqrt{2}}{\frac{1}{\kappa}+\kappa}\right)^{\tau-\frac{1}{2}} \left(\frac{|x-\theta|}{\sigma}\right)^{\tau-\frac{1}{2}} K_{\tau-\frac{1}{2}}\!\left(\frac{1}{\sqrt{2}}\left(\frac{1}{\kappa}+\kappa\right)\frac{|x-\theta|}{\sigma}\right), \tag{9}$$

with the vector of parameters given by $\beta = (\theta, \kappa, \sigma, \tau)'$. We shall call this parameterisation 2; it is the one used by Kotz et al. ( [

Note that θ and σ are respectively the location and scale parameters in either parameterisation 1 or 2. Setting θ = 0, σ = 1, the standardized GAL density under parameterisation 2 has only two parameters and is given by

$$f_{\varepsilon}(x; \kappa, \tau) = \frac{\sqrt{2}\, e^{\frac{\sqrt{2}}{2}\left(\frac{1}{\kappa}-\kappa\right)x}}{\sqrt{\pi}\,\Gamma(\tau)} \left(\frac{\sqrt{2}}{\frac{1}{\kappa}+\kappa}\right)^{\tau-\frac{1}{2}} |x|^{\tau-\frac{1}{2}} K_{\tau-\frac{1}{2}}\!\left(\frac{1}{\sqrt{2}}\left(\frac{1}{\kappa}+\kappa\right)|x|\right),$$

or equivalently, using parameterisation 1,

$$f_{\varepsilon}(x; \mu, \tau) = \frac{\sqrt{2}\, e^{\mu x}}{\sqrt{\pi}\,\Gamma(\tau)} \left(\frac{1}{\sqrt{2 + \mu^{2}}}\right)^{\tau-\frac{1}{2}} |x|^{\tau-\frac{1}{2}} K_{\tau-\frac{1}{2}}\!\left(\sqrt{2 + \mu^{2}}\,|x|\right).$$

Following Kotz et al. [

$$M(s) = \frac{e^{c s}}{\left(1 - \theta'\nu s - \frac{1}{2}\nu\sigma'^{2} s^{2}\right)^{1/\nu}}, \tag{10}$$

the parameters are θ ′ , σ ′ , ν , c with

$$\theta' = \frac{\mu}{\nu}, \quad \sigma'^{2} = \frac{\sigma^{2}}{\nu}, \quad \nu = 1/\tau, \quad c = \theta.$$

The first four moments using parameterisation 3, as given by Seneta ( [

$$E(X) = c + \theta', \quad V(X) = \sigma'^{2} + \theta'^{2}\nu,$$
$$E(X - E(X))^{3} = 2\theta'^{3}\nu^{2} + 3\sigma'^{2}\theta'\nu,$$
$$E(X - E(X))^{4} = 3\sigma'^{4}\nu + 12\sigma'^{2}\theta'^{2}\nu^{2} + 6\theta'^{4}\nu^{3} + 3\sigma'^{4} + 6\sigma'^{2}\theta'^{2}\nu + 3\theta'^{4}\nu^{2}.$$

The GAL random variable can also be expressed as the difference of two independent gamma random variables; the GAL random variable is nested inside the class of bilateral gamma random variables $Y$, which can be represented as

$$Y \stackrel{d}{=} \theta + G_1 - G_2, \tag{11}$$

where $G_1, G_2$ are independent gamma random variables with mgf's given respectively by

$$M_{G_1}(s) = \frac{1}{(1-\beta_1 s)^{\alpha}} \quad \text{and} \quad M_{G_2}(s) = \frac{1}{(1+\beta_2 s)^{\alpha}}.$$

We obtain the GAL random variable by letting $\beta_1 = \frac{\sigma}{\sqrt{2}\,\kappa}$ and $\beta_2 = \frac{\kappa\sigma}{\sqrt{2}}$.
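The difference-of-gammas representation can be checked numerically against the cumulants of expression (4). The sketch below (parameter values chosen arbitrarily for illustration) simulates via two independent gamma variates and compares the sample mean and variance with the values implied by the equivalent μ-parameterisation:

```python
import numpy as np

# GAL as theta + G1 - G2 with G1 ~ Gamma(tau, beta1), G2 ~ Gamma(tau, beta2),
# beta1 = sigma/(sqrt(2)*kappa), beta2 = kappa*sigma/sqrt(2).
rng = np.random.default_rng(1)
theta, sigma, kappa, tau = 0.0, 0.1, 0.8, 2.0
n = 200_000
beta1 = sigma / (np.sqrt(2.0) * kappa)
beta2 = kappa * sigma / np.sqrt(2.0)
x = theta + rng.gamma(tau, beta1, n) - rng.gamma(tau, beta2, n)

# Implied asymmetry parameter mu = (sigma/sqrt(2)) * (1/kappa - kappa).
mu = (sigma / np.sqrt(2.0)) * (1.0 / kappa - kappa)
print(x.mean(), theta + tau * mu)               # both ~ c1
print(x.var(), tau * sigma**2 + tau * mu**2)    # both ~ c2
```

Agreement of the two simulation routes (mixture representation and gamma difference) is a useful sanity check on a reparameterisation.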

The class of bilateral gamma distribution was introduced by Küchler and Tappe [

be the random variable with mgf given by $M_{Y_E}(s) = \frac{M_Y(s+h)}{M_Y(h)}$. It is easy to see that $Y_E \stackrel{d}{=} \theta + \bar{G}_1 - \bar{G}_2$, where $\bar{G}_1$ and $\bar{G}_2$ are independent gamma random variables with common shape parameter α (the shape is unchanged by the transform) and scale parameters given respectively by

$$\bar{\beta}_1 = \frac{\beta_1}{1 - \beta_1 h} \quad \text{and} \quad \bar{\beta}_2 = \frac{\beta_2}{1 + \beta_2 h}.$$

For option pricing with the risk-neutral approach, this property is useful, as it is easy to simulate samples from a bilateral gamma distribution. The use of the Esscher transform to find risk-neutral parameters for option pricing in finance is due to the seminal works of Gerber and Shiu [

For numerical methods to find the estimators, the Nelder-Mead simplex method and related derivative-free simplex methods are recommended. Derivative-free simplex direct search methods are well described in Chapter 16 of the book by Bierlaire [
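A direct search ML fit can be sketched with `scipy.optimize.minimize` using `method="Nelder-Mead"`. The log-density below is coded from expression (8); the starting vector and sample parameters are arbitrary illustrative choices, and no special treatment of the singular point x = θ is attempted:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, kv

def gal_logpdf(x, theta, mu, sigma, tau):
    """Log-density of the GAL distribution, parameterisation 1 (expression (8))."""
    a = np.sqrt(2.0 + (mu / sigma) ** 2)
    u = np.abs(x - theta) / sigma
    return (0.5 * np.log(2.0) - np.log(sigma) - 0.5 * np.log(np.pi)
            - gammaln(tau) + mu * (x - theta) / sigma**2
            - (tau - 0.5) * np.log(a) + (tau - 0.5) * np.log(u)
            + np.log(kv(tau - 0.5, a * u)))

def neg_loglik(beta, x):
    theta, mu, sigma, tau = beta
    if sigma <= 0 or tau <= 0:
        return np.inf   # keep the simplex inside the parameter space
    return -np.sum(gal_logpdf(x, theta, mu, sigma, tau))

# Simulate a GAL sample and maximize the log-likelihood with Nelder-Mead.
rng = np.random.default_rng(2)
y = rng.gamma(2.0, 1.0, 2000)
x = 0.05 * y + 0.1 * np.sqrt(y) * rng.standard_normal(2000)
fit = minimize(neg_loglik, x0=[0.0, 0.04, 0.08, 1.5], args=(x,),
               method="Nelder-Mead", options={"maxiter": 5000})
print(fit.x)  # estimates of (theta, mu, sigma, tau)
```

Since the search is derivative-free, the non-differentiability of the density in θ causes no difficulty for the optimizer itself; the difficulties discussed in this paper concern the asymptotic covariance matrix, not the maximization.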

The paper is organized as follows. In Section 2, some submodels of the GAL family are introduced to highlight the difficulty of obtaining the asymptotic covariance matrix using classical likelihood theory. Asymptotic properties of the ML estimators are investigated in Section 3. The ML estimators for the GAL family are shown to be consistent for τ > 1/2. For the special case with τ = 1, which corresponds to the asymmetric Laplace (AL) model, we obtain the asymptotic covariance matrix in closed form using the approach based on M-estimation theory as given by Huber [

We shall consider first a few submodels of the GAL model to show the difficulties encountered when likelihood theory is used to obtain the asymptotic covariance matrix for ML estimators.

The difficulties are mainly due to the fact that the score functions, when viewed as functions of the parameters, have a discontinuity point and fail to be differentiable. If the asymptotic covariance matrix for the ML estimators is derived based on likelihood theory, it will have missing components. This is the problem with expression (2) given by Kotz et al. ( [

Example 1

Let μ = 0, σ = 1, τ = 1, so the only parameter is the location parameter θ, and the family is symmetric around θ. Using the result $K_{\frac{1}{2}}(u) = \sqrt{\frac{\pi}{2u}}\, e^{-u}$, the density function reduces to

$$f(x, \theta) = \frac{1}{2 s_0} e^{-\frac{|x-\theta|}{s_0}}, \quad s_0 = \frac{1}{\sqrt{2}}.$$

Equivalently,

$$f(x, \theta) = f_0(x - \theta), \quad f_0(x) = \frac{1}{\sqrt{2}} e^{-\sqrt{2}|x|}, \quad -\infty < x < \infty.$$

This is the well-known double exponential distribution; the maximum likelihood estimator for θ is the sample median. There is no Fisher information matrix available, as the score function is discontinuous with respect to the parameter θ. The asymptotic variance of the sample median can be found by using M-estimation theory, see Huber [
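The asymptotic variance of the sample median can also be checked by simulation. For the double exponential density above, $f_0(0) = 1/\sqrt{2}$, so $n\,\mathrm{Var}(\hat{\theta}) \to 1/(4 f_0(0)^2) = 1/2$. A minimal Monte Carlo sketch (sample sizes are arbitrary illustrative choices):

```python
import numpy as np

# f0(x) = (1/sqrt(2)) exp(-sqrt(2)|x|) is a Laplace density with scale
# b = 1/sqrt(2); the sample median then satisfies n*Var(median) -> 1/2.
rng = np.random.default_rng(3)
n, reps = 2000, 2000
samples = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=(reps, n))
medians = np.median(samples, axis=1)
print(n * medians.var())  # should be near 0.5
```

This agrees with the M-estimation result quoted from Lehmann later in the paper.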

Example 2

Using the density of the GAL distribution and setting τ = 1, we obtain the AL distribution with only 3 parameters: the location and scale parameters, given respectively by θ and σ, and the asymmetry parameter μ. If parameterisation 2 is used, the density function g(x; θ, σ, κ) of the AL distribution is based on the standardized AL density as given by expression (4.1.31) in Kotz et al. ( [

$$g(x; \theta, \sigma, \kappa) = \frac{\sqrt{2}}{\sigma\gamma} \exp\!\left(\frac{\sqrt{2}}{2}\delta\left(\frac{x-\theta}{\sigma}\right)\right) \exp\!\left(-\frac{\sqrt{2}}{2}\gamma\left(\frac{|x-\theta|}{\sigma}\right)\right), \quad \gamma = \kappa + \frac{1}{\kappa}, \quad \delta = \frac{1}{\kappa} - \kappa.$$

The AL family can be considered as a subfamily of the GAL family, and the score functions for this model are again discontinuous. We shall derive the asymptotic covariance matrix using M-estimation theory in Section 3.2 and complete expression (2) of Kotz et al. ( [

For consistency of the MLE, the following Theorem which is Theorem 2.5 given by Newey and McFadden ( [

Theorem (Consistency)

Assume that:

1) If β 1 ≠ β 2 then f ( x ; β 1 ) ≠ f ( x ; β 2 ) .

2) The parameter space Θ is compact, β 0 ∈ Θ .

3) $f(x; \beta)$ is continuous with respect to $\beta$.

4) $E\left(\sup_{\beta\in\Theta} |\ln f(x; \beta)|\right) < \infty$.

Under the conditions stated, the ML estimator (MLE) given by the vector $\hat{\beta}$, obtained by maximizing the log-likelihood function

$$\ln L(\beta) = \sum_{i=1}^{n} \ln f(x_i; \beta),$$

is consistent, $\hat{\beta} \rightarrow_p \beta_0$.

One can see that the conditions for consistency are mild; condition 4) will be satisfied for the GAL family if τ > 1/2, as the density function remains bounded. For τ ≤ 1/2, the density functions with θ = 0, μ = 0 tend to infinity as $x \rightarrow 0^{+}$, see Theorem 4.1.2 given by Kotz et al. ( [

It might be possible to prove consistency using the approach to obtain results of Theorem 4 by Broniatowski et al. ( [

For asymptotic normality, the situation is more complicated, as standard theory often requires that the function $\ln L(\beta)$ be twice differentiable with respect to β. The appearance of the Bessel function creates further complications. This makes it very difficult to establish asymptotic properties, even with the use of M-estimation theory.

For the special case with τ = 1, which corresponds to the AL distribution, the density function can be expressed without the use of the Bessel function, and M-estimation theory can be used to find the asymptotic covariance matrix for the ML estimators. Asymptotic normality has been shown by Kotz et al. ( [

The formula (2.2) given by Kotz et al. ( [

M-estimation theory allows the score functions, when viewed as functions of the parameters, to have a few points of discontinuity, and full differentiability with respect to β can be replaced by one-sided differentiability accordingly. Amemiya ( [

$$\frac{1}{n}\sum_{i=1}^{n} \psi(x_i, \theta) = 0,$$

using the indicator function I [ . ] ,

$$\psi(x, \theta) = -I[x > \theta] + I[x < \theta], \quad \psi(x, \theta) = 0 \text{ if } x = \theta.$$

The function $\psi(x, \theta)$ is simply the one-sided derivative, and we adopt the notation $\psi(x, \theta) = \frac{\partial |x-\theta|}{\partial \theta}$ with the meaning of a one-sided derivative; also see Hogg et al. ( [

Another M-estimator for the location parameter θ has been proposed by Huber ( [

$$\frac{1}{n}\sum_{i=1}^{n} \psi(x_i, \theta) = 0, \quad \text{with}$$

$\psi(x, \theta) = x - \theta$ if $|x - \theta| \leq k$, where $k$ is a chosen constant, and

$\psi(x, \theta) = k\,\mathrm{sgn}(x - \theta)$ if $|x - \theta| > k$.
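Huber's location estimator can be computed by a simple fixed-point iteration on the estimating equation. The sketch below is illustrative (the tuning constant k = 1.345 is a conventional choice, and the contaminated sample is invented for the example):

```python
import numpy as np

def huber_psi(r, k):
    """Huber's psi: identity on [-k, k], clipped to +/- k outside."""
    return np.clip(r, -k, k)

def huber_location(x, k=1.345, tol=1e-10, max_iter=200):
    """Solve (1/n) * sum_i psi(x_i - theta) = 0 by iterating
    theta <- theta + mean(psi(x - theta)); a contraction for this psi."""
    theta = np.median(x)
    for _ in range(max_iter):
        step = huber_psi(x - theta, k).mean()
        if abs(step) < tol:
            break
        theta += step
    return theta

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(1.0, 1.0, 990), rng.normal(50.0, 1.0, 10)])
print(huber_location(x), x.mean())  # the M-estimate resists the outliers
```

Because ψ is bounded, a few gross outliers move the estimate only slightly, unlike the sample mean.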

For M-estimators based on ψ ( x , β ) , where β is a vector of parameters, Huber [

$$\frac{1}{n}\sum_{i=1}^{n} \psi(x_i, \beta) = 0. \tag{12}$$

Under the following main conditions:

a) $\frac{1}{n}\sum_{i=1}^{n} \psi(x_i, \hat{\beta}) \rightarrow_p 0$, assuming $\hat{\beta} \rightarrow_p \beta_0$ has been shown,

b) $\lambda(\beta_0) = E_{\beta_0}(\psi(x, \beta_0)) = 0$, where $\lambda(\beta) = E_{\beta_0}(\psi(x, \beta))$,

with assumption N-3 given by Huber ( [

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \psi(x_i, \beta_0) = -\Lambda(\beta_0)\sqrt{n}(\hat{\beta} - \beta_0) + o_p(1),$$

where $\Lambda(\beta_0) = \left.\frac{\partial \lambda(\beta)}{\partial \beta'}\right|_{\beta=\beta_0}$ and $o_p(1)$ is a term converging to 0 in probability.

When we compare with the usual Taylor expansion, we only require $\lambda(\beta) = E_{\beta_0}(\psi(x, \beta))$ to be differentiable with respect to β. This differentiability condition is satisfied for the AL family. Note that if the score functions are indeed differentiable, then $-\Lambda(\beta_0)$ is the Fisher information matrix.

For the technical details on how to verify the conditions N-3, see Hinkley and Revankar ( [

$$\int_{-\infty}^{\infty} \psi(x, \hat{\beta})\, dF_n(x) \rightarrow_p \int_{-\infty}^{\infty} \psi(x, \beta_0)\, dF_{\beta_0}(x) = E_{\beta_0}(\psi(x, \beta_0)) = 0,$$

where $F_n(x)$ is the sample distribution function; the score functions are given by expressions (14)-(16).

From the above representation, we then have

$$\sqrt{n}(\hat{\beta} - \beta_0) \rightarrow_L N\!\left(0, [\Lambda(\beta_0)]^{-1} V_{\beta_0}(\psi(x, \beta_0)) \left([\Lambda(\beta_0)]^{-1}\right)'\right).$$

The asymptotic covariance matrix of β ^ is given by

$$V(\hat{\beta}) = \frac{1}{n} [\Lambda(\beta_0)]^{-1} V_{\beta_0}(\psi(x, \beta_0)) \left([\Lambda(\beta_0)]^{-1}\right)', \tag{13}$$

where $V_{\beta_0}(\psi(x, \beta_0))$ is the covariance matrix of the vector $\psi(x, \beta_0)$, and $\psi(x, \beta_0)$ is the vector of the true score functions, or quasi-score functions if a proxy density function is used to replace the true density function.
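The sandwich formula (13) can be illustrated in one dimension with Huber's ψ applied to a standard normal location problem (an illustration only, not the AL model of this paper): the asymptotic variance is $V(\psi)/(n\Lambda^2)$, with both pieces estimated from the data.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 5000, 1.345
x = rng.standard_normal(n)

# Solve the estimating equation (1/n) sum psi(x_i - theta) = 0.
theta_hat = np.median(x)
for _ in range(100):
    theta_hat += np.clip(x - theta_hat, -k, k).mean()

psi = np.clip(x - theta_hat, -k, k)
# -Lambda is estimated by the derivative of the mean psi in theta:
# here, the fraction of residuals falling inside [-k, k].
lam = -np.mean(np.abs(x - theta_hat) <= k)
v_psi = psi.var()
sandwich_var = v_psi / (n * lam**2)
print(sandwich_var)  # comparable to 1/n for this nearly efficient psi
```

For the normal model Huber's estimator is close to fully efficient, so the sandwich variance comes out near 1/n; in the AL model, Λ(β0) and V(ψ) are the 3 × 3 matrices derived in the next section.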

Now based on M-estimation theory, we proceed to find Λ ( β 0 ) and V β 0 ( ψ ( x , β 0 ) ) for the AL distribution to obtain the asymptotic covariance matrix of the ML estimators in the following section.

Kotz et al. [

Since

$$\ln g(x; \theta, \sigma, \kappa) = \ln\sqrt{2} - \ln\sigma - \ln\gamma + \frac{\sqrt{2}}{2}\delta\,\frac{(x-\theta)}{\sigma} - \frac{\sqrt{2}}{2}\gamma\,\frac{|x-\theta|}{\sigma},$$

the following derivatives are the score functions of the AL distribution,

$$\psi_1(x; \theta, \sigma, \kappa) = \frac{\partial \ln g}{\partial \theta} = -\frac{\sqrt{2}\,\delta}{2\sigma} - \frac{\sqrt{2}\,\gamma}{2\sigma}\, v(x; \theta), \quad v(x; \theta) = -I[x > \theta] + I[x < \theta],\ v(x; \theta) = 0 \text{ if } x = \theta, \tag{14}$$

$$\psi_2(x; \theta, \sigma, \kappa) = \frac{\partial \ln g}{\partial \sigma} = -\frac{1}{\sigma} - \frac{\sqrt{2}}{2}\delta\,\frac{(x-\theta)}{\sigma^{2}} + \frac{\sqrt{2}}{2}\gamma\,\frac{|x-\theta|}{\sigma^{2}}, \tag{15}$$

$$\psi_3(x; \theta, \sigma, \kappa) = \frac{\partial \ln g}{\partial \kappa} = -\frac{\partial\gamma/\partial\kappa}{\gamma} + \frac{\sqrt{2}}{2}\frac{\partial\delta}{\partial\kappa}\,\frac{(x-\theta)}{\sigma} - \frac{\sqrt{2}}{2}\frac{\partial\gamma}{\partial\kappa}\,\frac{|x-\theta|}{\sigma}. \tag{16}$$

Let $\beta = (\theta, \sigma, \kappa)'$ and $\beta_0$ the vector of true parameters. We first need to find the vector

$$\lambda(\beta) = (\lambda_1(\beta), \lambda_2(\beta), \lambda_3(\beta))', \quad \lambda_i(\beta) = E_{\beta_0}(\psi_i(x; \theta, \sigma, \kappa)), \quad i = 1, 2, 3.$$

Subsequently, we need to find the derivatives of these expressions with respect to β, evaluated at β = β0, to obtain the matrix $-\Lambda(\beta_0)$. The matrix $-\Lambda(\beta_0)$ generalizes the Fisher information matrix.

It reduces to this matrix if the score functions $\psi_i(x; \beta)$, $i = 1, 2, 3$, are differentiable with respect to β. It is clear that the elements of $\Lambda(\beta_0)$ have closed-form expressions, but they are lengthy to display. To obtain $E_{\beta_0}(\psi_i(x; \theta, \sigma, \kappa))$, $i = 1, 2, 3$, note that we have a location and a scale parameter. Consequently, it appears simpler to define first the standardized AL density as the AL density with θ = 0, σ = 1, i.e.,

$$g_{\varepsilon}(x; \kappa) = \frac{\sqrt{2}}{\gamma} \exp\!\left(\frac{\sqrt{2}}{2}\delta x\right) \exp\!\left(-\frac{\sqrt{2}}{2}\gamma|x|\right),$$

and the AL density with three parameters as

$$g(x; \theta, \sigma, \kappa) = \frac{1}{\sigma}\, g_{\varepsilon}\!\left(\frac{x-\theta}{\sigma}; \kappa\right).$$

Making use of g ε ( x ; κ ) ,

$$E_{\beta_0}(v(x; \theta)) = -\int_{\theta}^{\infty} \frac{1}{\sigma_0}\, g_{\varepsilon}\!\left(\frac{x-\theta_0}{\sigma_0}; \kappa_0\right) dx + \int_{-\infty}^{\theta} \frac{1}{\sigma_0}\, g_{\varepsilon}\!\left(\frac{x-\theta_0}{\sigma_0}; \kappa_0\right) dx,$$

or

$$E_{\beta_0}(v(x; \theta)) = 2\, G_{\varepsilon}\!\left(\frac{\theta-\theta_0}{\sigma_0}; \kappa_0\right) - 1, \tag{17}$$

where $G_{\varepsilon}(x; \kappa)$ is the distribution function with density function $g_{\varepsilon}(x; \kappa)$.

Similarly,

$$E_{\beta_0}(|x-\theta|) = \int_{\theta}^{\infty} (x-\theta)\, \frac{1}{\sigma_0}\, g_{\varepsilon}\!\left(\frac{x-\theta_0}{\sigma_0}; \kappa_0\right) dx + \int_{-\infty}^{\theta} (\theta-x)\, \frac{1}{\sigma_0}\, g_{\varepsilon}\!\left(\frac{x-\theta_0}{\sigma_0}; \kappa_0\right) dx. \tag{18}$$

Therefore, $\frac{\partial E_{\beta_0}(|x-\theta|)}{\partial \theta}$ can be obtained by first evaluating the term

$$\frac{\partial}{\partial\theta} \int_{\theta}^{\infty} (x-\theta)\, \frac{1}{\sigma_0}\, g_{\varepsilon}\!\left(\frac{x-\theta_0}{\sigma_0}; \kappa_0\right) dx = -\int_{\theta}^{\infty} \frac{1}{\sigma_0}\, g_{\varepsilon}\!\left(\frac{x-\theta_0}{\sigma_0}; \kappa_0\right) dx = -\left[1 - G_{\varepsilon}\!\left(\frac{\theta-\theta_0}{\sigma_0}; \kappa_0\right)\right],$$

using Leibniz's rule, which takes into account that the lower bound of the interval also depends on θ, and then evaluating, again using Leibniz's rule, the expression

$$\frac{\partial}{\partial\theta} \int_{-\infty}^{\theta} (\theta-x)\, \frac{1}{\sigma_0}\, g_{\varepsilon}\!\left(\frac{x-\theta_0}{\sigma_0}; \kappa_0\right) dx = G_{\varepsilon}\!\left(\frac{\theta-\theta_0}{\sigma_0}; \kappa_0\right).$$

Consequently,

$$\frac{\partial E_{\beta_0}(|x-\theta|)}{\partial\theta} = -1 + 2\, G_{\varepsilon}\!\left(\frac{\theta-\theta_0}{\sigma_0}; \kappa_0\right).$$

The elements of Λ ( β 0 ) can be found subsequently by first forming

$$\lambda_1(\beta; \beta_0) = \int_{-\infty}^{\infty} \psi_1(x, \beta)\, g(x; \beta_0)\, dx = -\frac{\sqrt{2}\,\delta}{2\sigma} - \frac{\sqrt{2}\,\gamma}{2\sigma}\, E_{\beta_0}(v(x; \theta)),$$

where $E_{\beta_0}(v(x; \theta))$ is as given by expression (17). Also,

$$\lambda_2(\beta; \beta_0) = \int_{-\infty}^{\infty} \psi_2(x, \beta)\, g(x; \beta_0)\, dx = -\frac{1}{\sigma} - \frac{\sqrt{2}}{2}\frac{\delta}{\sigma^{2}}\left(E_{\beta_0}(x) - \theta\right) + \frac{\sqrt{2}}{2}\frac{\gamma}{\sigma^{2}}\, E_{\beta_0}(|x-\theta|),$$

where $E_{\beta_0}(|x-\theta|)$ is as given by expression (18), and $E_{\beta_0}(x) = \theta_0 + \tau_0\mu_0$ with $\tau_0 = 1$, using expression (4). With

$$\lambda_3(\beta; \beta_0) = \int_{-\infty}^{\infty} \psi_3(x, \beta)\, g(x; \beta_0)\, dx,$$

or equivalently,

$$\lambda_3(\beta; \beta_0) = -\frac{\partial\gamma/\partial\kappa}{\gamma} + \frac{\sqrt{2}}{2}\frac{\partial\delta}{\partial\kappa}\,\frac{1}{\sigma}\left(E_{\beta_0}(x) - \theta\right) - \frac{\sqrt{2}}{2}\frac{\partial\gamma}{\partial\kappa}\,\frac{E_{\beta_0}(|x-\theta|)}{\sigma},$$

then the matrix Λ ( β 0 ) can be obtained by differentiating with respect to β the vector

$$\lambda(\beta; \beta_0) = (\lambda_1(\beta; \beta_0), \lambda_2(\beta; \beta_0), \lambda_3(\beta; \beta_0))'$$

and setting β = β0, i.e.,

$$\Lambda(\beta_0) = \left.\frac{\partial \lambda(\beta; \beta_0)}{\partial \beta'}\right|_{\beta=\beta_0}.$$

Clearly, the elements of the matrix $\Lambda(\beta_0)$ have closed-form expressions, but they are lengthy to display. Packages like MATLAB or Mathematica can handle symbolic derivatives and can be used to obtain these elements. Substituting β0 by the ML estimator $\hat{\beta}$ in $\Lambda(\beta_0)$ yields an estimate of the matrix $\Lambda(\beta_0)$.
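As an alternative to MATLAB or Mathematica, the open-source package SymPy can carry out the symbolic differentiation. A small sketch with the ingredients $\gamma = \kappa + 1/\kappa$ and $\delta = 1/\kappa - \kappa$ that enter the score functions (the lengthier entries of $\Lambda(\beta_0)$ are handled the same way):

```python
import sympy as sp

# gamma and delta as functions of kappa, as used in the AL score functions.
kappa = sp.symbols('kappa', positive=True)
gamma = kappa + 1 / kappa
delta = 1 / kappa - kappa

# Symbolic derivatives dgamma/dkappa and ddelta/dkappa.
print(sp.simplify(sp.diff(gamma, kappa)))   # 1 - 1/kappa^2
print(sp.simplify(sp.diff(delta, kappa)))   # -1/kappa^2 - 1
```

The same machinery differentiates the full vector $\lambda(\beta; \beta_0)$ entry by entry to produce $\Lambda(\beta_0)$ symbolically.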

Now we turn our attention to the matrix Σ, which is the covariance matrix of the vector of score functions $\psi(x, \beta_0) = (\psi_1(x, \beta_0), \psi_2(x, \beta_0), \psi_3(x, \beta_0))'$. Using a different but equivalent parameterisation, this matrix has been obtained by Kotz et al. ( [

Note that the inverse of Σ is not the asymptotic covariance matrix of the ML estimators; this is due to the fact that $\Lambda(\beta_0) = \left.\frac{\partial \lambda(\beta; \beta_0)}{\partial \beta'}\right|_{\beta=\beta_0}$ is not equal to $-\Sigma$ when the differentiability assumptions for the score functions do not hold, see Corollary 3.2 and Proposition 3.3 given by Huber ( [

The matrix Σ can also be estimated by the following estimator

$$\frac{1}{n}\sum_{i=1}^{n} \left[\psi(x_i, \hat{\beta})\right]\left[\psi(x_i, \hat{\beta})\right]'.$$

Let us consider the following location model with known σ 0 and check the expression (2.2) as given by Kotz et al. ( [

$$f(x; \theta) = \frac{1}{\sigma_0\sqrt{2}}\, e^{-\frac{\sqrt{2}}{\sigma_0}|x-\theta|},$$

or alternatively the density can also be expressed as

$$f(x; \theta) = f_0(x-\theta), \quad f_0(x) = \frac{1}{\sigma_0\sqrt{2}}\, e^{-\frac{\sqrt{2}|x|}{\sigma_0}}.$$

This subfamily corresponds to their parameterisation with κ = 1 in their paper. The sample median $\hat{\theta}$ is the ML estimator for θ; using their result, one would conclude that the asymptotic variance is given by $\left(E\left(\frac{\partial \ln f}{\partial\theta}\right)^{2}\right)^{-1} = \frac{\sigma_0^{2}}{2}$,

as indicated by case 1 in the table of their paper. On the other hand, it is known that the asymptotic distribution of the sample median is given by

$$\sqrt{n}(\hat{\theta} - \theta_0) \rightarrow_L N\!\left(0, \frac{1}{4(f_0(0))^{2}}\right),$$

see expression (2.4.19) given by Lehmann

( [

$$\frac{1}{4(f_0(0))^{2}} = \frac{1}{8}\sigma_0^{2}.$$

Clearly, $\frac{1}{8}\sigma_0^{2} \neq \frac{\sigma_0^{2}}{2}$, but the correct asymptotic variance can be obtained using expression (13).

For the general GAL distribution with four parameters, alternative methods of estimation based on quadratic distances (QD), which make use of the empirical cumulant generating function, will be introduced in the next section. The QD methods are developed based on empirical findings which show that, for finite sample sizes as large as n = 5000, the ML methods do not give good estimates for the shape parameter τ and the scale parameter σ, although they give good estimates for the other two parameters. However, the overall efficiency of ML methods lags behind QD methods in finite samples. Also, besides giving better estimates for σ and τ, the QD methods can be used for parameter testing, since the asymptotic covariance matrix for the QD estimators can be obtained explicitly for the entire GAL family. The methods also provide a chi-square test statistic of goodness-of-fit for the model being used. Therefore, it might be of interest to consider using QD methods whenever ML methods might have deficiencies.

General Quadratic distance (QD) theory has been developed in Luong and Thompson [

For financial data, observations are recorded as percentages, so they are small in magnitude. We recommend minimizing the following distance based on matching the empirical cumulant generating function $K_n(t)$ with its model counterpart $K_\beta(t)$, using the points $t_j$, $j = 1, \cdots, m = 20$, with

$$t_1 = 0.01,\ t_2 = 0.02,\ \cdots,\ t_{10} = 0.1, \quad t_{11} = -0.01,\ t_{12} = -0.02,\ \cdots,\ t_{20} = -0.1. \tag{19}$$

The choice of points as given above is suggested based on empirical findings that overall, the QD estimators are more efficient than the ML estimators for the range of parameters often encountered for modelling financial data using finite sample sizes as large as n = 5000. Note that the set of points chosen does not include the origin 0.

The empirical moment generating function and the empirical cumulant generating function are given respectively by

$$M_n(s) = \frac{1}{n}\sum_{i=1}^{n} e^{s X_i} \quad \text{and} \quad K_n(s) = \log M_n(s).$$

The model cumulant generating function is $K_\beta(t) = \log M_\beta(t)$, with $M_\beta(s)$ being the model moment generating function as defined by expression (2). The proposed QD estimators, given by the vector $\tilde{\beta}$, are obtained by minimizing with respect to β the following specific QD distance:

$$D(\beta) = \sum_{j=1}^{20} \left(K_n(t_j) - K_\beta(t_j)\right)^{2}. \tag{20}$$

Once the estimates are obtained, a goodness-of-fit test statistic with an asymptotic chi-square distribution with r = 16 degrees of freedom can also be constructed. General QD distance theory can be used to derive the asymptotic covariance matrix of the QD estimators and the chi-square goodness-of-fit test statistic; they will be given at the end of this section. Having the asymptotic covariance matrix of the QD estimators in closed form for the GAL family is useful for parameter testing.
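The QD fit of expression (20) is straightforward to sketch: compute the empirical cgf at the 20 points of (19), then minimize the sum of squared differences with a derivative-free search. The starting vector below is an arbitrary rough guess, and the guard reproduces the domain restriction (3):

```python
import numpy as np
from scipy.optimize import minimize

# The 20 points of expression (19): +/- 0.01, ..., +/- 0.1.
t = np.concatenate([np.arange(1, 11), -np.arange(1, 11)]) * 0.01

def model_cgf(t, theta, mu, sigma, tau):
    """GAL model cgf K(t) = theta*t - tau*log(1 - mu*t - sigma^2 t^2 / 2)."""
    return theta * t - tau * np.log(1.0 - mu * t - 0.5 * sigma**2 * t**2)

rng = np.random.default_rng(6)
y = rng.gamma(2.0, 1.0, 5000)
x = 0.05 * y + 0.1 * np.sqrt(y) * rng.standard_normal(5000)

kn = np.log(np.mean(np.exp(np.outer(t, x)), axis=1))  # empirical cgf at the t_j

def qd_distance(beta):
    theta, mu, sigma, tau = beta
    if (sigma <= 0 or tau <= 0
            or np.any(1.0 - mu * t - 0.5 * sigma**2 * t**2 <= 0)):
        return np.inf   # outside the domain restriction (3)
    return np.sum((kn - model_cgf(t, theta, mu, sigma, tau)) ** 2)

fit = minimize(qd_distance, x0=[0.0, 0.04, 0.08, 1.5], method="Nelder-Mead",
               options={"maxiter": 10000})
print(fit.x)  # QD estimates of (theta, mu, sigma, tau)
```

Precomputing the empirical cgf once keeps each evaluation of D(β) cheap, which matters for a simplex search.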

For notations, let us define the vector based on observations

z n = ( K n ( t 1 ) , ⋯ , K n ( t m ) ) ′ , m = 20 .

Its model counterpart is the vector

z β = ( K β ( t 1 ) , ⋯ , K β ( t m ) ) ′ .

Therefore,

D ( β ) = ( z n − z β ) ′ ( z n − z β ) .

Observe that the elements of the covariance matrix $V_M$ for the vector $\sqrt{n}\,(M_n(t_1), \cdots, M_n(t_m))'$ are given by

$$V_M(i,j) = M_\beta(t_i + t_j) - M_\beta(t_i) M_\beta(t_j), \quad i, j = 1, \cdots, 20.$$

The elements of the approximate covariance matrix, based on the differential (delta) method, for $\sqrt{n}\,(K_n(t_1), \cdots, K_n(t_m))'$ are given by

$$V_K(i,j) = \frac{M_\beta(t_i + t_j) - M_\beta(t_i) M_\beta(t_j)}{M_\beta(t_i) M_\beta(t_j)}, \quad i, j = 1, \cdots, 20.$$

Under the regularity conditions given by Lemma (3.4.1) of Luong and Thompson ( [

$$\sqrt{n}(\tilde{\beta} - \beta_0) \rightarrow_L N(0, V), \quad V = (S'S)^{-1} S' V_K S (S'S)^{-1}.$$

The asymptotic covariance for the QD estimators is simply 1 n V .

All the expressions which form V as given above are evaluated under the true vector of parameters β 0 , β = ( θ , μ , σ , τ ) ′ = ( β 1 , β 2 , β 3 , β 4 ) ′ and

$$S = \begin{bmatrix} \frac{\partial K_\beta(t_1)}{\partial \beta_1} & \cdots & \frac{\partial K_\beta(t_1)}{\partial \beta_4} \\ \vdots & \ddots & \vdots \\ \frac{\partial K_\beta(t_m)}{\partial \beta_1} & \cdots & \frac{\partial K_\beta(t_m)}{\partial \beta_4} \end{bmatrix},$$

and $S'$ is the transpose of $S$.

We also use $S = S(\beta_0)$, $V_K = V_K(\beta_0)$, $\Sigma_2 = \Sigma_2(\beta_0)$ to emphasize that these matrices depend on β0. The matrix Σ2 is derived below. For constructing test statistics with a chi-square limiting distribution, use expression (3.4.2) given by Luong and Thompson ( [

$$\sqrt{n}(z_n - z_{\tilde{\beta}}) \rightarrow_L N(0, \Sigma_2),$$

with Σ2 a covariance matrix which depends on β0, and

$$\Sigma_2 = \left[I - S(S'S)^{-1}S'\right] V_K \left[I - S(S'S)^{-1}S'\right]. \tag{21}$$

In practice, β0 needs to be replaced by $\tilde{\beta}$, so that an estimate of Σ2 can be defined as

Σ ˜ 2 = Σ 2 ( β ˜ ) .

We need to find the Moore-Penrose (MP) generalized inverse of $\tilde{\Sigma}_2$ to construct a chi-square statistic. The quadratic form constructed with the MP inverse follows a chi-square distribution asymptotically. Many computer packages provide prewritten functions to find the Moore-Penrose inverse of a matrix. It can also be computed easily using the spectral decomposition of $\tilde{\Sigma}_2$, i.e., using the representation $\tilde{\Sigma}_2 = P D P'$. The columns of the matrix P are the eigenvectors of $\tilde{\Sigma}_2$, and D is a diagonal matrix whose diagonal elements are the corresponding eigenvalues of $\tilde{\Sigma}_2$, given respectively by $\lambda_i \geq 0$, $i = 1, \cdots, m$. The matrix P is orthonormal, with the property $P P' = I$.

The Moore-Penrose inverse $\tilde{\Sigma}_2^{MP}$ can be obtained as $\tilde{\Sigma}_2^{MP} = P D^{-} P'$, with $D^{-}$ the diagonal matrix constructed from the diagonal elements $\lambda_i$, $i = 1, \cdots, 20$ of D. The diagonal elements of $D^{-}$ are given by

$$\lambda_i^{-} = \frac{1}{\lambda_i} \text{ if } \lambda_i > 0 \quad \text{and} \quad \lambda_i^{-} = 0 \text{ if } \lambda_i = 0.$$
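The spectral-decomposition recipe above can be coded directly, inverting only the eigenvalues above a small tolerance (a practical necessity in floating point; the test matrix is an arbitrary rank-deficient example):

```python
import numpy as np

def mp_inverse(sigma, tol=1e-10):
    """Moore-Penrose inverse of a symmetric PSD matrix via Sigma = P D P',
    with 1/lambda_i for eigenvalues above tol and 0 otherwise."""
    eigvals, p = np.linalg.eigh(sigma)
    d_inv = np.where(eigvals > tol, 1.0 / np.maximum(eigvals, tol), 0.0)
    return p @ np.diag(d_inv) @ p.T

a = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])   # PSD with rank 2
print(np.allclose(mp_inverse(a), np.linalg.pinv(a)))  # True
```

Packages such as NumPy provide `numpy.linalg.pinv` as the prewritten alternative mentioned above; the eigendecomposition route makes the construction of $D^{-}$ explicit.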

For discussion of the properties of the Moore-Penrose generalized inverse, see Theil ( [

$$Q(\beta) = n(z_n - z_\beta)'(P D^{-} P')(z_n - z_\beta), \tag{22}$$

$$Q(\tilde{\beta}) = n(z_n - z_{\tilde{\beta}})'(P D^{-} P')(z_n - z_{\tilde{\beta}}) \rightarrow_L \chi^{2}(16). \tag{23}$$

The limiting distribution of the test statistic is chi-square with r = 16 degrees of freedom, based on Theorem 3.4.1 of Luong and Thompson ( [

The simple approximate moment estimates proposed by Seneta [

are based on the empirical moments $\hat{\mu}_1 = \bar{X}$ and $\hat{\mu}_j = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^{j}$, $j = 2, 3, 4$. Equating these with the model counterparts and neglecting all the terms in $\theta'^{j}$, $j = 2, 3, 4$, yields the following system of estimating equations for moment estimation:

$$\hat{\mu}_1 = c + \theta', \quad \hat{\mu}_2 = \sigma'^{2}, \quad \hat{\mu}_3 = 3\sigma'^{2}\theta'\nu, \quad \hat{\mu}_4 = 3\sigma'^{4}\nu + 3\sigma'^{4}.$$

The moment estimators are

$$(\bar{\sigma}')^{2} = \hat{\mu}_2, \quad \bar{\nu} = \frac{\hat{\mu}_4/3 - (\bar{\sigma}')^{4}}{(\bar{\sigma}')^{4}}, \quad \bar{\theta}' = \frac{\hat{\mu}_3}{3\bar{\nu}(\bar{\sigma}')^{2}}, \quad \bar{c} = \hat{\mu}_1 - \bar{\theta}'.$$

When converted to the

parameterization given by Kotz et al. [

for $\tau, \sigma^{2}, \mu, \theta$, the estimates are given respectively as $\bar{\tau} = 1/\bar{\nu}$, $\bar{\sigma}^{2} = \bar{\sigma}'^{2}\bar{\nu}$, $\bar{\mu} = \bar{\theta}'\bar{\nu}$, $\bar{\theta} = \bar{c}$. The

approximate moment estimators are not efficient, but they are simple and given explicitly. Therefore, they can be used as starting points for the numerical algorithms that implement QD or ML estimation. Moment estimators can also be checked to see whether they are appropriate as starting points. This will be discussed in the next section.
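The approximate moment estimators are a few lines of code. The sketch below implements the estimating equations above and converts back to the (θ, μ, σ², τ) parameterisation; since the θ′-terms are neglected, the estimates are only rough, which is consistent with their intended use as starting points:

```python
import numpy as np

def approx_moment_estimates(x):
    """Approximate moment estimates: solve the truncated moment equations
    for (sigma'^2, nu, theta', c), then convert to (theta, mu, sigma2, tau)."""
    m1 = x.mean()
    d = x - m1
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    sigp2 = m2                                   # sigma'^2
    nu = (m4 / 3.0 - sigp2**2) / sigp2**2        # nu
    thetap = m3 / (3.0 * nu * sigp2)             # theta'
    c = m1 - thetap
    return {"theta": c, "mu": thetap * nu,
            "sigma2": sigp2 * nu, "tau": 1.0 / nu}

# Illustration on a simulated GAL sample (true values: theta = 0, mu = 0.05,
# sigma = 0.1, tau = 2, chosen arbitrarily).
rng = np.random.default_rng(7)
y = rng.gamma(2.0, 1.0, 200_000)
x = 0.05 * y + 0.1 * np.sqrt(y) * rng.standard_normal(200_000)
est = approx_moment_estimates(x)
print(est)
```

The estimates land in the right neighbourhood of the true values, which is all that is required of a starting vector.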

Most of the algorithms will return a local minimizer, whereas the vector which gives the estimators is defined to be the global minimizer. Due to this limitation, some care is needed to ensure that we can identify the global minimizer. In practice, it is important to test the algorithm with various starting vectors, see Andrews [

β ( 0 ) = ( θ ( 0 ) , μ ( 0 ) , σ ( 0 ) , τ ( 0 ) ) ′ close to the vector of the estimators given by β ˜

which globally minimizes the objective function. We might look for a different starting vector if the vector of moment estimators cannot be used as a starting vector to initialize the numerical algorithm.

The criterion function Q(β) given by expression (22), which is used to construct the goodness-of-fit test, can also be used to select a good starting vector. To qualify as a suitable starting vector, $\beta^{(0)}$ is subjected to a screening test by checking whether

$$Q(\beta^{(0)}) \leq \chi^{2}_{0.95}(16),$$

where $\chi^{2}_{0.95}(16)$ is the 95th percentile of the chi-square distribution with 16 degrees of freedom; see expression (3.5) given by Andrews ( [

For financial data, observations are recorded as percentages, so they are small in magnitude. We are in the situation of modelling with values of θ and μ near 0. The plausible ranges for τ and σ are 0 < τ ≤ 10 and 0 < σ ≤ 0.1. For parameters in these ranges, we observe that the ML estimators for τ and σ do not perform well for sample sizes as large as n = 5000. For comparisons between QD and ML methods, the ratio of total mean square errors is used as a measure of overall relative efficiency. Due to limited computing capacity, as we only have access to a laptop computer, we can only use M = 100 samples, each of size n = 5000.

The overall relative efficiency for comparisons is defined as the ratio

$$\frac{\mathrm{TMSE}(QD)}{\mathrm{TMSE}(ML)} = \frac{\mathrm{MSE}(\tilde{\theta}) + \mathrm{MSE}(\tilde{\mu}) + \mathrm{MSE}(\tilde{\sigma}) + \mathrm{MSE}(\tilde{\tau})}{\mathrm{MSE}(\hat{\theta}) + \mathrm{MSE}(\hat{\mu}) + \mathrm{MSE}(\hat{\sigma}) + \mathrm{MSE}(\hat{\tau})}.$$

The expressions for MSE and TMSE which appear in

The study seems to indicate that, overall, ML methods are less efficient than QD methods, but ML methods are more efficient for estimating the first two