
Maximum entropy empirical likelihood (MEEL) methods, also known as exponentially tilted empirical likelihood methods, using constraints from the model Laplace transform (LT), are introduced in this paper. An estimate of the overall loss of efficiency, based on the Fourier cosine series expansion of the density function, is proposed to quantify the loss of efficiency when using MEEL methods. Penalty function methods are suggested for the numerical implementation of the MEEL methods. The methods can easily be adapted to estimate continuous distributions with support on the real line encountered in finance, by using constraints based on the model moment generating function instead of the LT.

Nonnegative continuous parametric families of distributions are useful for modeling loss data or lifetime data in actuarial science. Many of these families do not have closed form densities. The densities can only be expressed by means of infinite series representations, but their corresponding Laplace transforms (LT) have closed form expressions and are relatively simple to handle. An illustration is given below.

Example

Hougaard [

φ_β(s) = E(e^{−sX}) = ∫_0^∞ e^{−sx} f_β(x) dx = e^{−(δ/α)s^α}, 0 < α < 1, δ > 0, β = (δ, α)′.

The density function f β ( x ) has no closed form but can be represented as an infinite series.

Now if we create a new distribution using the Esscher transform technique, the corresponding new density can be expressed using f β ( x ) and is given by

e − θ x f β ( x ) φ β ( θ ) ,

and its LT is

L ( s ) = φ β ( s + θ ) φ β ( θ ) .

This operation adds an extra tilting parameter θ to the parameter vector β of the original distribution, and a new distribution is created. This new distribution is the positive tempered stable (PTS) distribution with Laplace transform given by

L(s) = E(e^{−sX}) = exp(−(δ/α)[(θ + s)^α − θ^α]), s > 0, θ > 0, 0 < α < 1, δ > 0. (1)

The first four cumulants are given by Hougaard [

c 1 = δ θ α − 1 , c 2 = δ ( 1 − α ) θ α − 2 , c 3 = δ ( 1 − α ) ( 2 − α ) θ α − 3 , c 4 = δ ( 1 − α ) ( 2 − α ) ( 3 − α ) θ α − 4 .

For the limiting case α → 0+ we have the gamma distribution. In general, the density function has no closed form except for α = 1/2. For α = 1/2 we obtain the inverse Gaussian (IG) distribution with density function given by Hougaard ( [

f(x; δ, θ) = (δ/√π) x^{−3/2} exp(2δθ^{1/2}) exp(−θx − δ²/x), x > 0.
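As a sanity check, the α = 1/2 case can be verified numerically: the inverse Gaussian density should have exactly the Laplace transform of Expression (1) with α = 1/2. The sketch below uses SciPy; the values δ = 1, θ = 2 and the evaluation point s = 0.7 are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import quad

def pts_lt(s, delta, alpha, theta):
    # Positive tempered stable Laplace transform, Expression (1)
    return np.exp(-(delta / alpha) * ((theta + s) ** alpha - theta ** alpha))

def ig_density(x, delta, theta):
    # Inverse Gaussian density (the alpha = 1/2 case)
    return (delta / np.sqrt(np.pi) * x ** -1.5
            * np.exp(2.0 * delta * np.sqrt(theta) - theta * x - delta ** 2 / x))

delta, theta, s = 1.0, 2.0, 0.7   # arbitrary illustrative values
lt_numeric, _ = quad(lambda x: np.exp(-s * x) * ig_density(x, delta, theta),
                     0.0, np.inf)
print(lt_numeric, pts_lt(s, delta, 0.5, theta))  # the two values agree
```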

For other parameterisation for the IG distribution, see Panjer and Wilmott ( [

Hougaard [

Many new infinitely divisible (ID) distributions can be created using operations on the LT of existing distributions. One of them is the power mixture (PM) operator, see Abate and Whitt ( [

η(s) = ∫_0^∞ (κ(s))^t dH(t) = κ_H(−log(κ(s))), (2)

where κ_H denotes the LT of the mixing distribution H.

The new distribution is created using the power mixture (PM) operator. The PM operator was introduced by Abate and Whitt ( [

The new distribution obtained will have more parameters than the distribution of X . For other methods, such as compounding methods for creating new distributions, see Klugman et al. ( [
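The power mixture of Expression (2) — the integral ∫ (κ(s))^t dH(t) equals the LT of H evaluated at −log κ(s) — can be checked numerically for simple choices. In the sketch below, κ(s) is the LT of an Exp(1) law and H is a gamma(a, rate 1) mixing distribution with LT (1 + u)^{−a}; both choices are illustrative assumptions, not from the text.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

a, s = 2.5, 0.8                      # arbitrary illustrative values
kappa = lambda s: 1.0 / (1.0 + s)    # LT of an Exp(1) distribution
lt_H = lambda u: (1.0 + u) ** -a     # LT of a gamma(a, rate 1) mixing law

# Direct evaluation of the power mixture integral in Expression (2)
direct, _ = quad(lambda t: kappa(s) ** t * t ** (a - 1) * np.exp(-t) / gamma_fn(a),
                 0.0, np.inf)
# Closed form: the LT of H evaluated at -log(kappa(s))
closed = lt_H(-np.log(kappa(s)))
print(direct, closed)  # equal up to quadrature error
```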

Definition (Lévy-Khintchine Representation). A characteristic function (CF) ω(v) of a random variable X is infinitely divisible if and only if it can be represented as

ω ( v ) = exp ( i b v + ∫ − ∞ ∞ ( e i x v − 1 − i v x 1 + x 2 ) ( 1 + x 2 x 2 ) d G ( x ) ) ,

where G(x) is a bounded and non-decreasing function with G(−∞) = 0, see Rao [

For statistical inference, we assume that we have a random sample of n observations, X_1, ⋯, X_n. These observations are independent and identically distributed as X, which follows a model distribution with closed form LT L_β(s), where β = (β_1, ⋯, β_p)′ is the vector of parameters. The true parameter vector is denoted by β_0. The density function f_β(x) has no closed form, which makes likelihood estimation difficult to implement. Consequently, we would like to estimate β based on L_β(s). Quasi-likelihood (QL) estimation, among other methods which do not rely on the true density, can be considered. A brief review of QL estimation is given below.

Godambe and Thompson [

∂ log f β ( x ) ∂ β

onto the linear space spanned by the basis

B Q = { g 1 ( x ) = h 1 ( x ) − E ( x ) , g 2 ( x ) = h 2 ( x ) − E ( x 2 ) } , h 1 ( x ) = x , h 2 ( x ) = x 2 .

Note that E ( x ) , E ( x 2 ) can be obtained based on the model Laplace transform denoted by L β ( s ) . See Chan and Ghosh [

The EQEF estimators are simple to obtain since the basis B_Q has only two elements. The fact that they are based on best approximations of the score functions allows them to outperform moment estimators in many circumstances.

For example, we can consider a parametric model with 3 parameters, which leads to solving the moment equations, i.e.,

1 n ∑ i = 1 n ( x i − E ( x ) ) = 0 , 1 n ∑ i = 1 n ( x i 2 − E ( x 2 ) ) = 0 , 1 n ∑ i = 1 n ( x i 3 − E ( x 3 ) ) = 0.

It is easy to see that the quasi-score functions of moment methods belong to the linear space spanned by the basis

B m o m = { x − E ( x ) , x 2 − E ( x 2 ) , x 3 − E ( x 3 ) } .

Even though B_mom includes all the elements of B_Q, the EQEF methods can outperform the method of moments, because the quasi-score functions of the method of moments are not the best approximations based on B_mom.

Therefore, in this paper we shall emphasize quasi-score functions which make use of best linear combinations of elements of a basis, and propose some bases that can provide better efficiencies than the basis formed by linear and quadratic polynomials. The bases should only make use of the model LT. Note that moment estimators based on selected points of the model LT have been discussed in the literature, see Read ( [

We shall proceed in a unified way, so that both QL methods and MEEL methods are related to the notion of projection of the true score functions onto a linear space spanned by a finite basis. Within this framework, MEEL estimators are shown to be asymptotically first-order equivalent to QL estimators using the same basis. For the first and higher order properties of empirical likelihood estimators, see Newey and Smith [

MEEL methods make use of information from the parametric model via constraints, and there is a one-to-one correspondence between the constraints, the moment conditions and the elements of the basis. Although the general theory of MEEL and QL methods is well established, the question of which bases to choose in order to achieve good efficiency appears to be a relevant one for applications. There is also a need to quantify the loss of efficiency, and consequently in this paper we propose a measure of the loss of efficiency to evaluate whether MEEL methods are appropriate for analyzing a data set from a specific field of applications.

We hope that the answers will give ideas on how to choose moment conditions or constraints for MEEL estimators. They will also give ideas on how to construct semi-parametric bounds as defined by Chamberlain ( [

Let l_i = l_i(β) = ∂/∂β_i log f(x; β), i = 1, ⋯, p be the score functions, and note that E(l_i) = 0 in general. If we try to approximate l_i(β) using a quasi-score function formed by linear combinations of the functions h_1(x), ⋯, h_k(x), this leads to considering quasi-score functions of the form

l_i^a = a_{i0} + ∑_{j=1}^k a_{ij} h_j(x), i = 1, ⋯, p.

We shall also impose the condition of unbiasedness of the estimating functions by requiring E(l_i^a) = 0, i = 1, ⋯, p. With these restrictions, it is equivalent to consider

l i a = ∑ j = 1 k a i j ( h j ( x ) − E β ( h j ( x ) ) ) , i = 1 , ⋯ , p .

Using vector notations,

l_i^a = a_i′ g,

g ( x ) = ( h 1 ( x ) − E β ( h 1 ( x ) ) , ⋯ , h k ( x ) − E β ( h k ( x ) ) ) ′

and define

h ( x ) = ( h 1 ( x ) , ⋯ , h k ( x ) ) ′ .

For the best approximation, obtained by projecting l_i(β) onto the linear space spanned by the basis B = {h_1(x) − E_β(h_1(x)), ⋯, h_k(x) − E_β(h_k(x))}, we look for the vector of coefficients a_i* which minimizes

E(l_i − a_i′ g)²,

where E(⋅) = E_β(⋅), the expectation being taken under f_β(x).

Using results of the proof of Theorem 4.1 given by Luong and Doray ( [

a i * ′ = E β ( ∂ g ′ ∂ β i ) Σ − 1 , i = 1 , ⋯ , p (3)

where Σ is the covariance matrix of h(x) under f_β(x). We also use the notation Σ = Σ(β) if the dependence on β needs to be emphasized. It is easy to see that the best approximation is given by

l_i* = l_i*(β) = a_i*′(β) g(β), i = 1, ⋯, p.

Note that the elements of a_i* need to be spelled out explicitly, which means that the covariance matrix Σ needs to be known or estimated in order to apply quasi-likelihood estimation. MEEL estimation does not require this and yet produces asymptotically equivalent estimators. This is one of the main advantages of MEEL estimation over QL estimation.
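To make Expression (3) concrete, consider an exponential model with rate β and the quadratic basis h_1(x) = x, h_2(x) = x² (an illustrative choice, not one of the models of this paper). Here the true score 1/β − x already lies in the linear span of B, so the projection recovers it exactly, up to sign:

```python
import numpy as np

beta = 2.0
# Raw moments of Exponential(beta): E x^r = r!/beta^r
m1, m2, m3, m4 = 1/beta, 2/beta**2, 6/beta**3, 24/beta**4

# Covariance matrix Sigma of h(x) = (x, x^2)'
Sigma = np.array([[m2 - m1**2, m3 - m1*m2],
                  [m3 - m1*m2, m4 - m2**2]])

# E_beta(dg'/dbeta) = -dE_beta(h')/dbeta, since g = h - E_beta(h)
dEh = np.array([-1/beta**2, -4/beta**3])   # derivatives of 1/beta and 2/beta^2
a_star = np.linalg.solve(Sigma, -dEh)      # Expression (3)

print(a_star)
# a_star = (1, 0): the quasi-score a*'g = x - 1/beta is proportional to the
# true score 1/beta - x, so nothing is lost by projecting onto this basis.
```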

Quadratic distance (QD) estimation as given by Luong and Thompson [

Let u n be a vector defined based on observations,

u ′ n = ( ∫ 0 ∞ h 1 d F n , ⋯ , ∫ 0 ∞ h k d F n ) .

Its model counterpart is given by

u ′ β = ( ∫ 0 ∞ h 1 d F β , ⋯ , ∫ 0 ∞ h k d F β )

where F n is the sample distribution function and its model counterpart is denoted by F β .

The QD estimators β̂_Q are obtained by minimizing the quadratic form defined as

Q ( β ) = ( u n − u β ) ′ Σ − 1 ( u n − u β ) .

The equivalent quadratic form is

Q ( β ) = ( u n − u β ) ′ Σ ^ − 1 ( u n − u β ) , (4)

where Σ̂ is a consistent estimate of Σ(β) under the true vector of parameters β_0. One can see that this procedure is equivalent to using quasi-score functions obtained by projecting the true score functions onto the linear space spanned by B, since minimizing Expression (4) leads to solving for β the system of equations

∑_{i=1}^n E_β(∂g′/∂β_i) Σ̂^{−1} g(x_i, β) = 0, i = 1, ⋯, p,

E β ( ∂ g ′ ∂ β i ) = − ∂ E β ( h ′ ) ∂ β i , i = 1 , ⋯ , p .

Observe that the vector â_i*′ = E_β(∂g′/∂β_i) Σ̂^{−1} is asymptotically equivalent to a_i*′.

From results in Luong and Thompson ( [

√n (β̂_Q − β_0) →_L N(0, V).

V = ( S ′ Σ − 1 S ) − 1 (5)

The matrix S ′ can be expressed as

S ′ = [ ∂ E β ( h 1 ) ∂ β 1 … ∂ E β ( h k ) ∂ β 1 ⋮ ⋱ ⋮ ∂ E β ( h 1 ) ∂ β p … ∂ E β ( h k ) ∂ β p ] .

The elements of the matrix V are evaluated at β = β_0, and S′ is the transpose of S.

Following Morton ( [

I B ( β ) = S ′ ( β ) Σ − 1 ( β ) S ( β ) (6)

can be defined as the information matrix of the vector of optimum quasi-score functions and it is related to the semiparametric bounds using the moment conditions as given by Chamberlain ( [

Although QL and MEEL methods generate asymptotically equivalent estimators, there are reasons to prefer MEEL methods over quasi-likelihood methods.

With MEEL methods, we have the following main advantages:

1) With QL methods, the matrix Σ, which in general depends on β, needs to be specified explicitly, which might restrict the elements that can be included in the basis: we can only include elements whose covariances have a relatively simple form, otherwise Σ will be complicated.

2) If Σ is replaced by a consistent estimate Σ̂ under β_0, the estimate is often not accurate enough, especially when the sample size n is not large. Σ̂ then tends to be nearly singular even with only a few elements in the basis, which creates numerical instability when applying QD methods or quasi-likelihood methods.

3) Goodness of fit test statistics with limiting chi-square distributions for testing the model can be constructed in a unified way with MEEL methods. This feature is not shared by QL methods.

Within the class of empirical likelihood methods, the MEEL methods are numerically more stable than the original empirical likelihood methods (EL) which were first introduced by Owen [

The paper is organized as follows. The choice of bases for generating constraints for the MEEL methods is examined in Section 2, where two families of bases using the LT are presented. These two families of bases appear to be useful for actuarial applications. In Section 3, we review the asymptotic properties of MEEL methods. An estimate of overall relative efficiency using the Fourier cosine series expansion is proposed to quantify the loss of overall efficiency when MEEL methods are used. In Section 4 we examine numerical issues, and penalty function methods are advocated to locate the global minimizer which gives the MEEL estimators. Simulations are discussed in Section 5. The simulation study for the positive tempered stable distribution shows that the MEEL estimators are much more efficient than the moment estimators originally proposed in the seminal paper of Hougaard [

Using the results of Section 1.2, we consider a basis B which can be used for a nonnegative continuous distribution, or a nonnegative distribution with a discontinuity point at the origin with mass assigned to it. The basis B will have the form

B = { x − E ( x ) , x 2 − E ( x 2 ) , e − τ x − L β ( τ ) , ⋯ , e − m τ x − L β ( m τ ) } . (7)

We observe that the number of elements in the basis is k = m + 2 and that the elements can be obtained using the LT of the model; the basis is therefore suitable for estimation of a parametric continuous distribution whose density has no closed form expression.

The number of elements in basis B is finite. It is formed based on the completeness property of the following basis with an infinite number of elements,

{ e^{−τx} − L_β(τ), e^{−2τx} − L_β(2τ), ⋯ }. (8)

This infinite basis can be traced back to the work of Zakian and Littlewood [

The following example will make the notion of restricted parameter spaces clear. Suppose we have a model with two parameters θ_1 ≥ 0, θ_2 ≥ 0. On a restricted parameter space, the parameters are subject to stricter inequality bounds, for example 0 < a ≤ θ_1 ≤ b and 0 < c ≤ θ_2 ≤ d, where a, b, c, d are finite positive real numbers.

Therefore, in practice we might want to fix m = 10 and τ = 0.01 , i.e., let

B = { x − E ( x ) , x 2 − E ( x 2 ) , e − 0.01 x − L β ( 0.01 ) , ⋯ , e − 0.1 x − L β ( 0.1 ) } . (9)

The basis B as indicated above often gives a good balance between numerical simplicity and efficiency of the estimators.
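A sketch of how the moment conditions generated by a basis of the form (9) can be coded, using an Exponential(β) model with closed form LT β/(β + s) as an illustrative stand-in for models such as the PTS whose densities lack closed forms:

```python
import numpy as np

def make_g(beta, tau=0.01, m=10):
    # Moment conditions from a basis of the form (9) for an Exponential(beta)
    # model: L_beta(s) = beta/(beta + s), E x = 1/beta, E x^2 = 2/beta^2.
    L = lambda s: beta / (beta + s)
    conds = [lambda x: x - 1.0 / beta,
             lambda x: x**2 - 2.0 / beta**2]
    for j in range(1, m + 1):
        conds.append(lambda x, j=j: np.exp(-j * tau * x) - L(j * tau))
    return conds   # k = m + 2 functions g_j(x; beta)

rng = np.random.default_rng(0)
beta0 = 2.0
x = rng.exponential(1.0 / beta0, size=200_000)
means = [float(np.mean(g(x))) for g in make_g(beta0)]
print(means[:4])   # each sample mean is ~ 0 under the true beta
```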

If the model density has no discontinuity at all, then the following basis with negative power moment elements can be considered, and we shall see that negative power moments can be recovered using the LT. Using the result given in Lemma 1 of Brockwell and Brown ( [

{ x − τ − E ( x − τ ) , x − 2 τ − E ( x − 2 τ ) , ⋯ } .

Therefore, the following finite basis

C = { x − E ( x ) , x 2 − E ( x 2 ) , x − τ − E ( x − τ ) , x − 2 τ − E ( x − 2 τ ) , ⋯ , x − m τ − E ( x − m τ ) } (10)

can also be considered.

The elements of a basis should respect the regularity conditions of the Assumption in Section 3.2 for the estimators to be consistent and asymptotically normal. The following example illustrates this point. In practice, if for example E(x^{−1}) exists but lower negative power moments do not, we might want to choose C to be

C = { x − E ( x ) , x 2 − E ( x 2 ) , x − τ − E ( x − τ ) , x − 2 τ − E ( x − 2 τ ) , ⋯ , x − m τ + h − E ( x − m τ + h ) } , (11)

m = 5 , τ = 0.1.

The last element is special as it involves h which can be set equal to some small positive value, for example let h = 0.01 for the regularity condition 3) of Assumption 1 to be met. Obviously, if E ( x − 2 ) exists then we can let h = 0 .

Now we shall state a proposition which relates negative power moments of a distribution to its LT. The results given by the following proposition are more general than results given by Cressie et al. [

Proposition

Suppose that X is a nonnegative continuous random variable with density function f(x) and Laplace transform L(s). Then, if E(x^{−u}) exists, it is given by E(x^{−u}) = (1/Γ(u)) ∫_0^∞ s^{u−1} L(s) ds, u > 0, where Γ(u) is the commonly used gamma function, assuming the integral exists.

Proof.

Observe that ∫_0^∞ s^{u−1} L(s) ds = ∫_0^∞ (∫_0^∞ s^{u−1} e^{−sx} ds) f(x) dx by switching the order of integration, and note that the inner integral can be expressed as ∫_0^∞ s^{u−1} e^{−sx} ds = x^{−u} Γ(u), u > 0, using properties of a gamma distribution. The integral ∫_0^∞ s^{u−1} L(s) ds, u > 0, if it exists, can be evaluated numerically; most computer packages provide built-in functions to evaluate such integrals. For the positive stable distribution or the gamma distribution, negative power moments have closed form expressions, see Luong and Doray ( [
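The Proposition is easy to check numerically for a case with known negative power moments, e.g. a gamma(a, rate 1) distribution, where L(s) = (1 + s)^{−a} and E(x^{−u}) = Γ(a − u)/Γ(a) for u < a (the values a = 3, u = 1.2 below are arbitrary):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

a, u = 3.0, 1.2   # arbitrary values; u < a is needed for E(x^-u) to exist
# L(s) = (1 + s)^(-a) is the LT of a gamma(a, rate 1) distribution
integral, _ = quad(lambda s: s ** (u - 1.0) * (1.0 + s) ** -a, 0.0, np.inf)
via_lt = integral / gamma_fn(u)            # the Proposition
exact = gamma_fn(a - u) / gamma_fn(a)      # known gamma negative power moment
print(via_lt, exact)                       # the two values agree
```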

MEEL methods as discussed in chapter 13 by Mittelhammer et al. ( [

Assume that we have a random sample as in Section 1.2. The vector β has p components, i.e.,

β = ( β 1 , ⋯ , β p ) ′ .

We are in the situation where the density f β ( x ) has no closed form expression but using the LT, we can extract k moments of the original parametric model, ( E β ( h 1 ( x ) ) , ⋯ , E β ( h k ( x ) ) ) ′ assuming k > p .

Clearly, the sample distribution function corresponds to a discrete distribution which assigns mass p_{in} = 1/n to each realized observation point, i = 1, ⋯, n, with ∑_{i=1}^n p_{in} = 1. Now, instead of using the original model for inference, we shall consider proxy discrete models with mass function π_i(β) assigning mass to the realized observation points. Let

p n = ( p 1 n , ⋯ , p n n ) ′ = ( 1 n , ⋯ , 1 n ) ′ ,

π = π(β) = (π_1, ⋯, π_n)′.

The Kullback-Leibler distance between the two discrete distributions p n and π is defined by the following measure of discrepancy,

K L ( π , p n ) = ∑ i = 1 n π i ( ln π i − ln ( 1 n ) ) .

We also require that the proxy model, besides satisfying the basic requirements ∑_{i=1}^n π_i = 1, π_i ≥ 0, also satisfies the same moment conditions as the original parametric model, i.e.,

E π ( h j ( x ) ) = E β ( h j ( x ) ) , j = 1 , ⋯ , k ,

where E_π(h_j(x)) = ∑_{i=1}^n π_i h_j(x_i).

Parametric estimation will be carried out in two stages. The first stage is to choose the best proxy model by minimizing KL(π, p_n), which is equivalent to maximizing the entropy measure subject to the above constraints. It leads to maximizing −∑_{i=1}^n π_i ln π_i, or equivalently minimizing

∑ i = 1 n π i ln π i (12)

subject to the constraints given by

∑ i = 1 n π i = 1 , (13)

∑ i = 1 n π i ( g j ( x i ; β ) ) = 0 , j = 1 , ⋯ , k (14)

with

g j ( x i ; β ) = h j ( x i ) − E β ( h j ( x ) ) , j = 1 , ⋯ , k . (15)

Mittelhammer et al. ( [

L ( π , λ , μ ) = ∑ i = 1 n π i ln π i + ∑ j = 1 k λ j ( ∑ i = 1 n π i g j ( x i ; β ) ) + μ ( ∑ i = 1 n π i − 1 )

where λ_j, j = 1, ⋯, k and μ are Lagrange multipliers. Taking partial derivatives with respect to π_i leads to the system of equations

∂ L ∂ π i = ln π i + 1 + ∑ j = 1 k λ j ( g j ( x i ; β ) ) + μ = 0 , i = 1 , ⋯ , n . (16)

The solutions of the equation yield the best discrete proxy model with mass function given by

π i * = exp ( − ∑ j = 1 k λ j ( β ) ( g j ( x i ; β ) ) ) ∑ i = 1 n exp ( − ∑ j = 1 k λ j ( β ) ( g j ( x i ; β ) ) ) , i = 1 , ⋯ , n (17)

which is Expression (13.2.6) given by Mittelhammer et al. ( [

Note that since the π_i*'s are defined implicitly, they depend on β but do not depend on the Lagrange multiplier μ, as it is easy to see that we already have ∑_{i=1}^n π_i* = 1 and π_i* ≥ 0, i = 1, ⋯, n.

Let π* = (π_1*, ⋯, π_n*)′ and λ = (λ_1, ⋯, λ_k)′.

The second stage is to use the KL distance for parametric inferences. At this stage, we minimize with respect to β the expression

∑ i = 1 n π i * ( β ) ln π i * ( β ) (18)

to obtain the MEEL estimators β ^ .

The numerical procedures to implement MEEL methods appear to be complicated, as the π_i*'s are defined implicitly. The numerical procedures are simplified by using penalty function methods, discussed in Section 4. With this approach, it suffices to perform unconstrained minimization with respect to the k + p variables λ_j, j = 1, ⋯, k and β_i, i = 1, ⋯, p, with k > p, using a suitably defined objective function. The relationships between the vector λ and the vector β, given by

∑ i = 1 n π i * ( λ , β ) ( g j ( x i ; β ) ) = 0 , j = 1 , ⋯ , k (19)

will be used to build the penalty function part of the new objective function.
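For a fixed β, one standard way to obtain λ(β) numerically (used here only as a self-contained check of Expressions (17) and (19), distinct from the penalty approach of Section 4) is to minimize the convex function ∑_i exp(−λ′g(x_i; β)), whose stationarity condition is exactly Expression (19). The exponential model and the sample below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
beta = 2.0
x = rng.exponential(1.0 / beta, size=500)

# Moment conditions for an Exponential(beta) model (illustrative choice),
# evaluated at the true beta: g1 = x - 1/beta, g2 = x^2 - 2/beta^2.
G = np.column_stack([x - 1.0 / beta, x**2 - 2.0 / beta**2])  # n x k

def dual(lam):
    # Convex in lam; its gradient is -sum_i g(x_i) exp(-lam'g(x_i)),
    # so a stationary point makes Expression (19) hold.
    return np.sum(np.exp(-G @ lam))

lam_hat = minimize(dual, np.zeros(2), method="BFGS").x
w = np.exp(-G @ lam_hat)
pi_star = w / w.sum()        # Expression (17)

print(pi_star @ G)           # both weighted constraint sums are ~ 0
```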

Imbens ( [

The identification of the global minimizer in nonlinear estimation, which gives the estimates, is an important issue, as most algorithms only give local minimizers and are therefore sensitive to the starting points used to initialize the algorithm, see Davidson and MacKinnon ( [

The regularity conditions for the MEEL estimators β ^ to be consistent and to follow an asymptotic normal distribution have been given by Assumption 1 in Schennach ( [

Assumption

Assume that:

1) The true parameter vector β_0 is an interior point of the parameter space Θ, which is assumed to be compact.

2) β 0 is the unique vector which satisfies E β 0 ( g ( x , β 0 ) ) = 0 .

3) g ( x , β ) is differentiable with respect to β and E β 0 ( sup β | g i | 2 + h ) < ∞ for some h > 0 , i = 1 , ⋯ , k .

4) The derivatives ∂g_i/∂β_j, i = 1, ⋯, k of g(x, β) also satisfy the local boundedness condition E_{β_0}(sup_β |∂g_i/∂β_j|^{2+δ}) < ∞ for some δ > 0 when β is restricted to some neighborhood of β_0.

5) The covariance matrix Σ of g ( x , β ) has rank k .

Under Assumption 1, the MEEL estimator given by the vector β̂ is consistent and has a multinormal asymptotic distribution: β̂ →_p β_0, where β_0 is the vector of the true parameters, and √n (β̂ − β_0) →_L N(0, Ω),

Ω = [ E [ ∂ g ( x , β ) ∂ β | β = β 0 ] ( E [ g ( x , β ) g ( x , β ) ′ ] | β = β 0 ) − 1 E [ ∂ g ( x , β ) ∂ β ′ | β = β 0 ] ] − 1 , (20)

g ( x , β ) = ( g 1 ( x ; β ) , ⋯ , g k ( x ; β ) ) ′ , Σ ( β 0 ) = E [ g ( x , β ) g ( x , β ) ′ ] | β = β 0 .

An estimator Ω ^ for Ω can be defined,

Ω ^ = [ [ ∑ i = 1 n π ^ i ∂ g ( x i , β ) ∂ β | β = β ^ ] ( ∑ i = 1 n π ^ i g ( x i , β ) g ( x i , β ) ′ | β = β ^ ) − 1 [ ∑ i = 1 n π ^ i ∂ g ( x i , β ) ∂ β ′ | β = β ^ ] ] − 1 ,

where π̂_i = π_i*(β̂), i = 1, ⋯, n. If we let π̂_i = 1/n, i = 1, ⋯, n, we obtain another consistent estimator for Ω.

Note that Ω is identical to Expression (5), which shows the asymptotic equivalence between optimum quasi-likelihood estimation and MEEL estimation. Neither method needs a full specification of the model; both only require moment conditions of the true model.

The use of the KL distance also allows the construction of a goodness-of-fit test statistic which follows an asymptotic chi-square distribution. Since the validity of the original model is reduced to the validity of the moment conditions, we might want to test the null hypothesis specified as H_0: E(h_j(x)) = E_β(h_j(x)), j = 1, ⋯, k, where the expectations on the right are taken under the true parametric model.

The following test statistic has a limiting chi-square distribution with r = k − p degrees of freedom, i.e.,

2 n K L ( π * ( β ^ ) , p n ) = 2 n ( ∑ i = 1 n π i * ( β ^ ) [ ln π i * ( β ^ ) − ln ( 1 n ) ] ) → L χ 2 ( k − p ) . (21)
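The statistic (21) is inexpensive to compute once the fitted probabilities are available; a minimal sketch, with a uniform π standing in for π*(β̂) (so the statistic is exactly zero) and with made-up values of n, k and p:

```python
import numpy as np
from scipy.stats import chi2

def meel_gof_stat(pi, n):
    # 2n * KL(pi, p_n), Expression (21), with p_n the uniform weights 1/n
    return 2.0 * n * np.sum(pi * (np.log(pi) - np.log(1.0 / n)))

n, k, p = 200, 5, 3                      # made-up sizes for illustration
pi_uniform = np.full(n, 1.0 / n)         # stand-in for pi*(beta_hat)
stat = meel_gof_stat(pi_uniform, n)
crit = chi2.ppf(0.95, k - p)             # 5% critical value, k - p d.f.
print(stat, crit)                        # stat is 0 for a perfect fit
```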

It is clear that only under special circumstances are MEEL methods as efficient as ML methods, due to the use of a finite basis. This can only happen when the true score functions belong to the linear space spanned by the finite basis. Therefore, it appears useful to be able to quantify the loss of efficiency when using MEEL methods, even though the model density has no closed form expression, in order to check whether MEEL methods are appropriate for a specific field of applications. Fourier series expansions can be used to approximate the density function and are introduced below.

The density function can be expanded using a Fourier cosine series on the range 0 < x < b, see Expressions (7-11) given by Fang and Oosterlee ( [

f β ( x ) ~ f β a ( x ) , f β a ( x ) = F 0 ( β ) + ∑ j = 1 ∞ F j ( β ) cos ( j π b x ) .

The coefficients F j ( β ) , j = 0 , 1 , 2 , ⋯ are Fourier coefficients,

F 0 ( β ) = 1 b ∫ 0 b cos ( j π b x ) f β ( x ) d x , j = 0 ,

F j ( β ) = 2 b ∫ 0 b cos ( j π b x ) f β ( x ) d x , j = 1 , 2 , ⋯ .

Regularity conditions for uniform convergence of Fourier series are also given by Powers ( [

F′_{0,β_l}(β) = (1/b) ∫_0^b ∂f_β(x)/∂β_l dx,

F′_{j,β_l}(β) = (2/b) ∂/∂β_l (∫_0^b cos(jπx/b) f_β(x) dx), l = 1, ⋯, p, j = 1, 2, ⋯

If b is chosen sufficiently large, we have the following approximations of the coefficients using either the characteristic function (CF) or LT,

F j ( β ) ≈ F ¯ j ( β ) = 2 b ∫ 0 ∞ cos ( j π b x ) f β ( x ) d x = 2 b R e ( L β ( − i j π b ) ) , j = 1 , 2 , ⋯

and F ¯ 0 ( β ) = 1 b . Similarly,

F′_{j,β_l}(β) ≈ F̄′_{j,β_l} = (2/b) ∂/∂β_l Re(L_β(−i jπ/b)), F̄′_{0,β_l}(β) = 0, l = 1, ⋯, p, j = 1, 2, ⋯, M,

where Re(⋅) is the real part of the complex number inside the parentheses; most computer packages can handle computations with complex numbers. In practice, we can only use a finite cosine series expansion with M terms. The formulas for the coefficients given by Fang and Oosterlee ( [

∂ log f ¯ a ( x ) ∂ β l = ∂ f ¯ a ( x ) ∂ β l f ¯ a ( x ) , l = 1 , ⋯ , p

with

f ¯ a ( x ) = 1 b + ∑ j = 1 M F ¯ j ( β ) cos ( j π b x ) ,

∂ f ¯ a ( x ) ∂ β l = ∑ j = 1 M F ¯ ′ j , β l cos ( j π b x ) , l = 1 , ⋯ , p .
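The approximation f̄_a can be coded in a few lines. The sketch below applies it to a gamma(a, rate 1) density, an illustrative choice made because the exact density is available for comparison; b and M are tuning choices:

```python
import numpy as np
from scipy.stats import gamma

a, b, M = 3.0, 40.0, 200        # shape; truncation point b; number of terms M
L = lambda s: (1.0 + s) ** -a   # gamma(a, rate 1) LT, accepts complex s

x = np.linspace(0.1, 15.0, 300)
f_bar = np.full_like(x, 1.0 / b)                       # the F_0 = 1/b term
for j in range(1, M + 1):
    F_j = (2.0 / b) * np.real(L(-1j * j * np.pi / b))  # cosine coefficients
    f_bar += F_j * np.cos(j * np.pi * x / b)

err = np.max(np.abs(f_bar - gamma.pdf(x, a)))          # exact pdf for comparison
print(err)   # small: the truncated cosine series tracks the true density
```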

Therefore, if β is estimated by β ^ , the Fisher information matrix I ( β ^ ) can be estimated by I ^ ( β ^ ) using the original sample or simulated samples from the distribution with β = β ^ . If the original sample is used,

I ^ ( β ^ ) = 1 n ∑ i = 1 n ( ∂ ln f ¯ a ( x i ) ∂ β ) ( ∂ ln f ¯ a ( x i ) ∂ β ) ′ | β = β ^ .

The estimate of the overall relative efficiency can be defined based on Expression (20) as

A R E ( β ^ ) = det ( U ^ ( β ^ ) ) det ( I ^ ( β ^ ) ) , (22)

where Û(β̂) = S′(β̂) Σ̂^{−1} S(β̂) and det(⋅) is the determinant of the matrix inside the parentheses, see Expression (3.7) given by Bhapkar ( [

For the value of b, we can let b = X̄ + Ls, 10 ≤ L ≤ 15, where X̄ and s are respectively the sample mean and the sample standard deviation. Note that ARE(β̂), despite its simplicity, can give an idea of whether MEEL methods are appropriate for the data set and the parametric model being considered.

We shall use penalty function approaches to convert the problem of minimization with constraints into a problem of minimization without constraints, by introducing a suitably defined surrogate objective function. Penalty function techniques are well described in Chong and Zak ( [

For illustration, we start with a simple example and extend it to the problem for finding MEEL estimators.

Suppose that we wish to minimize a function f(x_1, x_2) of two variables x_1 and x_2 subject to a constraint c = c(x_1, x_2) = 0. Numerical solutions to this problem can be found by minimizing the unconstrained objective function f(x_1, x_2) + (K/2)[c(x_1, x_2)]², K → ∞. In practice, setting K to a very large value gives solutions with good numerical accuracy. The penalty function, which makes use of the square function, is the second component of the objective function. The minimization procedures can give exact solutions with the use of a more complicated nondifferentiable penalty function, see Chong and Zak ( [
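The toy problem above can be coded directly; the specific f and c below are our own illustrative choices, with exact constrained minimizer (1/2, 1/2):

```python
import numpy as np
from scipy.optimize import minimize

# Minimize f = x1^2 + x2^2 subject to c = x1 + x2 - 1 = 0 (an illustrative
# toy choice); the exact constrained minimizer is (1/2, 1/2).
K = 1e4   # penalty constant

def surrogate(z):
    f = z[0] ** 2 + z[1] ** 2
    c = z[0] + z[1] - 1.0
    return f + 0.5 * K * c ** 2   # quadratic penalty on the constraint

z_hat = minimize(surrogate, np.zeros(2), method="BFGS").x
print(z_hat)   # ~ (0.5, 0.5), with an O(1/K) penalty bias
```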

For the MEEL minimization problem, the λ_j, j = 1, ⋯, k depend on β and the π_i* are given by

π_i*(λ, β) = exp(−∑_{j=1}^k λ_j g_j(x_i; β)) / [∑_{i=1}^n exp(−∑_{j=1}^k λ_j g_j(x_i; β))], i = 1, ⋯, n.

The vectors λ and β are related by the equality constraints given by

c 1 = ∑ i = 1 n π i * ( λ , β ) [ g 1 ( x i , β ) ] = 0 , ⋯ , c k = ∑ i = 1 n π i * ( λ , β ) [ g k ( x i , β ) ] = 0. (23)

Therefore, we can perform unconstrained minimization using the following objective function with respect to λ 1 , ⋯ , λ k and β 1 , ⋯ , β p ,

∑ i = 1 n π i * ( λ , β ) ln π i * ( λ , β ) + K 2 [ ( ∑ i = 1 n π i * ( λ , β ) [ g 1 ( x i , β ) ] ) 2 + ⋯ + ( ∑ i = 1 n π i * ( λ , β ) [ g k ( x i , β ) ] ) 2 ] . (24)

The penalty constant K is a large positive value, for example K = 500000. If the absolute value function is used to construct the penalty function, then we can only use direct search algorithms, which are derivative free.
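A minimal end-to-end sketch of minimizing the objective (24), for an assumed Exponential(β) model with k = 2 moment conditions and p = 1 (an illustrative model, not the PTS model of Section 5); the exponential-tilting weights of Expression (17) are computed in a numerically stabilized way:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import xlogy

rng = np.random.default_rng(2)
beta0 = 2.0
x = rng.exponential(1.0 / beta0, size=400)
K = 1e4  # penalty constant

def g_matrix(beta):
    # Moment conditions for an Exponential(beta) model: E x = 1/beta and
    # E x^2 = 2/beta^2 follow from its LT, L_beta(s) = beta/(beta + s).
    return np.column_stack([x - 1.0 / beta, x**2 - 2.0 / beta**2])

def objective(z):
    lam, beta = z[:2], z[2]
    if beta <= 0.05:
        return 1e10                       # keep beta in an admissible region
    G = g_matrix(beta)
    t = G @ lam
    w = np.exp(-(t - t.min()))            # pi_i* is invariant to shifts in t
    pi = w / w.sum()                      # Expression (17)
    constraints = pi @ G                  # Expression (19), one entry per g_j
    return np.sum(xlogy(pi, pi)) + 0.5 * K * np.sum(constraints**2)  # (24)

start = np.array([0.0, 0.0, 1.0 / np.mean(x)])  # simple consistent start
z_hat = minimize(objective, start, method="Nelder-Mead",
                 options={"maxiter": 20000, "xatol": 1e-9, "fatol": 1e-12}).x
print(z_hat[2])  # MEEL estimate of beta; close to beta0 = 2
```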

It is worth noting that these algorithms only find a local minimizer each time they are run, so some strategies are needed to identify the global minimizer. The following procedures can be used:

1) We might need a starting vector close to the estimators to initialize the algorithm; this is important when working with real data. For example, we might want to start the algorithm with the simple but consistent estimators β̂_s obtained by minimizing ∑_{j=1}^k (∑_{i=1}^n g_j(x_i; β))².

If the number of parameters is not large, global random search can be performed. Simulated annealing (SA) and particle swarm optimization (PSO) are commonly used global random search techniques, see Chong and Zak ( [

A R E ( β ^ S ) = det ( V S ( β S ^ ) ) det ( I ^ − 1 ( β S ^ ) ) ,

V_S(β̂_S) = (S′S)^{−1} S′ Σ S (S′S)^{−1},

evaluated at β = β S ^ .

2) For finding the global minimizer Andrews ( [

2 n K L ( π * ( λ ( 0 ) , β ( 0 ) ) , p n ) ≤ χ 0.95 2 ( k − p ) , (25)

where χ²_{0.95}(k − p) is the 95th percentile of the chi-square distribution with k − p degrees of freedom, k > p. We might want to minimize not only with the equality constraints given by Expression (23) but also with the inequality constraint given by Expression (25),

2 n K L ( π * ( λ , β ) , p n ) ≤ χ 0.95 2 ( k − p ) .

With penalty function methods, we can define a penalty function to handle the inequality constraint as

(H/2)(c_+)², c_+ = max(2 KL(π*(λ, β), p_n) − χ²_{0.95}(k − p)/n, 0),

where H is again a penalty constant.

This leads to find the global minimizer of a new objective function given by

∑ i = 1 n π i * ( λ , β ) ln π i * ( λ , β ) + K 2 [ ( ∑ i = 1 n π i * ( λ , β ) [ g 1 ( x i , β ) ] ) 2 + ⋯ + ( ∑ i = 1 n π i * ( λ , β ) [ g k ( x i , β ) ] ) 2 ] + H 2 ( c + ) 2 . (26)

We might also want to repeat the procedure with different starting vectors and identify the global minimizer as the vector which yields the overall smallest value of

∑ i = 1 n π i * ( λ , β ) ln π i * ( λ , β ) . (27)

The representation of a new distribution created by performing an operation on the LT of an original distribution often suggests how to simulate from the new distribution if we can simulate from the original one. For example, to simulate from the tilted density f_t(x), obtained by applying the Esscher operation to f(x), it suffices to simulate from the original density f(x). Since we have

f_t(x) = e^{−θx} f(x)/κ(θ), where κ(s) is the LT of the density f(x), we have the inequality f_t(x) ≤ c f(x), c = 1/κ(θ).

Therefore, if we know how to simulate an observation from the density f(x), we can apply the acceptance-rejection method to obtain simulated observations from f_t(x), see

Robert and Casella ( [

The acceptance probability is useful for planning the sample size obtainable from the simulations. Note that this probability decreases as θ increases, making it difficult to obtain a large sample from f_t(x) for large values of θ.
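A minimal sketch of this acceptance-rejection scheme, with the Exp(1) density as an illustrative base f, for which the tilted law is Exp(1 + θ) and the acceptance rate is κ(θ) = 1/(1 + θ):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 1.5   # tilting parameter (illustrative value)

# Base density f: Exp(1).  Accept a draw X ~ f with probability
# f_t(X) / (c f(X)) = exp(-theta * X), where c = 1/kappa(theta).
x = rng.exponential(1.0, size=200_000)
accepted = x[rng.random(x.size) <= np.exp(-theta * x)]

# Here f_t is the Exp(1 + theta) density: mean 1/(1 + theta) = 0.4, and
# the acceptance rate is kappa(theta) = 1/(1 + theta) = 0.4, which
# decreases as theta grows, as noted in the text.
print(accepted.mean(), accepted.size / x.size)
```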

The acceptance-rejection method gives a simple way to simulate observations from the positive tempered stable (PTS) distribution, as it is easy to simulate from the positive stable distribution, see Devroye ( [

In this section, we illustrate the implementation of the inference techniques by comparing the MEEL estimators with the moment estimators for the PTS family, using simulated samples. The PTS distribution was introduced by Hougaard [

L_β(s) = exp(−(δ/α)[(θ + s)^α − θ^α]), δ, θ > 0, 0 < α < 1, β = (δ, α, θ)′.

Hougaard ( [

c j = ∑ i = 1 n ( X i − X ¯ ) j n , j = 2 , 3.

Define R = c_2² / (c_3 c_1) if c_2 > 0, and R = 0 if c_2 = 0. The moment estimators obtained by matching cumulants for the parameters α, θ and δ are given respectively by

α̃ = 2 − 1/(1 − R), θ̃ = (1 − α̃) c_1 / c_2, δ̃ = c_1 (θ̃)^{1 − α̃}.
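The cumulant-matching steps above translate directly into code; the sketch below (function names are ours) first defines the inversion from cumulants to parameters and then applies the same map to sample cumulants.

```python
import numpy as np

def pts_params_from_cumulants(c1, c2, c3):
    """Invert the first three PTS cumulants to (alpha, theta, delta)."""
    R = c2 ** 2 / (c3 * c1) if c2 > 0 else 0.0   # equals (1-alpha)/(2-alpha) in theory
    alpha = 2.0 - 1.0 / (1.0 - R)
    theta = (1.0 - alpha) * c1 / c2
    delta = c1 * theta ** (1.0 - alpha)
    return alpha, theta, delta

def pts_moment_estimates(x):
    """Moment estimators: plug the sample cumulants into the inversion map."""
    c1 = x.mean()
    return pts_params_from_cumulants(c1, ((x - c1) ** 2).mean(),
                                     ((x - c1) ** 3).mean())

# sanity check: feeding the theoretical cumulants of Hougaard's PTS law
# should recover the parameters exactly (up to rounding error)
delta, alpha, theta = 2.0, 0.4, 1.5
c1 = delta * theta ** (alpha - 1)
c2 = delta * (1 - alpha) * theta ** (alpha - 2)
c3 = delta * (1 - alpha) * (2 - alpha) * theta ** (alpha - 3)
a_hat, t_hat, d_hat = pts_params_from_cumulants(c1, c2, c3)
```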

We compare the performance of the moment estimators with the MEEL estimators using the base

B = { x − E(x), x² − E(x²), e^{−τx} − L_β(τ), ⋯ , e^{−mτx} − L_β(mτ) },

with m = 5 and τ = 0.01.
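For reference, the base B can be coded as a function returning the n × k matrix of constraint values at a trial parameter vector; the moments E(X) and E(X²) follow from the first two cumulants, and the LT is the closed form given earlier (the function names here are ours).

```python
import numpy as np

def pts_lt(s, delta, alpha, theta):
    """Closed-form Laplace transform of the PTS law."""
    return np.exp(-(delta / alpha) * ((theta + s) ** alpha - theta ** alpha))

def basis_matrix(x, beta, tau=0.01, m=5):
    """Constraint matrix for the base B: two moment constraints plus
    LT constraints at the points tau, 2*tau, ..., m*tau."""
    delta, alpha, theta = beta
    c1 = delta * theta ** (alpha - 1)                # E(X)
    c2 = delta * (1 - alpha) * theta ** (alpha - 2)  # variance
    cols = [x - c1, x ** 2 - (c2 + c1 ** 2)]         # E(X^2) = c2 + c1^2
    for j in range(1, m + 1):
        s = j * tau
        cols.append(np.exp(-s * x) - pts_lt(s, delta, alpha, theta))
    return np.column_stack(cols)

G = basis_matrix(np.array([0.5, 1.0, 2.0]), beta=(1.0, 0.5, 1.0))
```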

We only had access to a laptop computer, so the study is limited in scope. The sample size used is n = 5000 and we draw M = 100 samples in our simulation. The focus is on the following ranges for the parameters: we fix θ = 1, 0.2 < α < 0.8, and 1 ≤ δ ≤ 10. Overall, the MEEL estimators are much more efficient than the moment estimators over the range of parameters considered. The moment estimators also do not appear to perform well for parameter values selected outside this range. The overall relative efficiency is defined as

ARE = [ MSE(δ̂) + MSE(α̂) + MSE(θ̂) ] / [ MSE(δ̃) + MSE(α̃) + MSE(θ̃) ].

The mean square errors (MSE) are estimated using simulated samples. The mean square error of an estimator π̂ of π_0 is defined as

MSE(π̂) = E[ (π̂ − π_0)² ].
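The MSE and overall relative efficiency computations are straightforward; below is a small sketch in which the rows of each array are the M simulated estimates and the columns are the (δ, α, θ) components (an assumed layout, with illustrative function names).

```python
import numpy as np

def mse(estimates, true_value):
    """Monte Carlo estimate of E(pi_hat - pi_0)^2 over M simulated samples."""
    return np.mean((np.asarray(estimates, dtype=float) - true_value) ** 2)

def overall_are(meel, mm, beta0):
    """Overall relative efficiency: summed MSEs of the MEEL estimators over
    those of the moment estimators; meel and mm are M x 3 arrays of
    (delta, alpha, theta) estimates and beta0 is the true parameter vector."""
    num = sum(mse(meel[:, j], beta0[j]) for j in range(3))
    den = sum(mse(mm[:, j], beta0[j]) for j in range(3))
    return num / den

# toy check: estimates off by 1 versus estimates off by 2 give ARE = 1/4
meel = np.ones((4, 3))
mm = 2.0 * np.ones((4, 3))
ratio = overall_are(meel, mm, beta0=np.zeros(3))
```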

The simulation study is not extensive and more should be done, but it does suggest the potential of the MEEL methods.

Some results are summarized in the tables below.

Based on the theory, the MEEL estimators cannot be as efficient as the ML estimators over the entire parameter space since only a finite number of elements in the base is used. However, the theory suggests that the methods might still have high efficiencies on subspaces where parameters are subject to inequality bounds. The estimate of overall relative efficiency given by Expression (22) might indicate whether the methods are recommended. The following considerations might be useful for assessing whether MEEL methods are appropriate for a parametric model and data sets from a specific field of application:

α = 0.4

|  | 0.01 | 0.02 | 0.03 | 0.04 | 0.06 | 0.08 | 0.10 |
|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.006 | 0.006 | 0.014 | 0.030 | 0.038 | 0.033 |
| 2 | 0.002 | 0.001 | 0.001 | 0.002 | 0.027 | 0.050 | 0.038 |
| 3 | 0.002 | 0.040 | 0.038 | 0.069 | 0.069 | 0.103 | 0.048 |
| 4 | 0.001 | 0.067 | 0.029 | 0.027 | 0.091 | 0.128 | 0.113 |
| 6 | 0.014 | 0.072 | 0.117 | 0.112 | 0.084 | 0.160 | 0.119 |
| 8 | 0.016 | 0.004 | 0.000 | 0.001 | 0.005 | 0.001 | 0.004 |
| 10 | 0.000 | 0.000 | 0.004 | 0.003 | 0.000 | 0.002 | 0.0018 |

α = 0.6

|  | 0.01 | 0.02 | 0.03 | 0.04 | 0.06 | 0.08 | 0.10 |
|---|---|---|---|---|---|---|---|
| 1 | 0.008 | 0.017 | 0.030 | 0.028 | 0.087 | 0.145 | 0.195 |
| 2 | 0.009 | 0.009 | 0.018 | 0.011 | 0.040 | 0.056 | 0.081 |
| 3 | 0.010 | 0.024 | 0.041 | 0.040 | 0.045 | 0.073 | 0.087 |
| 4 | 0.028 | 0.094 | 0.139 | 0.053 | 0.118 | 0.176 | 0.187 |
| 6 | 0.068 | 0.112 | 0.157 | 0.182 | 0.186 | 0.202 | 0.399 |
| 8 | 0.065 | 0.116 | 0.111 | 0.164 | 0.192 | 0.094 | 0.075 |
| 10 | 0.007 | 0.006 | 0.004 | 0.006 | 0.005 | 0.005 | 0.003 |

ARE(MEEL vs MM) = [ MSE(θ̂) + MSE(δ̂) + MSE(α̂) ] / [ MSE(θ̃) + MSE(δ̃) + MSE(α̃) ]. Legend: Tabulated values are estimates of ARE (MEEL vs MM) based on simulated samples from the chosen parameters δ, θ with α = 0.6.

1) Define a restricted parameter space based on the field of application, obtain β̂_S, and use the estimate of overall relative efficiency to evaluate the loss of efficiency of MEEL methods in a neighborhood of β̂_S, which in general should be nested inside the restricted space of interest.

2) For the efficiency of MEEL methods, try to include as many elements in the finite base as numerical limitations permit, and try different values for τ, which controls the spacing of the functions in the base, to see whether there is any improvement in efficiency in a neighborhood of β̂_S.

Pricing of insurance contracts is one of the main objectives in actuarial science. A contract defines a random loss function g(X), where X is the individual loss random variable for one unit of time, often assumed to be nonnegative and to follow a parametric model with distribution function F_β(x) and LT L_β(s). The pure premium is the following expectation under the true vector β_0, i.e.,

P = P(β_0) = E_{β_0} [ g(X) ].

P must be estimated using data; therefore, β_0 needs to be estimated first, and subsequently analytical or simulation methods can be used to approximate the premium. If MEEL methods are used, the parametric families with closed form LT can also be validated by means of goodness-of-fit tests.

For insurance, the stop-loss premium is defined as P = E_{β_0} { (X − d)_+ }, with (X − d)_+ = max(X − d, 0). The stop-loss premium can be expressed by means of distribution functions instead of expectations; see Expression (8) given by Luong ( [

If sampling from the distribution is possible, then the pricing of contracts can also be approximated using simulations based on an estimate of β_0; this involves drawing samples based on the estimated parameters. For example, it is not difficult to simulate from a compound Poisson distribution despite its complicated density function, which can only be expressed as a series. Clearly, once the parameters of the compound Poisson distribution are estimated, pricing of insurance contracts can be done via simulations.
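As a small sketch of simulation-based pricing: once a sample has been drawn at the estimated parameter vector, the stop-loss premium is just the sample mean of (X − d)_+. The exponential loss below is only a stand-in distribution for which the exact premium e^{−d} is known, so the approximation can be checked.

```python
import numpy as np

def stop_loss_premium(sample, d):
    """Monte Carlo estimate of P = E[(X - d)_+] from simulated losses."""
    return np.maximum(sample - d, 0.0).mean()

# stand-in check with exponential(1) losses, for which E[(X - d)_+] = e^{-d}
rng = np.random.default_rng(7)
sample = rng.exponential(size=200000)
p_hat = stop_loss_premium(sample, d=1.0)
```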

We conclude that MEEL methods appear to be useful for inference: they have been an active field of research in econometrics for the last twenty years, yet they do not seem to have received much attention in actuarial science. When the methods are oriented toward actuarial applications, since the LT is widely used in actuarial science, it is natural to consider extracting moment conditions from the LT. It is shown that MEEL estimation is equivalent to QL estimation based on the best quasi-score functions obtained by projecting the true score functions onto the linear space spanned by a basis specified by the moment conditions. Based on these considerations, two families of bases are proposed in this paper to generate MEEL methods with the objective of achieving high efficiencies for actuarial applications. In general, the MEEL methods using these bases are more efficient than QL methods based on quadratic estimating functions and methods of moments. With finite bases, the MEEL methods can in general attain near full efficiency only on restricted parameter spaces. MEEL methods can still be very attractive if, depending on the field of application, we essentially work within these restricted spaces; it is then important to measure the loss of efficiency to verify the appropriateness of the methods for the field of application. The methods can easily be adapted for estimation of continuous distributions with support on the real line encountered in finance by using constraints extracted from the model moment generating function instead of the LT.

The helpful and constructive comments of a referee, which led to an improvement of the presentation of the paper, and the support of the editorial staff of Open Journal of Statistics in processing the paper are gratefully acknowledged.

Luong, A. (2017) Maximum Entropy Empirical Likelihood Methods Based on Laplace Transforms for Nonnegative Continuous Distribution with Actuarial Applications. Open Journal of Statistics, 7, 459-482. https://doi.org/10.4236/ojs.2017.73033