Maximum Entropy Empirical Likelihood Methods Based on Laplace Transforms for Nonnegative Continuous Distribution with Actuarial Applications
1. Introduction
1.1. New Distributions Created Using Laplace Transforms
Nonnegative continuous parametric families of distributions are useful for modeling loss data or lifetime data in actuarial sciences. Many of these families do not have closed-form densities. The densities can only be expressed by means of infinite series representations, but their corresponding Laplace transforms (LT) have closed-form expressions and are relatively simple to handle. An illustration is given below.
Example
Hougaard [1] introduced the positive tempered stable (PTS) distribution; the PTS distribution is obtained by tilting the positive stable (PS) distribution. The random variable
follows a positive stable law if the Laplace transform is given as
The density function
has no closed form but can be represented as an infinite series.
Now if we create a new distribution using the Esscher transform technique, the corresponding new density can be expressed using
and is given by
,
and its LT is
.
This operation adds an extra tilting parameter
to the vector of parameters
of the original distribution, and a new distribution is created. This new distribution is the positive tempered stable (PTS) distribution with Laplace transform given by
(1)
The first four cumulants are given by Hougaard [1] with
For the limiting case
we have the gamma distribution. In general,
the density function has no closed form except for
. For
we obtain the inverse Gaussian (IG) distribution with density function given by Hougaard ([1], p.392) as
For other parameterisations of the IG distribution, see Panjer and Willmot ([2], p.114).
Hougaard [1] gave the name power variance to the PTS distribution and developed moment estimators. Several other names for this distribution appear in the literature: in finance the name PTS is commonly used, see Schoutens ([3], p.56), Kuchler and Tappe [4]; in actuarial sciences, it is also called the generalized gamma distribution, see Gerber [5].
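Since the density of a tilted distribution may have no closed form, it is often easiest to work with the LT directly. Below is a minimal sketch in Python, assuming the standard Esscher (exponential tilting) relation L_h(s) = L(s + h)/L(h); the paper's exact parameterization may differ. The gamma distribution is used only as a check, since tilting a gamma LT by h simply shifts its rate parameter.

```python
import math

def tilted_lt(lt, h):
    """Esscher-tilted LT: L_h(s) = L(s + h) / L(h).
    (Standard tilting relation; assumed parameterization.)"""
    return lambda s: lt(s + h) / lt(h)

# Gamma(shape=a, rate=b): L(s) = (b/(b+s))^a.
# Tilting by h should give Gamma(shape=a, rate=b+h).
a, b, h = 2.0, 1.0, 0.5
lt = lambda s: (b / (b + s)) ** a
lt_h = tilted_lt(lt, h)
lt_gamma_shifted = lambda s: ((b + h) / (b + h + s)) ** a
diff = max(abs(lt_h(s) - lt_gamma_shifted(s)) for s in [0.0, 0.3, 1.0, 5.0])
```

The check confirms algebraically that L(s + h)/L(h) = ((b + h)/(b + h + s))^a, i.e., tilting stays within the gamma family while adding the tilting parameter h.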
Many new infinitely divisible (ID) distributions can be created using operations on the LTs of existing distributions. One of these operations is the power mixture (PM) operator, see Abate and Whitt ([6], p.92). It can be summarized as follows. Assume that
is an infinitely divisible random variable with LT given by
The LT of
is formed using
which is the LT of a continuous nonnegative, ID random variable
. With the introduction of a new random variable
which is also positive continuous and ID with distribution
, a new nonnegative continuous distribution with LT
can then be created, with LT given by
(2)
The new distribution is created using the power mixture (PM) operator; the random variable Y is called the mixing random variable.
The new distribution obtained will have more parameters than the distribution of
. For other methods, such as compounding methods for creating new distributions, see Klugman et al. ([7], p.141-143). For other ID nonnegative distributions with closed-form LTs, see Section 1.2 of Luong [8]. ID nonnegative distributions also appear in risk theory, as they arise naturally from Lévy processes often used to model aggregate claim processes; see Gerber [9], Dufresne and Gerber [10] for examples. We often work with ID distributions and, for completeness, a definition is given below.
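The PM operator can be sketched numerically, assuming the Abate-Whitt composition L(s) = L_Y(−ln L_Z(s)), where the mixed variable S given Y = y has LT L_Z(s)^y. The Monte Carlo check below uses exponential choices for both Y and Z purely for illustration; none of these specific choices come from the paper.

```python
import random, math

def pm_lt(lt_Y, lt_Z):
    """Power mixture (PM) composition of LTs, assuming the form
    L(s) = L_Y(-ln L_Z(s)) for the mixed variable S with S|Y=y
    having LT L_Z(s)^y."""
    return lambda s: lt_Y(-math.log(lt_Z(s)))

# Y ~ Exp(1): L_Y(s) = 1/(1+s); Z ~ Exp(1): L_Z(s) = 1/(1+s).
lt_Y = lambda s: 1.0 / (1.0 + s)
lt_Z = lambda s: 1.0 / (1.0 + s)
lt_S = pm_lt(lt_Y, lt_Z)

# Monte Carlo check: given Y = y, S ~ Gamma(shape=y, scale=1),
# since (1+s)^(-y) is the LT of that gamma law.
rng = random.Random(2024)
n, s0 = 40000, 1.0
emp = sum(math.exp(-s0 * rng.gammavariate(rng.expovariate(1.0), 1.0))
          for _ in range(n)) / n
theo = lt_S(s0)  # = 1/(1 + ln 2)
```

The empirical LT of the simulated mixture agrees with the composed LT, illustrating how the PM operator adds the parameters of the mixing distribution to the model.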
Definition (Lévy-Khintchine Representation). A characteristic function (CF)
of a random variable
is infinitely divisible if and only if it can be represented as
is a bounded and non-decreasing function with
see Rao [11] . An equivalent expression known as the canonical Lévy-Khintchine representation is also used in the literature, see Sato [12] . A similar representation using LT for nonnegative distribution instead of CF can be found in Feller ( [13] , p.450).
1.2. Quasi-Likelihood Estimation
For statistical inferences, we assume that we have a random sample with n observations,
. These observations are independent and identically distributed as
which has a model distribution with closed form LT
is the vector of parameters. The true parameter vector is denoted by
. The density function
has no closed form which makes likelihood estimation difficult to implement. Consequently, we would like to estimate
based on
. Quasi-likelihood (QL), among other methods which do not rely on the true density, can be considered. A brief review of QL estimation is given below.
Godambe and Thompson [14] developed estimating equations theory and extended quasi-likelihood theory in their seminal paper. They also proposed estimators using quadratic estimating equations, which can be viewed as based on quasi-score functions obtained by projecting the vector of score functions denoted by
on the linear space spanned by the basis
Note that
can be obtained based on the model Laplace transform denoted by
. See Chan and Ghosh [15] for a geometric point of view on estimating functions. Godambe and Thompson [14] obtained the most efficient estimators based on quasi-score functions obtained by projecting the true score functions on the linear space spanned by
. They are the most efficient quadratic estimating function (EQEF) estimators within the class of quadratic estimating functions introduced by Crowder [16]. The class includes Gaussian quasi-likelihood estimating equations. Consequently, the EQEF estimators are more efficient than normal quasi-likelihood (NQL) estimators in general.
The EQEF estimators are simple to obtain since the basis
has only two elements. The fact that they are based on the best approximations of the score functions allows them to outperform moment estimators in many circumstances.
For example, we can consider a parametric model with 3 parameters, which leads to solving the moment equations, i.e.,
It is easy to see that the quasi-score functions of moment methods belong to the linear space spanned by the basis
.
Even though
includes all the elements of
, the EQEF methods can outperform the method of moments because the quasi-score functions of the method of moments are not the best approximations based on
.
Therefore, in this paper we shall emphasize quasi-score functions which make use of the best linear combinations of elements of a basis, and we propose some bases that can provide better efficiency than the basis formed by linear and quadratic polynomials. The basis should make use only of the model LT. Note that moment estimators based on selected points of the model LT have been discussed in the literature, see Read ([17], p.151-153). The methods appear to be useful in fields which make extensive use of the LT of distributions, such as actuarial sciences or engineering.
We shall proceed in a unified way so that both QL methods and the MEEL methods are related to the notion of projection of the true score functions on a linear space spanned by a finite basis. Within this framework, MEEL estimators are shown to be asymptotically first-order equivalent to QL estimators using the same basis. For the first- and higher-order properties of empirical likelihood estimators, see Newey and Smith [18], Smith [19].
MEEL methods make use of information from the parametric model via constraints, and there is a one-to-one correspondence between the constraints, the moment conditions and the elements of the basis. Although the general theory of MEEL and QL methods is well established, the question of which bases to choose to achieve good efficiency appears to be a relevant one for applications. There is also a need to quantify the loss of efficiency; consequently, in this paper we propose a measure of the loss of efficiency to evaluate whether MEEL methods are appropriate for analyzing a data set from a specific field of application.
We hope that the answers will give ideas on how to choose moment conditions or constraints for MEEL estimators. They will also give ideas on how to construct semi-parametric bounds as defined by Chamberlain ([20], p.311), which can approximate the parametric bound, i.e., the inverse of the Fisher information matrix. We emphasize MEEL methods but offer a unified view of both MEEL and QL methods as they are related. Numerical implementations of the MEEL methods are also discussed to facilitate practical use of these methods in actuarial sciences. We shall discuss the quasi-score functions of the QD methods in the next section.
1.3. Quadratic Distance (QD) Estimation
Let
be the score functions and note that
in general. If we try to approximate
using a quasi-score function formed by linear combinations of the functions
, this leads to consider quasi-score functions of the form
We shall also impose the condition of unbiasedness of the estimating functions by requiring
. With these restrictions, it is equivalent to consider
Using vector notations,
and define
For the best approximation by projecting
on the linear space spanned by the basis
, we look for the vector of coefficients
which minimizes
,
, the expectation is taken under
.
Using results from the proof of Theorem 4.1 given by Luong and Doray ([21], p.150), the optimum vector is
with
(3)
is the covariance matrix of
under
. We also use the notation
if an emphasis on the dependence on
is needed. It is easy to see that the best approximation is given by
.
Note that the elements of
need to be spelled out explicitly which means that the covariance matrix
needs to be known or estimated in order to apply quasi-likelihood estimation. MEEL estimation does not require this and yet produces asymptotically equivalent estimators. This is one of the main advantages of MEEL estimation over QL estimation.
Quadratic distance (QD) estimation as given by Luong and Thompson [22] can be viewed as a form of quasi-likelihood estimation. Numerically, it might be easier to implement QD methods than QL methods defined using estimating functions, since QD estimation minimizes an objective function rather than solving for roots of the QL estimating equations. QD estimation is briefly discussed below.
Let
be a vector defined based on observations,
Its model counterpart is given by
where
is the sample distribution function and its model counterpart is denoted by
.
The QD estimators
are obtained by minimizing the quadratic form defined as
The equivalent quadratic form is
(4)
is a consistent estimate of
under the true vector of parameters
. One can see that this procedure is equivalent to using quasi-score functions obtained by projecting the true score functions on the linear space spanned by
since minimizing Expression (4) leads to solving for
the system of equations
,
Observe that the vector
is equivalent to
.
From results in Luong and Thompson ( [22] , p 245) the asymptotic distribution of
is given as
(5)
The matrix
can be expressed as
.
The elements of the matrix
are evaluated under
,
is the transpose of
.
Following Morton ( [23] , p.228), the matrix
(6)
can be defined as the information matrix of the vector of optimum quasi-score functions; it is related to the semiparametric bounds based on the moment conditions as given by Chamberlain ([20], p.311). The moment conditions can be identified with the elements of the basis
and so can the constraints used for MEEL methods.
Although QL and MEEL methods generate asymptotically equivalent estimators, there are reasons to prefer MEEL methods over quasi-likelihood methods.
Compared with QL methods, MEEL methods have the following main advantages:
1) The matrix
which depends on
in general needs to be specified explicitly for QL methods, which might restrict the elements that can be included in the basis. We can only include elements with relatively simple forms for their covariances, otherwise
will be complicated.
2) If
is replaced by a consistent estimate
under
, the estimate is often not accurate enough, especially when the sample size n is not large, and therefore,
tends to be nearly singular even with only a few elements in the basis; this creates numerical instability when applying QD methods or quasi-likelihood methods.
3) Goodness of fit test statistics with limiting chi-square distributions for testing the model can be constructed in a unified way with MEEL methods. This feature is not shared by QL methods.
Within the class of empirical likelihood methods, the MEEL methods are numerically more stable than the original empirical likelihood (EL) methods first introduced by Owen [24]. For asymptotic properties of the empirical likelihood methods, see Qin and Lawless [25], Schennach [26], Imbens et al. [27]. Also see the monograph by Owen [28], the book by Mittelhammer et al. ([29], p.281-325) and the book by Anatolyev and Gospodinov ([30], p.45-61). It is also worthwhile to note that the MEEL methods are less simulation-oriented than indirect inference methods as proposed by Garcia et al. [31]. Numerical implementations using penalty function methods are relatively simple and will be discussed in Section 4. We hope that a detailed exposition of the methods without too many technicalities will encourage practitioners to use them. There are many fields besides actuarial sciences where the LT of a distribution is widely used. With some modifications, such as using constraints from the model moment generating function instead of the model LT, the methods can be applied to estimate distributions with support on the real line, which are often used in finance; see Fang and Oosterlee [42] for such distributions.
The paper is organized as follows. The choice of bases for generating constraints for the MEEL methods is examined in Section 2. Two families of bases using the LT are presented in this section; these two families appear to be useful for actuarial applications. In Section 3, we review the asymptotic properties of MEEL methods. An estimate of the overall relative efficiency using Fourier cosine series expansions is proposed to quantify the loss of overall efficiency when MEEL methods are used. In Section 4 we examine numerical issues, and penalty function methods are advocated to locate the global minimizer which gives the MEEL estimators. Simulations are discussed in Section 5. The simulation study from the positive tempered stable distribution shows that the MEEL estimators are much more efficient than the moment estimators originally proposed in the seminal paper of Hougaard [1]. Depending on the field of application, the full parameter space is often not needed and can be restricted to a subspace by subjecting the parameters to inequality bounds; on such restricted spaces the MEEL estimators have the potential to attain high efficiency compared to maximum likelihood (ML) using a reasonable number of elements in the basis. Actuarial applications are discussed in Section 6.
2. Choice of Bases
Using the results of Section 1.2, we consider the basis B which can be used for nonnegative continuous distributions or nonnegative distributions with a discontinuity point at the origin with mass assigned to it. The basis B has the form
(7)
We observe that the number of elements in the basis is
and the elements can be obtained using the LT of the model; the basis is therefore suitable for estimating parametric continuous distributions whose densities have no closed-form expression.
The number of elements in basis B is finite. It is formed based on the completeness property of the following basis with an infinite number of elements,
(8)
This infinite basis can be traced back to the work of Zakian and Littlewood [32], who showed that a density function can be expressed as an infinite series using elements of the infinite basis given by Expression (8) and developed methods to recover the density function using selected points of its LT. This might explain the potential for high efficiency of the MEEL estimators constructed using only a finite number of elements of B on some restricted parameter spaces.
The following example will clarify the notion of a restricted parameter space. Suppose we have a model with two parameters given by
,
. On a restricted parameter space, the parameters are subject to stricter inequality bounds. For example
and
with
are finite positive real numbers.
Therefore, in practice we might want to fix
and
, i.e., let
(9)
The basis B as indicated above often gives a good balance between numerical simplicity and efficiencies of the estimators.
If the model density has no discontinuity at all, then the following basis with negative power moment elements can be considered, and we shall see that negative power moments can be recovered using the LT. Using Lemma 1 given by Brockwell and Brown ([33], p.630), the following infinite basis with negative power moment elements is again complete in general if
belongs to some interval with
Therefore, the following finite basis
(10)
can also be considered.
The elements of a basis should respect the regularity conditions of Assumption 1 in Section 3.2 for the estimators to be consistent and have an asymptotic normal distribution. The following example illustrates this point. In practice, for example, if
exists and lower negative power moments do not exist, we might want to choose C to be
(11)
The last element is special as it involves h which can be set equal to some small positive value, for example let
for the regularity condition 3) of Assumption 1 to be met. Obviously, if
exists then we can let
.
Now we shall state a proposition which relates the negative power moments of a distribution to its LT. The results given by the following proposition are more general than those given by Cressie et al. [34], who give results only for negative integer moments. The general results can be traced back to Theorem 2.1 given by Brockwell and Brown ([35], p.215), but as this reference can be difficult to find, we reproduce the results below.
Proposition
Suppose that
is a nonnegative continuous random variable with density function and Laplace transform given respectively by
and
then if
exists, it is given by
,
is the
commonly used gamma function, assuming the integral exists.
Proof.
Observe that
by switching the order of integration, and note that the inner integral can be expressed as
,
using properties of a gamma distribution. The integral
if it exists can be evaluated numerically. Most computer packages provide built-in functions to evaluate these integrals numerically. For the positive stable distribution or the gamma distribution, negative power moments have closed-form expressions, see Luong and Doray ([21], p.149). The bases B and C only provide guidelines for forming a good basis based on the LT. We can also combine or select elements from these bases to form new bases.
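The Proposition can be checked numerically. Below is a sketch in Python of the relation E[X^(-r)] = (1/Γ(r)) ∫₀^∞ s^(r-1) L_X(s) ds, using the gamma distribution (whose negative power moments have a closed form) and a midpoint rule after the substitution s = u/(1−u); the distribution and the quadrature scheme are illustrative choices.

```python
import math

def neg_power_moment(lt, r, n=200_000):
    """E[X^(-r)] = (1/Gamma(r)) * integral_0^inf s^(r-1) L_X(s) ds,
    evaluated by the midpoint rule after the substitution s = u/(1-u)."""
    total = 0.0
    h = 1.0 / n
    for i in range(n):
        u = (i + 0.5) * h
        s = u / (1.0 - u)
        total += s ** (r - 1.0) * lt(s) / (1.0 - u) ** 2  # jacobian du/(1-u)^2
    return total * h / math.gamma(r)

# Gamma(shape=alpha, rate=lam): L(s) = (lam/(lam+s))^alpha,
# closed form E[X^(-r)] = lam^r * Gamma(alpha - r)/Gamma(alpha).
alpha, lam = 3.0, 2.0
lt = lambda s: (lam / (lam + s)) ** alpha
approx = neg_power_moment(lt, 1.0)
exact = lam * math.gamma(alpha - 1.0) / math.gamma(alpha)
approx_half = neg_power_moment(lt, 0.5)
exact_half = math.sqrt(lam) * math.gamma(alpha - 0.5) / math.gamma(alpha)
```

The r = 0.5 case illustrates that, as the Proposition states, non-integer negative power moments are recovered as well.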
3. MEEL Methods
3.1. Two-Stage Distance Methods
MEEL methods as discussed in Chapter 13 of Mittelhammer et al. ([29], p.313-326) belong to the class of empirical likelihood methods. The name MEEL was given by these authors, and MEEL methods are based on the Kullback-Leibler distance, which belongs to the class of distances for discrete distributions introduced by Cressie and Read [36]. MEEL methods are also called exponentially tilted empirical likelihood methods in the literature, see Imbens [37]. The methods are asymptotically as efficient as other EL methods. The main reason which leads us to emphasize MEEL methods over EL methods is their numerical stability, see Schennach [26] for this property. We shall discuss how to implement MEEL methods with the constraints extracted from the LT of the original model. These constraints are associated with moment conditions or elements of a finite basis. MEEL methods can also be viewed conceptually as two-stage distance methods based on the Kullback-Leibler distance for discrete distributions. The first stage consists of choosing the best discrete proxy model to replace the original parametric model, and the second stage consists of using the best discrete proxy model to estimate the parameters of the original model.
Assume that we have a random sample as in Section 1.2. The vector
has p components, i.e.,
We are in the situation where the density
has no closed form expression but using the LT, we can extract k moments of the original parametric model,
assuming
.
Clearly, the sample distribution function corresponds to a discrete distribution which assigns the mass
at the realized point of the observation
and
. Now instead of using the original model for inferences, we shall consider discrete proxy models with mass function
assigning mass at the realized point of observations. Let
The Kullback-Leibler distance between the two discrete distributions
and
is defined by the following measure of discrepancy,
We also require that the proxy model, besides satisfying the basic requirement, i.e.,
, also satisfies the same moment conditions as the original parametric model, i.e.,
Parametric estimation will be carried out in two stages. The first stage is to choose the best proxy model by minimizing
which is equivalent to maximizing the entropy measure under the above constraints. This leads to maximizing
or, equivalently, minimizing
(12)
subject to the constraints given by
(13)
(14)
with
(15)
Mittelhammer et al. ( [29] , p.321) have shown that the Lagrangian of the optimization problem is
where
and
are Lagrange multipliers. Taking partial derivatives with respect to
leads to the system of equations
(16)
The solutions of the equation yield the best discrete proxy model with mass function given by
(17)
which is Expression (13.2.6) given by Mittelhammer et al. ([29], p.321). Note that
and the
are defined implicitly by Expression (16).
Note that since the
are defined implicitly, they depend on
but do not depend on the Lagrange multiplier
as it is easy to see that we already have
and
Let
,
The second stage is to use the KL distance for parametric inferences. At this stage, we minimize with respect to
the expression
(18)
to obtain the MEEL estimators
.
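The first-stage tilted weights can be computed directly when there is a single moment condition. Below is a minimal sketch with a hypothetical data set and a hypothetical model mean, using bisection on the implicit equation for the multiplier; the sign convention for the exponential tilt is assumed, since the paper defines the multipliers only implicitly through Expression (16).

```python
import math

def meel_weights(z, eta):
    """Exponentially tilted (maximum-entropy) weights of the proxy model:
    p_i proportional to exp(eta * z_i) for one moment condition
    z_i = g(x_i, theta). (Assumed sign convention.)"""
    w = [math.exp(eta * zi) for zi in z]
    total = sum(w)
    return [wi / total for wi in w]

def solve_eta(z, lo=-50.0, hi=50.0):
    """Find eta so the tilted weights satisfy sum_i p_i z_i = 0;
    the map eta -> sum_i p_i z_i is increasing, so bisection applies."""
    def m(eta):
        p = meel_weights(z, eta)
        return sum(pi * zi for pi, zi in zip(p, z))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if m(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# toy data; impose the moment condition E[X] = mu under the proxy model
x = [0.2, 0.5, 1.1, 1.7, 2.4, 3.0]
mu = 1.0                          # hypothetical model mean at some theta
z = [xi - mu for xi in x]
eta = solve_eta(z)
p = meel_weights(z, eta)
tilted_mean = sum(pi * xi for pi, xi in zip(p, x))
```

The resulting weights sum to one and reproduce the imposed model mean exactly, which is the defining property of the best discrete proxy model at the first stage.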
The numerical procedures to implement MEEL methods appear to be complicated as the
are defined implicitly. Numerical procedures are simplified by using penalty function methods and will be discussed in Section 4. With this approach, it suffices to perform unconstrained minimization with respect to k + p variables given by
with
using a suitably defined objective function. The relationships between the vector
and the vector
are given by
(19)
will be used to build the penalty function part of the new objective function.
Imbens ([37], p.501-502) also advocated the use of a specific version of the penalty function approach to obtain the MEEL estimators. Chong and Zak ([38], p.564-571) give details on how to construct penalty functions to handle optimization under equality and inequality constraints, and can be a good reference for using penalty methods. We choose to follow more closely the penalty methods of nonlinear optimization used in the literature, as given by Chong and Zak ([38]), and suggest a strategy to identify the global minimizer vector, which is the vector of the estimators.
The identification of the global minimizer which gives the estimates is an important issue in nonlinear estimation, as most algorithms only give local minimizers and are therefore vulnerable to the starting points used to initialize them; see Davidson and MacKinnon ([39], p.232-233) for a strategy using different starting points. Andrews [40] proposes using the criterion functions of goodness-of-fit test statistics to limit the search for the global minimizer to a suitably restricted parameter space; this can be handled easily with penalty function methods with inequality constraints. For performing a global random search based on simulated annealing, see Robert and Casella ([41], p.140-146).
3.2. Asymptotic Properties
3.2.1. Asymptotic Covariance
The regularity conditions for the MEEL estimators
to be consistent and to follow an asymptotic normal distribution have been given by Assumption 1 in Schennach ( [26] , p.645) who also provides proofs for consistency and asymptotic normality of the MEEL estimators. The regularity conditions are reproduced below. Also, see Expressions (13.2.10), (13.2.11) of the book by Mittelhammer et al. ( [29] , p.323).
Assumption
Assume that:
1) The true parameter given by the vector
is an interior point of the parametric space
which is assumed to be compact.
2)
is the unique vector which satisfies
.
3)
is differentiable with respect to
and
for some
.
4) The derivatives of
,
also satisfy the local boundedness condition
for some
when
is restricted to some neighborhood of
.
5) The covariance matrix
of
has rank
.
Under Assumption 1, the MEEL estimators given by the vector
are consistent and have a multinormal asymptotic distribution,
,
is the vector of the true parameters,
(20)
An estimator
for
can be defined,
If we let
, we have another consistent estimator for
.
Note that
is identical to Expression (5), which shows the asymptotic equivalence between optimum quasi-likelihood estimation and MEEL estimation. Neither method needs a full specification of the model; both only require moment conditions of the true model.
3.2.2. Goodness-of-Fit Test Statistics
The use of the KL distance also allows the construction of goodness-of-fit test statistics which follow an asymptotic chi-square distribution. The validity of the original model reduces to the validity of the moment conditions; we might want to test the null hypothesis specified as
, the expectations are under the true parametric model.
The following test statistic has an asymptotic chi-square distribution with
degrees of freedom, i.e.,
(21)
3.3. An Estimate for the Overall Relative Efficiency
It is clear that only under special circumstances are MEEL methods as efficient as ML methods, due to the use of a finite basis: this can only happen when the true score functions belong to the linear space spanned by the finite basis. Therefore, it is useful to be able to quantify the loss of efficiency of MEEL methods, even though the model density has no closed-form expression, to check whether MEEL methods are appropriate for a specific field of applications. Fourier series expansions can be useful to approximate the density function and are introduced below.
The density function can be expanded using Fourier cosine series in the range
, see Expressions (7-11) given by Fang and Oosterlee ([42], p.6), Powers ([43], p.62), i.e.,
The coefficients
are Fourier coefficients,
Regularity conditions for uniform convergence of Fourier series are also given by Powers ( [43] , p.72-73). The derivatives of these coefficients with respect to
are given by
If
is chosen sufficiently large, we have the following approximations of the coefficients using either the characteristic function (CF) or LT,
and
. Similarly,
is the real part of the complex number inside the parentheses, and most computer packages can handle complex number computations. In practice, we can only use a finite cosine series expansion with M terms. The formulas for the coefficients given by Fang and Oosterlee ([42], p.6) make use of the characteristic function, but they can easily be converted to expressions using the LT. Using these truncated series leads to approximating the score functions by
with
,
Therefore, if
is estimated by
, the Fisher information matrix
can be estimated by
using the original sample or simulated samples from the distribution with
. If the original sample is used,
The estimated overall relative efficiency can be defined based on Expression (20) as
(22)
, where det(.) is the determinant of the matrix inside the parentheses; see Expression (3.7) given by Bhapkar ([44], p.471) for overall relative efficiency using determinants of matrices. Instead of determinants, the traces of matrices can also be used, which leads to an alternative measure of overall relative efficiency. Fang and Oosterlee ([42]) show that finite cosine Fourier series converge at an exponential rate, which suggests, based on the examples given in their paper, that with M ≥ 500 the approximation should be quite accurate if the model density is continuous. The value of M can be increased for more accuracy if needed.
For the value of b, we can let
and
are respectively the sample mean and sample standard deviation. Note that
despite its simplicity, can give an idea of whether MEEL methods are appropriate for the data set and the parametric model being considered.
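The truncated cosine-series recovery of a density from its characteristic function can be sketched as follows. The exponential distribution serves only as a check against a known density, and the truncation point b and number of terms M are illustrative choices, not the paper's prescriptions.

```python
import cmath, math

def cos_density(phi, x, b, M=2000):
    """Approximate a density on [0, b] from its characteristic function phi
    via a truncated Fourier cosine series (COS-type approximation),
    assuming negligible probability mass beyond b."""
    # a_k = (2/b) * Re{ phi(k*pi/b) }; the k = 0 term gets weight 1/2
    total = 0.5 * (2.0 / b) * phi(0.0).real
    for k in range(1, M + 1):
        t = k * math.pi / b
        total += (2.0 / b) * phi(t).real * math.cos(t * x)
    return total

# Exponential(1): phi(t) = 1/(1 - i t), true density f(x) = exp(-x)
phi = lambda t: 1.0 / (1.0 - 1j * t)
b = 15.0  # roughly mean plus several standard deviations, as in the text
approx1 = cos_density(phi, 1.0, b)
approx3 = cos_density(phi, 3.0, b)
```

The recovered values match exp(-1) and exp(-3) closely, illustrating why a few hundred to a few thousand terms suffice when the density is smooth.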
4. Numerical Implementations
We shall use penalty function approaches to convert the problem of minimization with constraints into a problem of minimization without constraints by introducing a suitably defined surrogate objective function. Penalty function techniques are well described in Chong and Zak ([38], p.560-567). They can handle both equality and inequality constraints. The new objective function can be minimized using a direct search based on the Nelder-Mead simplex method, for example. Simplex methods are derivative-free and converge to local optimizers, see Chong and Zak ([38], p.274-278). The package R has built-in functions to perform simplex algorithms with constraints.
For illustration, we start with a simple example and extend it to the problem for finding MEEL estimators.
Suppose that we wish to minimize a function
with two variables
and
subject to a constraint
. The numerical solutions of this problem can be found by minimizing the following unconstrained objective
function given by
In practice setting a
value for
being very large gives numerically accurate solutions. The penalty function, which makes use of the square function, is the second component of the objective function. The minimization procedures can give exact solutions with the use of a more complicated nondifferentiable penalty function, see Chong and Zak ([38], p.570-571).
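The penalty idea can be illustrated on a toy problem (an assumed example, not from the paper): minimize x² + y² subject to x + y = 1 via the surrogate objective with a squared penalty term. Plain gradient descent is used here only for simplicity; a Nelder-Mead direct search would serve equally well.

```python
# minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0
# by gradient descent on the surrogate F = f + K * g^2.
def grad_descent(K=100.0, step=0.004, iters=5000):
    x = y = 0.0
    for _ in range(iters):
        g = x + y - 1.0
        gx = 2.0 * x + 2.0 * K * g   # dF/dx
        gy = 2.0 * y + 2.0 * K * g   # dF/dy
        x -= step * gx
        y -= step * gy
    return x, y

x, y = grad_descent()
# the constrained minimizer is (0.5, 0.5); increasing K tightens
# the constraint at the cost of a more ill-conditioned surrogate
```

With K = 100 the penalized minimizer is (K/(1+2K), K/(1+2K)) ≈ (0.4975, 0.4975), already close to the constrained solution; this is the trade-off behind choosing a large penalty constant.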
For the MEEL minimization problem, we have
depend on
and
are given by
The vectors
and
are related by the equality constraints given by
(23)
Therefore, we can perform unconstrained minimization using the following objective function with respect to
and
,
(24)
The penalty constant K is a large positive value, for example K = 500000. If the absolute value function is used to construct the penalty function, then we can only use direct search algorithms, which are derivative-free.
It is worth noting that these algorithms find only a local minimizer each run, so some strategies are needed to identify the global minimizer. The following procedures can be used:
1) We might need a starting vector close to the estimators to initialize the algorithm; this is important when working with real data. For example, we might want to consider starting the algorithm with simple but consistent estimators given by
obtained by minimizing
.
If the number of parameters is not large, a global random search can be performed. Simulated annealing (SA) and particle swarm optimization (PSO) are commonly used global random search techniques, see Chong and Zak ([38], p.279-285), which can supplement a local search algorithm. This problem is less severe for local search algorithms using simulated data since the true vector
is known. The simple estimators can be considered quasi-likelihood estimators which can make use of a larger basis than the one used to generate the MEEL estimators, since there are fewer numerical difficulties in computing the simple estimators; there is no need to estimate
. However, these quasi-score functions are no longer orthogonal projections on the larger basis used. Based on Remark 2.4.3 given by Luong and Thompson ([22], p.245), which gives the asymptotic covariance matrix of
, the overall relative efficiency can be defined as
,
evaluated at
.
2) For finding the global minimizer, Andrews ([40], p.919-921) has suggested using the criterion function of a goodness-of-fit test statistic to identify good starting vectors, by requiring that a good starting vector
must satisfy the inequality
(25)
is the 0.95 percentile of the chi-square distribution with
degrees of freedom,
. We might want to minimize subject not only to the equality constraints given by Expression (23) but also to the inequality constraint given by Expression (25).
With penalty function methods, we can define a penalty function to handle the inequality constraint as
,
,
where H is again a penalty constant.
This leads to finding the global minimizer of a new objective function given by
(26)
We might also want to repeat the procedure with different starting vectors and identify the global minimizer as the vector which yields the overall smallest value of
(27)
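The combination of quadratic penalty terms with a derivative-free multistart search described above can be sketched as follows. This is a minimal illustration only: the toy objective, the single equality constraint, the search ranges, and the moderate penalty constant are our assumptions, not the paper's actual MEEL objective.

```python
import random

def penalized_objective(x, objective, eq_constraints, ineq_constraints,
                        K=500000.0, H=500000.0):
    """Add quadratic penalties: K times each squared equality constraint
    and H times each squared violated inequality constraint (g(x) <= 0
    is the feasible side), so an unconstrained minimizer can be applied."""
    value = objective(x)
    value += K * sum(g(x) ** 2 for g in eq_constraints)
    value += H * sum(max(g(x), 0.0) ** 2 for g in ineq_constraints)
    return value

def multistart_search(f, dim, n_starts=30, n_iter=3000, seed=0):
    """Crude derivative-free multistart search: from each random starting
    vector, keep Gaussian perturbations that lower f, then return the best
    local result over all starts as the candidate global minimizer."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        fx = f(x)
        for _ in range(n_iter):
            y = [xi + rng.gauss(0.0, 0.1) for xi in x]
            fy = f(y)
            if fy < fx:
                x, fx = y, fy
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Toy problem (our assumption, not the paper's objective): minimize
# (x0 - 1)^2 + (x1 - 2)^2 subject to x0 + x1 = 3; a moderate K is used
# here so the crude search can still move along the constraint surface.
f = lambda x: penalized_objective(
    x, lambda z: (z[0] - 1.0) ** 2 + (z[1] - 2.0) ** 2,
    eq_constraints=[lambda z: z[0] + z[1] - 3.0],
    ineq_constraints=[], K=1000.0)
x_star, f_star = multistart_search(f, dim=2)
```

In practice a gradient-based local minimizer would replace the crude inner search, but the structure, penalized objective plus many starting vectors with the overall smallest value retained, is the same.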
5. Simulations
5.1. Simulations from the PTS Distribution
The representation of a new distribution created by performing an operation on the LT of the original distribution often suggests how to simulate from the new distribution if we can simulate from the original one. For example, to simulate from the tilted density
obtained by applying the Esscher operation on
, it suffices to simulate from the original density
. Since we have
,
is the LT of the density
, we have the following inequality
Therefore, if we know how to simulate an observation from the density
, we can apply the acceptance and rejection method to obtain simulated observations from
. For details on the acceptance and rejection method, see
Robert and Casella ( [41] , p.51-57) for example. The constant
is the
acceptance probability, which is useful for planning the sample size obtainable from the simulations. Note that this probability decreases as
increases, making it difficult to obtain a large sample from
for large values of
.
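The acceptance and rejection step for an Esscher-tilted density can be sketched as below. The exponential base density is our illustrative choice, not the positive stable density used for the PTS; it is chosen because its tilted version is known in closed form, so the sampler can be checked.

```python
import math
import random

def esscher_rejection_sample(base_sampler, theta, n, rng):
    """Sample from the Esscher-tilted density proportional to
    exp(-theta * x) * f(x): draw X from the base density f and accept it
    with probability exp(-theta * X).  The overall acceptance rate equals
    the base LT evaluated at theta, so it shrinks as theta grows."""
    out = []
    while len(out) < n:
        x = base_sampler(rng)
        if rng.random() <= math.exp(-theta * x):
            out.append(x)
    return out

# Check on a case with a known answer (our example, not from the paper):
# tilting an Exponential(1) base density by theta gives an
# Exponential(1 + theta) density, whose mean is 1 / (1 + theta).
rng = random.Random(1)
sample = esscher_rejection_sample(lambda r: r.expovariate(1.0), 0.5, 20000, rng)
mean = sum(sample) / len(sample)
```

For the PTS, `base_sampler` would be replaced by a positive stable sampler such as the one in Devroye cited below.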
The acceptance and rejection method provides a simple way to simulate observations from a positive tempered stable (PTS) distribution since it is easy to simulate from the positive stable distribution, see Devroye ( [45] , p.350). Consequently, with a simple algorithm to simulate from the PTS distribution, we can test the performance of the MEEL estimators versus the moment estimators originally proposed by Hougaard ( [1] , p.392) for the PTS distribution. The moment estimators were proposed because it is difficult to obtain the density function of the PTS, which prevents the use of likelihood estimation.
5.2. A Limited Simulation Study
In this section, we illustrate the implementation of the inference techniques by comparing the MEEL estimators with the moment estimators for the PTS family using simulated samples. The PTS distribution was introduced by Hougaard [1] with Laplace transform given by Expression (6) as
Hougaard ( [1] , p.392) suggested the following moment methods to estimate the parameters. Let
be respectively the first, second and third empirical cumulants, i.e.,
is the sample mean and
Define
, if
and define
, if
. The moment estimators obtained by matching cumulants for the parameters
and
are given respectively by
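The first three empirical cumulants entering these matching equations can be computed directly from the sample; a minimal sketch (our illustration, not code from the paper):

```python
def empirical_cumulants(x):
    """First three empirical cumulants: the sample mean and the second
    and third central sample moments (with divisor n)."""
    n = len(x)
    k1 = sum(x) / n
    k2 = sum((xi - k1) ** 2 for xi in x) / n
    k3 = sum((xi - k1) ** 3 for xi in x) / n
    return k1, k2, k3

k1, k2, k3 = empirical_cumulants([1.0, 2.0, 3.0, 4.0])
# k1 = 2.5, k2 = 1.25, and k3 = 0 by symmetry of this sample
```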
We compare the performance of the moment estimators with the MEEL estimators using the base
,
We only have access to a laptop computer, so the study is limited. The sample size used is n = 5000 and we draw M = 100 samples in our simulation. The focus is on the following ranges for the parameters; we fix
,
,
. Overall, the MEEL estimators are much more efficient than the moment estimators for the range of parameters considered. The moment estimators also do not seem to perform well for parameter values selected outside this range. The overall relative efficiency is defined as
The mean square errors (MSE) are estimated using simulated samples. The mean square error of an estimator
for
is defined as
The simulation study is not extensive and more should be done, but it does suggest the potential of the MEEL methods.
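The Monte Carlo estimate of the mean square error over the M simulated samples can be sketched as follows; the numbers below are hypothetical and only illustrate the computation:

```python
def estimated_mse(estimates, true_value):
    """Monte Carlo estimate of the mean square error: average the squared
    deviations of the estimates (one per simulated sample) from the true
    parameter value used to generate the samples."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# Hypothetical estimates from M = 5 simulated samples with true value 1.0:
mse = estimated_mse([1.1, 0.9, 1.2, 1.0, 0.8], 1.0)
# (0.01 + 0.01 + 0.04 + 0.00 + 0.04) / 5 = 0.02
```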
Some results are summarized in Table 1 to keep the paper within a reasonable length; they are displayed below to give an idea of the gains from using the MEEL method instead of moment methods.
Based on the theory, the MEEL estimators cannot be as efficient as the ML estimators over the entire parameter space since only a finite number of elements in the base is used. However, the theory suggests that the methods might still have high efficiencies on subspaces where the parameters are subject to inequality bounds. The estimate of the overall relative efficiency given by Expression (22) might give some idea of whether the methods are recommended. The following considerations might be useful to assess whether the use of MEEL methods is appropriate for a parametric model and data sets which come from a specific field of applications:
Table 1. Asymptotic relative efficiencies comparisons between MEEL estimators and moment (MM) estimators.
Legend: Tabulated values are estimates of ARE (MEEL vs MM) based on simulated samples from the chosen parameters δ, θ with α = 0.6.
1) Define a restricted space based on the field of applications; also obtain
and use the estimate of overall relative efficiency to evaluate the loss of efficiency of the MEEL methods in a neighborhood of
, which in general should be nested inside the restricted space of interest.
2) For efficiency of the MEEL methods, we should try to include as many elements in a finite base as possible, subject to numerical limitations, and try different values for
, which controls the spacing of the functions in the basis, to see whether there is any improvement in efficiency in a neighborhood of
.
6. Actuarial Applications
Pricing of insurance contracts is one of the main objectives in actuarial sciences. A contract defines a random loss function
,
is the individual loss random variable for one unit of time, often assumed to be nonnegative and to follow a parametric model with distribution function
and LT
. The pure premium is the following expectation under the true vector
, i.e.,
P must be estimated using data and therefore,
needs to be estimated first; subsequently, analytical methods or simulation methods can be used to approximate the premium. If MEEL methods are used, the parametric families with closed form LT can be validated by means of goodness-of-fit tests.
For insurance, the stop loss premium is defined as
,
. The stop loss premium can be expressed in terms of distribution functions instead of expectations; see Expression (8) given by Luong ( [8] , p.543) for analytical methods to evaluate the stop loss premium.
If sampling from the distribution is possible, then the pricing of the contracts can also be approximated using simulations based on an estimate of
, which involves drawing samples based on the estimated parameters. For example, it is not difficult to simulate from a compound Poisson distribution despite its complicated density function, which can only be expressed as a series. Clearly, once the parameters of the compound Poisson distribution are estimated, pricing of insurance contracts can be done via simulations.
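Such simulation-based pricing can be sketched as below; the Poisson rate, the exponential severity distribution, and the retention level are hypothetical choices for illustration only.

```python
import random

def compound_poisson_draw(lam, severity_sampler, rng):
    """One draw of S = X_1 + ... + X_N: the count N ~ Poisson(lam) is
    obtained by counting exponential inter-arrival times in [0, 1], and
    each X_i is drawn from the severity distribution."""
    n, t = 0, rng.expovariate(lam)
    while t <= 1.0:
        n += 1
        t += rng.expovariate(lam)
    return sum(severity_sampler(rng) for _ in range(n))

def stop_loss_premium(d, lam, severity_sampler, n_sims=20000, seed=0):
    """Monte Carlo estimate of the stop loss premium E[max(S - d, 0)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        total += max(compound_poisson_draw(lam, severity_sampler, rng) - d, 0.0)
    return total / n_sims

# Hypothetical example: Poisson(2) claim counts with Exponential(mean 1)
# severities; with retention d = 0 the premium reduces to E[S] = 2.
premium = stop_loss_premium(0.0, 2.0, lambda r: r.expovariate(1.0))
```

In applications, `lam` and the severity parameters would be replaced by their MEEL estimates before drawing the samples.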
7. Conclusion
We conclude that MEEL methods appear to be useful for inference; they have been an active field of research in econometrics for the last twenty years, yet they do not seem to have received much attention in actuarial sciences. When the methods are oriented toward actuarial applications, and since the LT is widely used in actuarial sciences, it is natural to consider extracting moment conditions from the LT. It is shown that MEEL estimation is equivalent to QL estimation based on the best quasi-score functions obtained by projecting the true score functions on the linear space spanned by a basis specified by the moment conditions. Based on these considerations, two families of bases are proposed in this paper to generate MEEL methods with the objective of achieving high efficiencies for actuarial applications. In general, the MEEL methods using these bases are more efficient than QL methods based on quadratic estimating functions and methods of moments. With finite bases, the MEEL methods can in general attain near full efficiency only on restricted parameter spaces. MEEL methods can still be very attractive if, depending on the field of applications, we essentially work with these restricted spaces; it is then important to measure the loss of efficiency to verify the appropriateness of the methods for the field of applications. The methods can easily be adapted to estimate continuous distributions with support on the real line encountered in finance by using constraints extracted from the model moment generating function instead of the LT.
Acknowledgements
The helpful and constructive comments of a referee, which led to an improvement of the presentation of the paper, and the support of the editorial staff of Open Journal of Statistics in processing the paper are gratefully acknowledged.