1. Introduction
Stable distributions are a class of distributions with excellent properties. They are the only distributions that possess a domain of attraction: by the generalized central limit theorem, suitably normalized sums of independent and identically distributed random variables can only converge in distribution to a stable law. As a result, stable distributions possess desirable theoretical properties. In addition, stable distributions are particularly effective at fitting heavy-tailed data, which makes them widely applicable in fields such as financial markets and signal processing.
Stable distributions were first introduced by Lévy in 1925. However, the development of parameter estimation methods for stable distributions has progressed slowly, mainly for two reasons: 1) With a few exceptions, stable distributions do not have explicit probability density functions or distribution functions, so traditional estimation methods based on the density are not applicable. 2) Stable distributions do not have finite second- or higher-order moments, so traditional methods based on higher-order moments are also unsuitable. The main parameter estimation methods currently available for stable distribution models are the quantile method [1], the fractional lower-order moment method [2], and the characteristic function method [3]. The quantile method requires looking up tabulated values for the computed quantiles; its scope of application is limited and its precision is modest. The performance of the fractional lower-order moment method depends on the choice of the moment order, which the user must determine. The characteristic function method obtains estimates by repeatedly querying tables of estimation intervals based on the sample size and initial parameter values until a convergence criterion is met, which entails a large computational effort and limited accuracy. Therefore, there is a need for a simple and effective parameter estimation method for stable distribution models.
Bayesian Synthetic Likelihood (BSL) is a likelihood-free inference method within the Bayesian framework, first introduced by Price et al. in 2018 [4]. The method has been applied to parameter inference in various models, such as SDEMEMs [5], SDE models [6], and ARCH models [7]. However, its fundamental assumption is that the summary statistic of the data is approximately Gaussian. When the summary statistic deviates markedly from a Gaussian distribution, the likelihood is estimated inaccurately, which in turn yields an inaccurate estimate of the posterior distribution. Several studies have proposed ways to relax this Gaussian assumption. Fasiolo et al. [8] proposed a more flexible density estimator, the extended empirical saddlepoint approximation, which relaxes the normality assumption; however, the user must select a shrinkage parameter, and both the accuracy and the computational efficiency depend on this choice. An et al. [9] instead estimated the likelihood with a semi-parametric approach, leading to the semiBSL method. Numerical experiments show that semiBSL significantly outperforms BSL and the extended empirical saddlepoint approximation when the data deviate from a Gaussian distribution. In 2022, An et al. [10] released an R package implementing MCMC-BSL, MCMC-semiBSL, and related extensions to facilitate their use. Picchini et al. [11] improved the convergence speed of BSL by introducing a guided adaptive sampling scheme, and demonstrated through numerical experiments that when the initial parameter values are far from the true values, MCMC sampling converges slowly and can even get trapped in local optima. Since semiBSL also relies on MCMC sampling, it faces the same issue in principle.
Therefore, this paper proposes two improvements to the semiBSL method. First, the Adaptive Monte Carlo (AM) algorithm is introduced to accelerate the convergence of semiBSL. Second, transformed kernel density estimation (TKDE) is incorporated to improve the estimation accuracy of semiBSL for heavy-tailed data. A simulation study and an empirical study based on stable distributions are conducted, and the experimental results confirm the superiority of the proposed improvements.
2. Theoretical Foundations
2.1. Stable Distributions
There are several ways of defining a stable random variable. The original definition is based on sums of random variables: a random variable X is said to have a stable distribution if, for any n ≥ 2, there exist a positive number c_n and a real number d_n such that

X_1 + X_2 + \cdots + X_n \overset{d}{=} c_n X + d_n, (1)

where X_1, …, X_n are independent copies of X.

Second, stable random variables can be defined via their Lévy measure. For any α ∈ (0, 2), the Lévy measure of a stable process is given by

\nu(\mathrm{d}x) = \left( c_{+}\, x^{-\alpha-1}\, \mathbf{1}_{(0,\infty)}(x) + c_{-}\, |x|^{-\alpha-1}\, \mathbf{1}_{(-\infty,0)}(x) \right) \mathrm{d}x, (2)

where \mathbf{1}_A(\cdot) is an indicator function and c_+, c_- ≥ 0. The characteristic function of the stable distribution is obtained from the Lévy–Khintchine formula. The characteristic function of X is given by

\varphi_X(t) = \exp\left\{ i\delta t - \gamma^{\alpha} |t|^{\alpha} \left[ 1 - i\beta\, \mathrm{sgn}(t)\, \omega(t, \alpha) \right] \right\}, (3)

where

\omega(t, \alpha) = \begin{cases} \tan(\pi\alpha/2), & \alpha \neq 1, \\ -\frac{2}{\pi} \ln|t|, & \alpha = 1, \end{cases} (4)

α ∈ (0, 2], β ∈ [−1, 1], γ > 0, and δ ∈ ℝ. The parameter α is the index of stability and controls the behaviour of the left and right tails: the closer α is to 2, the thinner the tails. The parameters β, γ, and δ are the skewness, scale, and location parameters, respectively. When a random variable X follows a stable distribution with these parameters, we write X ~ S(α, β, γ, δ).
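For concreteness, the following Python sketch draws samples from a stable law and compares the characteristic function (3)-(4) with its empirical counterpart. It relies on scipy.stats.levy_stable, whose internal parameterization may differ from the form above in its location/scale conventions; the parameter values are illustrative placeholders, not settings used in this paper.

```python
import numpy as np
from scipy.stats import levy_stable

# Hypothetical parameter values for illustration only (not the paper's settings).
alpha, beta, gamma, delta = 1.7, 0.5, 1.0, 0.0

rng = np.random.default_rng(0)
x = levy_stable.rvs(alpha, beta, loc=delta, scale=gamma, size=20_000, random_state=rng)

def stable_cf(t, alpha, beta, gamma, delta):
    """Characteristic function (3)-(4) in the S(alpha, beta, gamma, delta) form above."""
    t = np.asarray(t, dtype=float)
    if alpha != 1.0:
        omega = np.tan(np.pi * alpha / 2.0)
    else:
        omega = -(2.0 / np.pi) * np.log(np.abs(t))
    return np.exp(1j * delta * t
                  - gamma**alpha * np.abs(t)**alpha * (1.0 - 1j * beta * np.sign(t) * omega))

t = np.linspace(0.1, 3.0, 30)
empirical_cf = np.exp(1j * np.outer(t, x)).mean(axis=1)   # empirical characteristic function
theoretical_cf = stable_cf(t, alpha, beta, gamma, delta)

# The two curves should roughly agree; small discrepancies can also arise because
# scipy's parameterization of levy_stable may differ from (3)-(4).
print(np.max(np.abs(empirical_cf - theoretical_cf)))
```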
2.2. Bayesian Synthetic Likelihood
In Bayesian Synthetic Likelihood with MCMC sampling (MCMC-BSL), the objective is to sample from the summary statistic posterior

p(\theta \mid s_y) \propto p(s_y \mid \theta)\, p(\theta), (5)

where θ is the parameter that requires estimation, with corresponding prior distribution p(θ). Here, y is the observed data, which is reduced to a summary statistic s_y = S(y), where S(·) is the summary statistic function. The dimension of the statistic must be at least as large as the parameter dimension, i.e. dim(s_y) ≥ dim(θ).

BSL approximates p(s_y | θ) with the Gaussian synthetic likelihood

p_A(s_y \mid \theta) = \mathcal{N}\!\left(s_y;\ \mu(\theta),\ \Sigma(\theta)\right). (6)

The mean μ(θ) and covariance Σ(θ) are not available in closed form, but they can be estimated from independent model simulations at θ. The procedure involves drawing x_{1:n} = (x_1, …, x_n), where x_i ~ p(·|θ) for i = 1, …, n, and calculating the summary statistic of each simulated dataset, s_{1:n} = (s_1, …, s_n), where s_i = S(x_i) is the summary statistic of x_i, i = 1, …, n. These simulations can be used to estimate μ(θ) and Σ(θ) unbiasedly:

\mu_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} s_i, \qquad
\Sigma_n(\theta) = \frac{1}{n-1}\sum_{i=1}^{n} \left(s_i - \mu_n(\theta)\right)\left(s_i - \mu_n(\theta)\right)^{\top}. (7)
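As a concrete illustration, the following Python sketch estimates μ_n(θ) and Σ_n(θ) from simulated summary statistics and evaluates the Gaussian synthetic log-likelihood (6)-(7). The simulator and summary statistics shown here are toy stand-ins, not the stable distribution model of this paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def synthetic_loglik(theta, s_y, simulate, n=200, rng=None):
    """Gaussian synthetic log-likelihood of eqs. (6)-(7).

    theta    : parameter value at which the likelihood is estimated
    s_y      : observed summary statistic (1-D array of length d)
    simulate : function returning one simulated summary statistic for a given theta
    n        : number of model simulations
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.array([simulate(theta, rng) for _ in range(n)])      # n x d matrix of summaries
    mu_n = s.mean(axis=0)                                       # eq. (7), sample mean
    sigma_n = np.cov(s, rowvar=False, ddof=1)                   # eq. (7), unbiased covariance
    return multivariate_normal.logpdf(s_y, mean=mu_n, cov=sigma_n)

# Toy example (purely illustrative): summaries are the mean and log-variance of
# a sample from N(theta, 1), so theta is one-dimensional and d = 2 >= dim(theta).
def simulate(theta, rng):
    x = rng.normal(theta, 1.0, size=100)
    return np.array([x.mean(), np.log(x.var(ddof=1))])

rng = np.random.default_rng(1)
s_y = simulate(0.5, rng)                       # pretend this is the observed summary
print(synthetic_loglik(0.4, s_y, simulate, rng=rng))
```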
We can sample from this approximate posterior using MCMC; see Algorithm 1.
Algorithm 1: MCMC-BSL algorithm
Input: Summary statistic of the data s_y; the prior distribution p(θ); the proposal distribution q(·|·); the number of iterations T; the number of simulation runs n; the initial value of the chain θ^0.
Output: MCMC sample θ^1, …, θ^T from the BSL posterior. Some samples can be discarded as burn-in if required.
1: Simulate x_{1:n} ~ p(·|θ^0) and compute s_{1:n}
2: Compute μ_n(θ^0) and Σ_n(θ^0) using (7)
3: for t = 1 to T do
4:   Draw θ* ~ q(·|θ^{t-1})
5:   Simulate x*_{1:n} ~ p(·|θ*) and compute s*_{1:n}
6:   Compute μ_n(θ*) and Σ_n(θ*) using (7)
7:   Compute r = min{1, [N(s_y; μ_n(θ*), Σ_n(θ*)) p(θ*) q(θ^{t-1}|θ*)] / [N(s_y; μ_n(θ^{t-1}), Σ_n(θ^{t-1})) p(θ^{t-1}) q(θ*|θ^{t-1})]}
8:   if U(0,1) < r then
9:     Set θ^t = θ*, μ_n(θ^t) = μ_n(θ*), and Σ_n(θ^t) = Σ_n(θ*)
10:  else
11:    Set θ^t = θ^{t-1}, μ_n(θ^t) = μ_n(θ^{t-1}), and Σ_n(θ^t) = Σ_n(θ^{t-1})
12: end for
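A minimal Python sketch of Algorithm 1 is given below, using a symmetric Gaussian random-walk proposal (so the q terms cancel in the acceptance ratio) and a flat prior on a bounded interval. The simulator, summary statistics, and all tuning constants are illustrative placeholders rather than the paper's settings.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)

def simulate_summaries(theta, n=200, m=100):
    """Stand-in simulator: n summary statistics (mean, log-variance) of N(theta, 1) samples."""
    x = rng.normal(theta, 1.0, size=(n, m))
    return np.column_stack([x.mean(axis=1), np.log(x.var(axis=1, ddof=1))])

def gaussian_sl(s_y, s):
    """Gaussian synthetic log-likelihood, eqs. (6)-(7)."""
    return multivariate_normal.logpdf(s_y, mean=s.mean(axis=0),
                                      cov=np.cov(s, rowvar=False, ddof=1))

def log_prior(theta):
    return 0.0 if -5.0 < theta < 5.0 else -np.inf   # flat prior on (-5, 5)

# Observed summary (pretend the data were generated at theta = 0.5)
s_y = simulate_summaries(0.5, n=1)[0]

T, step = 5000, 0.2
theta = 0.0                                          # initial value of the chain
loglik = gaussian_sl(s_y, simulate_summaries(theta)) # steps 1-2
chain = [theta]
for t in range(T):                                   # steps 3-12
    theta_prop = theta + step * rng.normal()         # symmetric random-walk proposal
    if log_prior(theta_prop) == -np.inf:
        chain.append(theta)
        continue
    loglik_prop = gaussian_sl(s_y, simulate_summaries(theta_prop))
    log_r = loglik_prop + log_prior(theta_prop) - loglik - log_prior(theta)
    if np.log(rng.uniform()) < log_r:                # accept/reject step
        theta, loglik = theta_prop, loglik_prop
    chain.append(theta)

print(np.mean(chain[1000:]))                         # posterior mean after burn-in
```

Note that, as in Algorithm 1, the synthetic likelihood at the current state is kept rather than re-estimated at every iteration.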
2.3. Semi-Parametric Bayesian Synthetic Likelihood
The difference between MCMC-semiBSL and MCMC-BSL lies in the treatment of the likelihood function. MCMC-BSL directly assumes the likelihood function to be a Gaussian distribution with unknown parameters, while MCMC-semiBSL employs a Copula model to estimate the likelihood function. The specific steps are as follows: First, kernel density estimation (KDE) is used to estimate each marginal density. Then, a Gaussian Copula function is applied to estimate the joint density. Finally, the resulting joint density is taken as the estimated likelihood function.
Based on kernel density estimation, the marginal density of the j-th component of the summary statistic can be estimated as

\hat{g}_j(s_{y,j}) = \frac{1}{n}\sum_{i=1}^{n} K_h\!\left(s_{y,j} - s_{i,j}\right), (8)

where s_{y,j} denotes the j-th component of the observed summary statistic s_y and s_{i,j} denotes the j-th component of the i-th simulated summary statistic. The scaled kernel is K_h(u) = K(u/h)/h, where h is the bandwidth and K(·) is the kernel function, satisfying K(u) ≥ 0 and ∫ K(u) du = 1. Many kernel functions satisfy these conditions; MCMC-semiBSL uses the Gaussian kernel K(u) = (2π)^{-1/2} exp(−u²/2). The bandwidth is chosen according to Silverman's rule of thumb [12],

h = 0.9\, \min\!\left(\hat{\sigma},\ \mathrm{IQR}/1.34\right) n^{-1/5},

where σ̂ is the sample standard deviation and IQR denotes the interquartile range. The corresponding estimate Ĝ_j of the marginal distribution function is obtained directly by integrating the estimated marginal density.
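The marginal estimate (8) with the Silverman bandwidth can be written in a few lines of Python; the sketch below is a generic Gaussian-kernel KDE for one component, not code from the semiBSL package.

```python
import numpy as np
from scipy.stats import norm

def silverman_bandwidth(s_j):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    n = len(s_j)
    iqr = np.subtract(*np.percentile(s_j, [75, 25]))
    return 0.9 * min(s_j.std(ddof=1), iqr / 1.34) * n ** (-0.2)

def kde_marginal(value, s_j, h=None):
    """Gaussian-kernel estimate of the j-th marginal density at `value`, eq. (8)."""
    h = silverman_bandwidth(s_j) if h is None else h
    return norm.pdf((value - s_j) / h).mean() / h

def kde_marginal_cdf(value, s_j, h=None):
    """Corresponding estimate of the marginal distribution function G_j."""
    h = silverman_bandwidth(s_j) if h is None else h
    return norm.cdf((value - s_j) / h).mean()

# Example with skewed simulated summaries (illustrative only)
rng = np.random.default_rng(3)
s_j = rng.lognormal(size=500)
print(kde_marginal(1.0, s_j), kde_marginal_cdf(1.0, s_j))
```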
After obtaining the marginal distributions, MCMC-semiBSL uses a Gaussian Copula function to model the dependence structure between the components of the summary statistic. The Gaussian Copula density function is given by

c_R(u_1, \dots, u_d) = \frac{1}{\sqrt{\det R}} \exp\!\left\{ -\tfrac{1}{2}\, \eta^{\top}\!\left(R^{-1} - I_d\right)\eta \right\}, (9)

where I_d is the d-dimensional identity matrix, η = (η_1, …, η_d)^T with η_j = Φ^{-1}(u_j), Φ^{-1} is the inverse of the standard normal distribution function, and u_j = Ĝ_j(s_{y,j}) for j = 1, …, d. If the correlation matrix R in the above expression can be estimated, the likelihood function can be estimated as

\hat{p}(s_y \mid \theta) = c_R\!\left(\hat{G}_1(s_{y,1}), \dots, \hat{G}_d(s_{y,d})\right) \prod_{j=1}^{d} \hat{g}_j(s_{y,j}). (10)
MCMC-semiBSL estimates the correlation matrix R using the Gaussian rank correlation (GRC) method proposed by Boudt et al. in 2012 [13]. According to the GRC method, the (j, k)-th component ρ_{jk} of the correlation matrix R can be estimated as

\hat{\rho}_{jk} = \frac{\sum_{i=1}^{n} \Phi^{-1}\!\left(\frac{r(s_{i,j})}{n+1}\right) \Phi^{-1}\!\left(\frac{r(s_{i,k})}{n+1}\right)}{\sum_{i=1}^{n} \left[\Phi^{-1}\!\left(\frac{i}{n+1}\right)\right]^{2}}, (11)

where r(s_{i,j}) denotes the rank of s_{i,j} among s_{1,j}, …, s_{n,j} and Φ^{-1} is the inverse of the standard normal distribution function. After the likelihood function has been estimated, the MCMC algorithm with T iterations is used, as in MCMC-BSL, to sample from the posterior p(θ|s_y). The specific procedure is given in Algorithm 2.
Algorithm 2: MCMC-semiBSL algorithm
Input: Summary statistic of the data s_y; the prior distribution p(θ); the proposal distribution q(·|·); the number of iterations T; the number of simulation runs n; the initial value of the chain θ^0.
Output: MCMC sample θ^1, …, θ^T from the semiBSL posterior. Some samples can be discarded as burn-in if required.
1: Simulate x_{1:n} ~ p(·|θ^0) and compute s_{1:n}
2: Compute the marginal estimates ĝ_j and Ĝ_j, j = 1, …, d, using (8)
3: Compute R̂(θ^0) using (11)
4: Compute p̂(s_y|θ^0) using (10)
5: for t = 1 to T do
6:   Draw θ* ~ q(·|θ^{t-1})
7:   Simulate x*_{1:n} ~ p(·|θ*) and compute s*_{1:n}
8:   Compute the marginal estimates ĝ*_j and Ĝ*_j, j = 1, …, d, using (8)
9:   Compute R̂(θ*) using (11)
10:  Compute p̂(s_y|θ*) using (10)
11:  Compute r = min{1, [p̂(s_y|θ*) p(θ*) q(θ^{t-1}|θ*)] / [p̂(s_y|θ^{t-1}) p(θ^{t-1}) q(θ*|θ^{t-1})]}
12:  if U(0,1) < r then
13:    Set θ^t = θ* and p̂(s_y|θ^t) = p̂(s_y|θ*)
14:  else
15:    Set θ^t = θ^{t-1} and p̂(s_y|θ^t) = p̂(s_y|θ^{t-1})
16: end for
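To make (8)-(11) concrete, the following Python sketch evaluates the semi-parametric log-likelihood of an observed summary s_y given a matrix of simulated summaries: Gaussian-kernel marginals with the Silverman bandwidth, Gaussian rank correlation (11), and the Gaussian copula density (9)-(10). It is an illustrative re-implementation, not code from the semiBSL R package.

```python
import numpy as np
from scipy.stats import norm, rankdata

def semi_parametric_loglik(s_y, s):
    """Semi-parametric synthetic log-likelihood of eqs. (8)-(11).

    s_y : observed summary statistic, shape (d,)
    s   : simulated summary statistics, shape (n, d)
    """
    n, d = s.shape

    # Marginal KDEs with Silverman's bandwidth, eq. (8)
    log_gj = np.empty(d)
    u = np.empty(d)
    for j in range(d):
        col = s[:, j]
        iqr = np.subtract(*np.percentile(col, [75, 25]))
        h = 0.9 * min(col.std(ddof=1), iqr / 1.34) * n ** (-0.2)
        z = (s_y[j] - col) / h
        log_gj[j] = np.log(norm.pdf(z).mean() / h)                 # marginal density at s_y[j]
        u[j] = np.clip(norm.cdf(z).mean(), 1e-12, 1 - 1e-12)       # marginal CDF at s_y[j]

    # Gaussian rank correlation, eq. (11)
    zr = norm.ppf(rankdata(s, axis=0) / (n + 1))                   # n x d matrix of normal scores
    denom = np.sum(norm.ppf(np.arange(1, n + 1) / (n + 1)) ** 2)
    R = zr.T @ zr / denom

    # Gaussian copula log-density, eq. (9), plus marginal terms, eq. (10)
    eta = norm.ppf(u)
    _, logdet = np.linalg.slogdet(R)
    quad = eta @ (np.linalg.solve(R, eta) - eta)                   # eta^T (R^{-1} - I) eta
    return -0.5 * logdet - 0.5 * quad + log_gj.sum()

# Illustrative use with heavy-tailed simulated summaries
rng = np.random.default_rng(4)
s = rng.standard_t(df=3, size=(500, 3))
print(semi_parametric_loglik(np.array([0.1, -0.2, 0.3]), s))
```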
2.4. Adaptive Monte Carlo Algorithm
Markov Chain Monte Carlo (MCMC) is a stochastic simulation method based on Markov chains, used to estimate the numerical characteristics of complex probability distributions. The foundational algorithm of MCMC was proposed by W. K. Hastings in 1970, known as the Metropolis-Hastings (M-H) algorithm [14]. In the MCMC-semiBSL method, the M-H algorithm is also used. The basic idea is to generate samples from the target distribution by constructing a Markov sampling chain. The algorithm starts from an initial state and iteratively performs two steps: sampling from the proposal distribution and determining whether to accept the sample, until convergence is reached.
As can be seen from the above description, the M-H algorithm requires the specification of both the proposal distribution and the initial state before execution, and its convergence rate depends on both choices. When the proposal distribution is poorly chosen or the initial state lies far from the bulk of the target distribution, convergence can be slow, and the chain may even become trapped in local optima. To address this limitation of the M-H algorithm, Haario et al. introduced the Adaptive Monte Carlo (AM) algorithm in 2001 [15]. This algorithm continuously adjusts the proposal distribution during execution based on the values sampled so far, thus accelerating convergence. In the classic M-H algorithm, a Gaussian proposal distribution is typically used:

q(\cdot \mid \theta^{t-1}) = \mathcal{N}\!\left(\theta^{t-1}, C\right), (12)

where C is an appropriate covariance matrix. The AM algorithm continuously adjusts this covariance matrix during the execution of the algorithm based on the sampling results. The specific adjustment is as follows.
Assume that at time t, the states drawn by the algorithm are X_0, X_1, …, X_t, where X_0 is the chosen initial state. The candidate state for the next step is then drawn from the proposal distribution q_{t+1}(·|X_0, …, X_t):

q_{t+1}(\cdot \mid X_0, \dots, X_t) = \mathcal{N}\!\left(X_t, C_{t+1}\right), (13)

where the covariance matrix is

C_{t+1} = \begin{cases} C_0, & t \le t_0, \\ s_d\, \mathrm{Cov}(X_0, \dots, X_t) + s_d\, \varepsilon\, I_d, & t > t_0, \end{cases} (14)

with the empirical covariance and mean given by

\mathrm{Cov}(X_0, \dots, X_t) = \frac{1}{t} \sum_{i=0}^{t} \left(X_i - \bar{X}_t\right)\left(X_i - \bar{X}_t\right)^{\top}, (15)

\bar{X}_t = \frac{1}{t+1} \sum_{i=0}^{t} X_i, (16)

where I_d is the d-dimensional identity matrix, ε > 0 is a small constant chosen by the user, C_0 is the initial covariance matrix, t_0 > 0 is the length of the initial non-adaptive period, and s_d is a scaling parameter that depends only on the dimensionality d. In this paper, s_d is set to 2.4²/d, as suggested by Gelman et al. [16].
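The adaptation (13)-(16) can be implemented by maintaining the history of the chain; for clarity, the Python sketch below recomputes the empirical covariance directly (a running update is more efficient), with placeholder values for C_0, t_0, and ε.

```python
import numpy as np

def am_proposal_cov(history, C0, t0=100, eps=1e-6):
    """Covariance C_{t+1} of the AM proposal, eq. (14).

    history : array of shape (t+1, d) holding the states X_0, ..., X_t
    C0      : initial covariance matrix used during the non-adaptive period
    t0      : length of the initial non-adaptive period (illustrative default)
    eps     : small regularization constant chosen by the user
    """
    t, d = history.shape[0] - 1, history.shape[1]
    if t <= t0:
        return C0
    s_d = 2.4 ** 2 / d                                   # scaling suggested by Gelman et al.
    cov = np.cov(history, rowvar=False, ddof=1)          # empirical covariance, eqs. (15)-(16)
    return s_d * cov + s_d * eps * np.eye(d)

# Drawing the next candidate from the adaptive proposal (13):
rng = np.random.default_rng(5)
history = rng.normal(size=(500, 2))                      # pretend these are past states X_0..X_t
C_next = am_proposal_cov(history, C0=0.1 * np.eye(2))
candidate = rng.multivariate_normal(history[-1], C_next)
print(candidate)
```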
Theorem 1. Let π be the density of a target distribution supported on a bounded measurable subset S ⊂ ℝ^d, and assume that π is bounded from above. Let ε > 0 and let μ_0 be any initial distribution on S. Then the AM chain (X_0, X_1, X_2, …) simulates properly the target distribution π: for any bounded and measurable function f: S → ℝ, the equality

\lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^{n} f(X_i) = \int_{S} f(x)\, \pi(x)\, \mathrm{d}x (17)

holds almost surely.
Theorem 2. Assume that the finite-dimensional distributions of the stochastic process (X_n)_{n≥0} on the state space S are determined by a sequence of generalized transition probabilities (K_n)_{n≥1}, which is assumed to satisfy the following three conditions:
(i) There are a fixed integer k_0 and a constant λ ∈ (0, 1) such that
(18)
(ii) There are a fixed probability measure π on S and a constant c_0 > 0 such that
(19)
(iii) We have the following estimate for the operator norm:
(20)
where c is a fixed positive constant, n ≥ 1, and one assumes that K_{n+1} is a direct continuation of K_n.
Then, if f is bounded and measurable, the equality (17) holds almost surely.
In what follows, the values of the auxiliary constants may depend on the target distribution π, its support S, and the constants of the algorithm; their actual values are irrelevant for our purposes here.
Proof of Theorem 1. According to Theorem 2, it suffices to prove that the AM chain satisfies conditions (i)-(iii). To check condition (i), we observe that, directly from the definition and by the fact that S is bounded, all the covariances C_t satisfy the matrix inequality
(21)
Hence the corresponding normal proposal densities are uniformly bounded from below on S for all t.
We next verify condition (iii). To that end, we assume the adaptation has started and observe that, for the given arguments, one has
(22)
Fix the relevant quantities and introduce the corresponding auxiliary terms. According to the definition,
(23)
In general, the analogous bound holds for the subsequent terms. We easily see that the resulting quantities are of the required order, and hence the previous estimates yield (iii).
In order to check condition (ii), fix the relevant set and denote the associated quantity accordingly. It follows that this quantity admits a bound that does not depend on the index considered. We may therefore proceed exactly as in (22) and (23) to deduce that
(24)
Consequently, we obtain
(25)
which completes the proof of Theorem 1.
2.5. Transformed Kernel Density Estimation
Traditional kernel density estimation does not perform well for heavy-tailed distributions. To address this issue, Wand et al. proposed a kernel density estimation method based on the Box-Cox transformation in 1991 [17]. The basic idea is as follows: first, transform the original data so that the transformed data follow a distribution that is easier to estimate; then apply traditional kernel density estimation to the transformed data; finally, transform the estimated density back to obtain a kernel density estimate for the original data. Building on this idea, this paper extends transformed kernel density estimation to the BSL framework.
In 1992, Park et al. [18] demonstrated the effectiveness of this method through numerical experiments. Since then, many studies have proposed different transformation methods, such as the power transformation method proposed by Bolance et al. [19], the Mobius-like mapping proposed by Clements et al. [20], the Champernowne transformation proposed by Buch et al. [21], and the inverse Beta(3,3) transformation proposed by Bolance et al. [22].
Among these, Bolance et al. introduced a double-transformation method that combines the Champernowne distribution function with the inverse Beta(3,3) distribution function. The method first brings the sample data approximately to a uniform distribution using the Champernowne distribution function, and then maps the once-transformed data approximately to a Beta(3,3) distribution using the inverse Beta(3,3) distribution function. This approach has a notable advantage over the other methods because of a result proven by Terrell and Scott in 1985 [23], which states that, within a suitable class of densities, the Beta(3,3) density minimizes the integrated mean squared error of kernel density estimation. The kernel density estimation method adopted in this paper is an improved version of this double transformation: Beta kernel density estimation under a generalized Logistic transformation. This method requires fewer parameters and addresses the "boundary bias" issue of the original method. The specific procedure is as follows.
First, the original data are transformed so that they are approximately uniformly distributed on the interval [0, 1]. To achieve this, the transformation function is taken to be the cumulative distribution function (CDF) of the original data. Since the true distribution function is unknown, a generalized Logistic function is used to approximate the empirical distribution function. The generalized Logistic function has the form
(26)
where the parameters of the function are to be estimated from the data.
Next, the sample data x_1, …, x_n are arranged in ascending order as order statistics x_{(1)} ≤ x_{(2)} ≤ ⋯ ≤ x_{(n)}, and the values of the empirical distribution function at the order statistics are computed. A least-squares regression is then performed on the pairs of order statistics and empirical distribution values to obtain the parameter estimates of the generalized Logistic function. Denoting the fitted generalized Logistic function by T(·) and using it as an estimate of the distribution function, the original data are transformed as u_i = T(x_i), i = 1, …, n.
Then, a Beta kernel density estimate is applied to the transformed sample u_1, …, u_n:

\hat{f}_U(u) = \frac{1}{n} \sum_{i=1}^{n} K_{\mathrm{Beta}}\!\left(u_i;\ \frac{u}{b} + 1,\ \frac{1-u}{b} + 1\right), (27)

where K_{Beta}(·; p, q) is the probability density function of the Beta(p, q) distribution and b is the bandwidth.
Finally, the kernel density estimate of the transformed data is converted into a kernel density estimate of the original data:

\hat{f}_X(x) = \hat{f}_U\!\left(T(x)\right)\, T'(x), (28)

where T'(·) is the derivative of the fitted generalized Logistic function.
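The following Python sketch illustrates the transform-estimate-backtransform recipe of (26)-(28). For the transformation it assumes a simple two-parameter logistic CDF fitted by least squares to the empirical distribution function; this is a stand-in for the paper's generalized Logistic function, whose exact form is not reproduced here, and the bandwidth is a placeholder.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import beta

def logistic_cdf(x, mu, s):
    """Two-parameter logistic CDF, used here as a stand-in transformation T."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

def logistic_pdf(x, mu, s):
    """Derivative T'(x) of the logistic CDF, needed for the back-transform (28)."""
    z = np.exp(-(x - mu) / s)
    return z / (s * (1.0 + z) ** 2)

def fit_transform(x):
    """Least-squares fit of T to the empirical CDF evaluated at the order statistics."""
    xs = np.sort(x)
    ecdf = np.arange(1, len(x) + 1) / (len(x) + 1.0)
    params, _ = curve_fit(logistic_cdf, xs, ecdf, p0=[np.median(x), x.std(ddof=1)])
    return params

def beta_kernel_density(u_eval, u, b):
    """Beta kernel density estimate on [0, 1], eq. (27) (boundary-bias-free Beta kernels)."""
    return np.array([beta.pdf(u, v / b + 1.0, (1.0 - v) / b + 1.0).mean() for v in u_eval])

def tkde(x_eval, x, b=0.05):
    """Transformed kernel density estimate of the original data, eq. (28)."""
    mu, s = fit_transform(x)
    u = np.clip(logistic_cdf(x, mu, s), 1e-6, 1 - 1e-6)            # transformed sample on (0, 1)
    u_eval = np.clip(logistic_cdf(x_eval, mu, s), 1e-6, 1 - 1e-6)
    return beta_kernel_density(u_eval, u, b) * logistic_pdf(x_eval, mu, s)

# Illustrative use on a heavy-tailed sample
rng = np.random.default_rng(6)
x = rng.standard_t(df=2, size=1000)
print(tkde(np.array([0.0, 2.0, 10.0]), x))
```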
The AM algorithm and transformed kernel density estimation are incorporated into semiBSL, and the new method is named AM-semiTBSL. The detailed algorithmic process is shown in Algorithm 3.
Algorithm 3: AM-semiTBSL algorithm
Input: Summary statistic of the data s_y; the prior distribution p(θ); the initial proposal distribution q_0(·|·); the initial covariance matrix C_0; the number of iterations T; the number of simulation runs n; the initial value of the chain θ^0.
Output: AM sample θ^1, …, θ^T from the semiBSL posterior. Some samples can be discarded as burn-in if required.
1: Simulate x_{1:n} ~ p(·|θ^0) and compute s_{1:n}
2: Compute the marginal estimates ĝ_j and Ĝ_j, j = 1, …, d, using (28)
3: Compute R̂(θ^0) using (11)
4: Compute p̂(s_y|θ^0) using (10)
5: for t = 1 to T do
6:   Compute the proposal covariance C_t using (14)
7:   Form the proposal distribution q_t(·|θ^{t-1}) using (13)
8:   Draw θ* ~ q_t(·|θ^{t-1})
9:   Simulate x*_{1:n} ~ p(·|θ*) and compute s*_{1:n}
10:  Compute the marginal estimates ĝ*_j and Ĝ*_j, j = 1, …, d, using (28)
11:  Compute R̂(θ*) using (11)
12:  Compute p̂(s_y|θ*) using (10)
13:  Compute r = min{1, [p̂(s_y|θ*) p(θ*)] / [p̂(s_y|θ^{t-1}) p(θ^{t-1})]}
14:  if U(0,1) < r then
15:    Set θ^t = θ*
16:  else
17:    Set θ^t = θ^{t-1}
18: end for
3. Numerical Experiment
The performance of MCMC-semiBSL, AM-semiBSL, MCMC-semiTBSL, and AM-semiTBSL is compared through numerical simulations. All numerical simulations are performed on a computer configured with an AMD Ryzen 7 5800H 3.20 GHz processor and 16.0 GB of memory, using Matlab (R2021a).
To compare the estimation performance of the different methods, the total variation distance is used to measure the discrepancy between the true posterior distribution and the posterior distribution estimated by each method. Given two distributions P and Q, the total variation distance is defined as

\mathrm{TV}(P, Q) = \frac{1}{2} \int \left| p(x) - q(x) \right| \mathrm{d}x, (29)

where p(x) and q(x) are the probability density functions of P and Q, respectively. The total variation distance satisfies TV(P, Q) ∈ [0, 1]; it equals 0 when the two distributions are identical and 1 when the two distributions do not overlap at all. It is symmetric and satisfies the triangle inequality. By comparing the total variation distance between each method's estimated posterior distribution and the true posterior distribution, the estimation performance of the different methods can be assessed effectively.
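The total variation distance (29) between two posteriors represented by samples can be approximated numerically, for example from kernel density estimates on a common grid. The sketch below is one simple way to do this and is not tied to the paper's implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def total_variation(sample_p, sample_q, grid_size=2000):
    """Approximate TV(P, Q) = 0.5 * integral |p - q| dx from two samples, eq. (29)."""
    lo = min(sample_p.min(), sample_q.min())
    hi = max(sample_p.max(), sample_q.max())
    grid = np.linspace(lo, hi, grid_size)
    p = gaussian_kde(sample_p)(grid)
    q = gaussian_kde(sample_q)(grid)
    return 0.5 * np.abs(p - q).sum() * (grid[1] - grid[0])   # Riemann-sum approximation

# Sanity checks: identical distributions give a value near 0,
# well-separated distributions give a value near 1.
rng = np.random.default_rng(7)
a = rng.normal(0.0, 1.0, size=5000)
b = rng.normal(0.0, 1.0, size=5000)
c = rng.normal(20.0, 1.0, size=5000)
print(total_variation(a, b))   # close to 0
print(total_variation(a, c))   # close to 1
```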
For the stable distribution model described in Section 2.1, the true parameter values are fixed, and the observed data are generated from the model under these parameters. In this numerical simulation, no dimensionality reduction of the observed data into summary statistics is performed; the complete observed data are used directly as the summary statistic. All four methods are started from the same initial parameter values. Three sets of experiments are conducted for each method, with the number of iterations set to T = 10,000, T = 20,000, and T = 50,000, respectively. In each iteration, n = 1000 simulated samples are generated.
Figure 1 compares the distributions estimated by the four methods with the true distribution under the three iteration settings, and Table 1 reports the corresponding total variation distances. The numerical simulation results indicate that: 1) as the number of iterations increases, the estimation accuracy of all four methods improves; 2) the AM algorithm effectively accelerates convergence, and for the same number of iterations the two AM-based methods generally outperform their MCMC-based counterparts; 3) the double-transformation kernel density estimation significantly improves the estimation accuracy, with MCMC-semiTBSL outperforming MCMC-semiBSL and AM-semiTBSL outperforming AM-semiBSL; 4) among the four methods, AM-semiTBSL achieves the best performance in terms of both accuracy and convergence speed, followed by MCMC-semiTBSL, then AM-semiBSL, and finally MCMC-semiBSL.
Table 1. The total variation distance between the results of different methods and the true distribution.
The number of iterations | MCMC-semiBSL | AM-semiBSL | MCMC-semiTBSL | AM-semiTBSL
T = 10,000 | 0.3027 | 0.3894 | 0.3427 | 0.1473
T = 20,000 | 0.2984 | 0.2829 | 0.2854 | 0.0362
T = 50,000 | 0.2115 | 0.2038 | 0.1306 | 0.0151
Figure 1. Numerical simulation results of different methods.
4. Empirical Analysis
The real data used in this section is sea clutter data collected by the IPIX radar at McMaster University in Canada. The analysis focuses on the first 10 distance bins from group #269, with a total of 1,310,720 samples.
Table 2 presents basic summary statistics of the sea clutter sample data. The skewness is −0.0252, indicating a slight left skew, and the kurtosis is 4.7240, which is greater than 3, indicating that the data are leptokurtic (more sharply peaked and heavier-tailed than a normal distribution). This is also reflected in the normal Q-Q plot shown on the right side of Figure 2. These characteristics suggest that the sea clutter data exhibit non-Gaussian behaviour, making them suitable for modeling with stable distributions.
Table 2. The basic statistics of sea clutter data.
Mean | Standard deviation | Skewness | Kurtosis | Maximum value | Minimum value
2.7798 × 10^−11 | 1.0000 | −0.0252 | 4.7240 | 5.8344 | −6.7303
Figure 2. Time series plot and normal Q-Q plot of sea clutter data.
Stable distributions are used to model the sea clutter sample data, with the model parameters estimated by four methods: MCMC-semiBSL, AM-semiBSL, MCMC-semiTBSL, and AM-semiTBSL. To compare the methods on an equal footing and eliminate external factors, all four methods are initialized with the same parameter values, the same number of iterations T = 50,000, and the same model-fitting sample size n = 1500. The total variation distance is used to assess the estimation performance of the different methods.
From the results shown in Figure 3 and Table 3, it is evident that modeling radar clutter data using stable distributions is feasible. Furthermore, under the same number of iterations, the AM-semiTBSL method provides the best parameter estimates for the stable distribution. This demonstrates that incorporating the AM algorithm and transformed kernel density estimation effectively enhances the convergence speed and estimation accuracy of MCMC-semiBSL.
Figure 3. The estimated distributions of different methods and the true distribution.
Table 3. The total variation distance between the estimated distributions of different methods and the true distribution.
Method | MCMC-semiBSL | AM-semiBSL | MCMC-semiTBSL | AM-semiTBSL
Total variation distance | 0.3612 | 0.2971 | 0.1639 | 0.0854
5. Conclusion
This paper addresses the poor performance of semi-parametric Bayesian synthetic likelihood on heavy-tailed data by introducing transformed kernel density estimation, and it tackles the slow convergence of MCMC sampling and its tendency to become trapped in local optima within the semi-parametric Bayesian synthetic likelihood framework by incorporating the Adaptive Monte Carlo (AM) algorithm. Numerical experiments based on stable distributions were then conducted. The results show that the transformed kernel density estimation improves the estimation accuracy of the semi-parametric Bayesian synthetic likelihood, while the Adaptive Monte Carlo algorithm enhances its convergence speed. Finally, an empirical analysis of a real dataset, the sea clutter data, further confirms the superiority of the proposed improvements.