
Approximate Bayesian Computation (ABC) is a popular sampling method for applications involving intractable likelihood functions. Instead of evaluating the likelihood, ABC approximates the posterior distribution by a set of accepted samples simulated from a generating model; a simulated sample is accepted if its distance to the observation, computed in terms of summary statistics, is smaller than some threshold. This paper proposes Local Gradient Kernel Dimension Reduction (LGKDR) to construct low-dimensional summary statistics for ABC. The proposed method identifies a sufficient subspace of the original summary statistics by implicitly considering all non-linear transforms therein, and uses a weighting kernel to concentrate the projections around the observation. No strong assumptions are made on the marginal distributions or the regression models, permitting usage in a wide range of applications. Experiments are carried out with simple rejection ABC and sequential Monte Carlo ABC. Results are competitive in the former case and substantially better in the latter, where Monte Carlo errors are suppressed as much as possible.

Monte Carlo methods are popular in sampling and inference problems. While Markov chain Monte Carlo (MCMC) methods have found success in applications where the likelihood function is known up to a normalizing constant, MCMC cannot be used when the likelihood is intractable. In such cases, if the problem can be characterized by a generating model, Approximate Bayesian Computation (ABC) can be used. ABC is a Monte Carlo method that approximates the posterior distribution by jointly generating simulated data and parameters, and performs the sampling based on the distance between the simulated data and the observation, without evaluating the likelihood. ABC was first introduced in population genetics [

The accuracy of the ABC posterior depends on the sufficiency of the summary statistics and the Monte Carlo errors induced in the sampling. Given the generative model $p(y|\theta)$ of observation $y_{obs}$ with parameter $\theta$, consider summary statistics $s_{obs} = G_s(y_{obs})$ and $s = G_s(y)$, where $G_s: \mathcal{Y} \to \mathcal{S}$ is the mapping from the original sample space $\mathcal{Y}$ to the low-dimensional summary statistic space $\mathcal{S}$. The posterior distribution $p(\theta|y_{obs})$ is approximated by $p(\theta|s_{obs})$, which is constructed as $p(\theta|s_{obs}) = \int p_{ABC}(\theta, s | s_{obs})\, ds$, with

$$p_{ABC}(\theta, s \mid s_{obs}) \propto p(\theta)\, p(s|\theta)\, K(\|s - s_{obs}\| / \epsilon), \tag{1}$$

where $K$ is a smoothing kernel with bandwidth $\epsilon$. In the case of simple rejection ABC, $K$ is often chosen as the indicator function $I(\|s - s_{obs}\| \le \epsilon)$. If the summary statistics $s$ are sufficient, it can be shown that (1) converges to the posterior $p(\theta|s_{obs})$ as $\epsilon$ goes to zero [
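As a concrete illustration of scheme (1) with the indicator kernel, the sketch below implements plain rejection ABC. The Gaussian-mean toy model, the uniform prior, and all numeric settings are illustrative assumptions, not part of the paper.

```python
import numpy as np

def rejection_abc(s_obs, prior_sampler, simulator, summary, eps, n_sims, rng):
    """Plain rejection ABC: accept theta whenever ||s - s_obs|| < eps."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_sampler(rng)
        s = summary(simulator(theta, rng))
        if np.linalg.norm(np.atleast_1d(s - s_obs)) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy model (assumption): infer a Gaussian mean; the summary is the sample mean
rng = np.random.default_rng(0)
y_obs = rng.normal(1.0, 1.0, size=50)
s_obs = y_obs.mean()
post = rejection_abc(
    s_obs,
    prior_sampler=lambda rng: rng.uniform(-5.0, 5.0),
    simulator=lambda th, rng: rng.normal(th, 1.0, size=50),
    summary=lambda y: y.mean(),
    eps=0.1, n_sims=20000, rng=rng,
)
```

The accepted `post` values approximate draws from the ABC posterior; their empirical mean estimates the posterior mean.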

As shown above, the sampling is based on the distance between the summary statistics of the simulated sample, $s$, and of the observation, $s_{obs}$. Approximation errors are induced by the distance measure and are proportional to the distance threshold $\epsilon$. It is desirable to set $\epsilon$ as small as possible, but a small threshold increases the simulation time; this is a trade-off between accuracy and efficiency (simulation time). According to recent results on asymptotic properties of ABC [

A vast body of ABC literature has been published. Much of it is devoted to reducing the sampling error by using more advanced sampling methods, from the simple rejection method [

In this paper, we focus on the problem of summary statistics. In early works on ABC, summary statistics were chosen by domain experts in an ad hoc manner. This is manageable if the dimensionality is small and the model is well understood by the experts, but choosing a set of appropriate summary statistics is much more difficult in complex models. To address this problem, a set of redundant summary statistics is constructed as initial summary statistics; dimension reduction methods are then applied, yielding a set of low-dimensional summary statistics while preserving the information.

Many dimension reduction methods have been proposed for ABC. Entropy based subset selection [

Semi-automatic ABC [

To provide a principled way of designing the regression function, capture higher-order non-linearity, and realize an automatic construction of summary statistics, we introduce a kernel based sufficient dimension reduction method as an extension of the linear-projection based Semi-automatic ABC. This dimension reduction method is a localized version of gradient based kernel dimension reduction (GKDR) [

The proposed method gives competitive results in comparison with Semi-automatic ABC [

The paper is organized as follows. In Section 2, we review GKDR and introduce its localized modification, followed by a discussion of computational considerations. In Section 3, we show simulation results for several commonly conducted ABC experiments and compare the proposed method with Semi-automatic ABC.

In this section, we review the Gradient based Kernel Dimension Reduction (GKDR) and propose the modified Local GKDR (LGKDR). Discussions are given at the end of this section.

Consider observations $(s, \theta)$, where $s \in \mathbb{R}^m$ are the initial summary statistics and $\theta \in \mathbb{R}$ is the parameter to be estimated in a specific ABC application. Assume that there is a $d$-dimensional subspace $U \subset \mathbb{R}^m$, $d < m$, such that

$$\theta \perp s \mid B^T s, \tag{2}$$

where $B = (\beta_1, \cdots, \beta_d)$ is the orthogonal projection matrix from $\mathbb{R}^m$ to $\mathbb{R}^d$. The columns of $B$ span $U$ and $B^T B = I_d$. Condition (2) states that given $B^T s$, $\theta$ is independent of the initial summary statistics $s$. It is then sufficient to use the $d$-dimensional constructed vector $z = B^T s$ as the summary statistics. This subspace $U$ is called the effective dimension reduction (EDR) space [

Let $B = (\beta_1, \cdots, \beta_d) \in \mathbb{R}^{m \times d}$ be the projection matrix to be estimated, and $z = B^T s$. We assume (2) holds and $p(\theta|s) = \tilde{p}(\theta|z)$. The gradient of the regression function is denoted by $\nabla s$ as

$$\nabla s = \frac{\partial E(\theta|s)}{\partial s} = \frac{\partial E(\theta|z)}{\partial s} = B \frac{\partial E(\theta|z)}{\partial z}, \tag{3}$$

which shows that the gradients are contained in the EDR space. Consider the matrix

$$M = E[\nabla s\, \nabla s^T] = B A B^T,$$

where

$$A_{ij} = E\left[\frac{\partial E(\theta|z)}{\partial z_i} \frac{\partial E(\theta|z)}{\partial z_j}\right], \quad i, j = 1, \cdots, d.$$

The projection directions $\beta_i$ lie in the subspace spanned by the eigenvectors of $M$. It is therefore possible to estimate the projection directions by an eigenvalue decomposition. In GKDR, the matrix $M$ is estimated by the kernel method described below.

Let $\Omega$ be a non-empty set. A real-valued kernel $k: \Omega \times \Omega \to \mathbb{R}$ is called positive definite if $\sum_{i,j=1}^{n} c_i c_j k(x_i, x_j) \ge 0$ for any $x_i \in \Omega$ and $c_i \in \mathbb{R}$. Given a positive definite kernel $k$, there exists a unique reproducing kernel Hilbert space (RKHS) $H$ associated with it such that: (1) $k(\cdot, x)$ spans $H$; (2) $H$ has the reproducing property [

Given training sample ( s 1 , θ 1 ) , ⋯ , ( s n , θ n ) ,

let $k_S(s_i, s_j) = \exp(-\|s_i - s_j\|^2/\sigma_S^2)$ and $k_\Theta(\theta_i, \theta_j) = \exp(-\|\theta_i - \theta_j\|^2/\sigma_\Theta^2)$

be Gaussian kernels defined on $\mathbb{R}^m$ and $\mathbb{R}$, associated with the RKHSs $H_S$ and $H_\Theta$, respectively. Under boundedness assumptions on the conditional expectation $E(\theta|S = s)$ and the average gradient functional with respect to $z$, the functional can be estimated using cross-covariance operators defined on the RKHSs, and the consistency of their empirical estimators is guaranteed [

$$\hat{M}_n(s_i) = \nabla k_S(s_i)^T (G_S + n\epsilon_n I_n)^{-1} G_\Theta (G_S + n\epsilon_n I_n)^{-1} \nabla k_S(s_i), \tag{4}$$

where $G_S$ and $G_\Theta$ are the Gram matrices $k_S(s_i, s_j)$ and $k_\Theta(\theta_i, \theta_j)$, respectively, $\nabla k_S \in \mathbb{R}^{n \times m}$ is the derivative of the kernel $k_S(\cdot, s_i)$ with respect to $s_i$, and $\epsilon_n$ is a regularization coefficient. This matrix can be viewed as a straightforward extension of the covariance matrix in principal component analysis (PCA); the data here are the features in the RKHS representing the gradients, instead of the gradients in their original real space.

The averaged estimator $\tilde{M} = \frac{1}{n}\sum_{i=1}^{n} \hat{M}_n(s_i)$ is calculated over the training sample $(s_1, \theta_1), \cdots, (s_n, \theta_n)$. Finally, the projection matrix $B$ is estimated by taking the $d$ eigenvectors corresponding to the $d$ largest eigenvalues of $\tilde{M}$, as in PCA, where $d$ is the dimension of the estimated subspace.
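A minimal numerical sketch of estimator (4) and the eigendecomposition step might look as follows; the toy data, kernel bandwidths, and regularization value are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def gkdr_projection(S, theta, d, sigma_s, sigma_t, eps_n):
    """Sketch of GKDR, Eq. (4): estimate M_hat(s_i) at each training point,
    average, and take the top-d eigenvectors as the projection matrix B."""
    n, m = S.shape
    # Gram matrices with Gaussian kernels
    sq_s = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    G_S = np.exp(-sq_s / sigma_s**2)
    sq_t = (theta[:, None] - theta[None, :]) ** 2
    G_T = np.exp(-sq_t / sigma_t**2)

    reg = np.linalg.inv(G_S + n * eps_n * np.eye(n))
    core = reg @ G_T @ reg  # (G_S + n*eps_n*I)^-1 G_Theta (G_S + n*eps_n*I)^-1

    M = np.zeros((m, m))
    for i in range(n):
        # grad_k[j] = d k_S(s_j, s)/d s at s = S[i]  (an n x m matrix)
        diff = S - S[i]
        grad_k = (2.0 / sigma_s**2) * diff * G_S[:, [i]]
        M += grad_k.T @ core @ grad_k
    M /= n

    eigval, eigvec = np.linalg.eigh(M)
    return eigvec[:, np.argsort(eigval)[::-1][:d]]  # top-d eigenvectors

# Toy check (assumption: theta depends only on the first coordinate of s)
rng = np.random.default_rng(1)
S = rng.normal(size=(200, 5))
theta = S[:, 0] + 0.05 * rng.normal(size=200)
B = gkdr_projection(S, theta, d=1, sigma_s=3.0, sigma_t=1.0, eps_n=0.01)
```

With this synthetic data, the leading estimated direction should align closely with the first coordinate axis.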

As discussed above, the estimator $\tilde{M}$ is obtained by averaging over the training samples $s_i$. When applying the method to ABC, since only one observation is available, we propose to generate a set of training data using the generating model and introduce a weighting mechanism that concentrates on the local region around the observation and avoids regions with low probability density.

Given simulated data X 1 , ⋯ , X N and a weight kernel K w : ℝ m → ℝ , we propose the local GKDR estimator

$$\tilde{M} = \frac{1}{N}\sum_{i=1}^{N} K_w(X_i)\, \hat{M}(X_i), \tag{5}$$

where $\hat{M}$ is an $m \times m$ matrix and $K_w(X_i)$ is the corresponding weight. $K_w(x)$ can be any weighting kernel. In the numerical experiments, a triweight kernel is used, written as

$$K_w(X_i) = (1 - u^2)^3\, \mathbf{1}_{u < 1}, \quad u = \frac{\|X_i - X_{obs}\|^2}{\|X_{th} - X_{obs}\|^2},$$

where $\mathbf{1}_{u < 1}$ is the indicator function and $X_{th}$ is the threshold value that determines the bandwidth. The normalization term of the triweight kernel is omitted since it does not change the eigenvectors we are estimating. The bandwidth determined by $X_{th}$ is chosen empirically, as described in Section 2.4. The triweight kernel is chosen because it concentrates more on the central region than other bell-shaped kernels, and it works well in our experiments. Other distance metrics could be used instead of the squared distance.
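For concreteness, the triweight weighting above can be sketched as follows (the function and variable names are ours):

```python
import numpy as np

def triweight_weights(X, x_obs, x_th):
    """Unnormalized triweight weights as in the text:
    u = ||X_i - X_obs||^2 / ||X_th - X_obs||^2, K_w = (1 - u^2)^3 for u < 1, else 0."""
    X = np.atleast_2d(X)
    u = ((X - x_obs) ** 2).sum(axis=1) / ((x_th - x_obs) ** 2).sum()
    return np.where(u < 1.0, (1.0 - u ** 2) ** 3, 0.0)

# Samples at, near, and beyond the threshold distance
w = triweight_weights(np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 0.0]]),
                      x_obs=np.zeros(2), x_th=np.array([1.0, 0.0]))
```

The weight is 1 at the observation, decays smoothly toward the threshold, and is exactly 0 beyond it.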

The idea behind the proposed estimator is similar to that of the ABC estimator itself. Without the weighting, the estimator would be averaged over all $X_i$, regardless of the distribution they are generated from. Since the basic assumption of GKDR is that the response variable should come from the same distribution, we cannot expect good results by simply using all samples without proper weighting. The form of the estimator is the classic Nadaraya-Watson estimator without normalization.

A description of the LGKDR algorithm is given in Algorithm 1. Procedure Generate Sample generates a sample with the parameter as input. Procedure LGKDR calculates the matrix $M(X_i)$ as given in (4) and (5).

Since the dimension reduction is done before the sampling, it works as a pre-processing unit for the main ABC sampling procedure and can be embedded in any ABC algorithm regardless of the sampling scheme. In this paper, the rejection sampling method is first employed as a baseline for its simplicity and low computational complexity. Further results on sequential Monte Carlo ABC are also reported to illustrate the advantage of the proposed method.

Algorithm 1. LGKDR.

In these experiments, the distance thresholds are pushed as small as possible to suppress the Monte Carlo errors and isolate the effect of the summary statistics alone.

In some problems, not all summary statistics are necessary for every parameter. For example, in the M/G/1 queue model, the parameter θ_{3} that controls the distribution of the inter-arrival time is not related to the parameters θ_{1} and θ_{2}, which jointly determine the distribution of the service time. It can be expected that using a different, lower-dimensional set of summary statistics for θ_{3} would improve the sampling efficiency. To do so, information unrelated to the particular parameter is dropped in the dimension reduction in exchange for lower dimensionality. The experiments show that better results can be achieved with these settings.

More precisely, LGKDR incorporates information about $\theta$ in the calculation of the gradient matrix $\tilde{M}$. If $\theta$ is a vector, the relations between its elements are contained in the Gram matrix $G_\Theta$ as in (4). Separate estimations concentrate on the information of the specific parameter rather than the whole vector. As shown in the experiments in Section 3.2, this can construct significantly more informative summary statistics in some problems by reducing the estimation error.

For Semi-automatic ABC [

In this section, we discuss the parameters for LGKDR. Parameters for the ABC sampling will be discussed in the experiments section.

First, the bandwidth of the weighting kernel affects the accuracy of LGKDR. A large bandwidth spreads the weights over a larger region around the observation point, while a small bandwidth concentrates the weights on the directions estimated close to the observation sample. In our experiments, a bandwidth corresponding to an acceptance rate of approximately 10% gives good results and is used throughout. The same parameter is set for Semi-automatic ABC for a similar purpose. A more principled method for choosing the bandwidth, such as cross validation, could be applied to select the acceptance rate if the corresponding computational cost is affordable.

The bandwidths of the Gaussian kernels, $\sigma_S$ and $\sigma_\Theta$, and the regularization parameter $\epsilon_n$ are crucial to all kernel based methods. The first two determine the function spaces associated with the positive definite kernels, and the latter affects the convergence rate (see [

Computational complexity is an important concern for ABC methods. LGKDR requires matrix inversions, solving eigenvalue problems, and a cross validation procedure. In this paper, the training sample sizes are fixed to $2 \times 10^3$ and $10^4$ for LGKDR and Semi-automatic ABC, respectively. Under this setting, the total computation time of LGKDR is about 10 times that of the linear regression. We believe this is a necessary price to pay if the non-linearities among the summary statistics are strong; failing to capture this information in the dimension reduction step induces poor sampling performance and a biased estimation. Also, although the cross validation procedure takes the majority of the computation time in LGKDR, it needs to be performed only once for each problem. Once the parameters are chosen, the computational complexity of LGKDR is comparable to that of linear-type algorithms. Overall, the computational cost depends on both the dimension reduction step and the sampling step. For complex models such as those in population genetics, sampling is significantly more time consuming than the dimension reduction procedure.

In this section, we investigate three problems to demonstrate the performance of LGKDR. Our method is compared to the classical ABC using initial summary statistics and the Semi-automatic ABC [

The Rejection ABC is described in Algorithm 2 and the SMC ABC is shown in Algorithm 3. The hyper-parameters used in LGKDR are set as discussed in section 2.4. We use a modified code from [

For the evaluation of the experiments conducted using rejection ABC, a set of parameters $\theta^j$, $j \in \{1, \cdots, N_{obs}\}$, and the corresponding observation samples $Y_{obs}^j$ are simulated from the prior and the conditional probability $p(Y|\theta)$, respectively, and are used as the observations. For each experiment, we fix the total number of simulations $N$ and the number of accepted samples $N_{acc}$. The samples used for rejection are then generated once and fixed for all three methods. Although the randomness of the simulation program is contained in the samples, since the same fixed samples are used for each method, we can ignore this randomness and compare the methods more fairly. Using a fixed sample set also lets us set the acceptance rate, the most influential parameter for the estimation accuracy, exactly for each method. For evaluation, the mean squared error (MSE) over the accepted parameters $\hat{\theta}_i^j$ and the true parameter $\theta^j$ is defined as

$$MSE_j = \frac{1}{N_{acc}} \sum_{i=1}^{N_{acc}} (\theta^j - \hat{\theta}_i^j)^2.$$

The averaged mean squared error (AMSE) is then computed as the average of $MSE_j$ over the observation pairs $(\theta^j, Y_{obs}^j)$:

$$AMSE = \frac{1}{N_{obs}} \sum_{j=1}^{N_{obs}} MSE_j.$$
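In code, these two error measures reduce to a few lines (a sketch; the function names are ours):

```python
import numpy as np

def mse_j(theta_true, theta_accepted):
    """MSE_j: mean squared error of the accepted parameters for one observation."""
    theta_accepted = np.asarray(theta_accepted, dtype=float)
    return float(((theta_true - theta_accepted) ** 2).mean())

def amse(theta_list, accepted_list):
    """AMSE: average of MSE_j over the N_obs observation pairs."""
    return float(np.mean([mse_j(t, a) for t, a in zip(theta_list, accepted_list)]))
```

For example, `mse_j(1.0, [1.0, 3.0])` averages the squared errors 0 and 4, giving 2.0.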

It is used as the benchmark for Rejection ABC. Because of the difference in computational complexity, for fairness of comparison the acceptance rates are set differently: 1% for LGKDR, and 0.1% for Semi-automatic ABC and the original ABC. The training samples and simulated samples are generated from the same prior and remain fixed.

Algorithm 2. Rejection-ABC.

Algorithm 3. Sequential-ABC.

For SMC ABC, to reach as small a tolerance as possible, the simulation time differs between methods. AMSE is used as the accuracy benchmark for the queue model. In the case of the Ricker model, due to the extremely long simulation time, only one observation is used and MSE is reported instead. Computation time is reported for both experiments.

Several parameters are necessary for running the ABC simulations. For Rejection ABC, the total number of samples $N$ and the number of accepted samples $N_{acc}$ are set before the simulation as mentioned above. For Semi-automatic ABC and LGKDR, a training set must be simulated to calculate the projection matrix; for LGKDR, a further test set is generated for cross validation. The values of these parameters are reported in the corresponding experiments. The simulation time for generating these sample sets is negligible compared to the main ABC run, especially in SMC ABC. For LGKDR, another important parameter is the target dimensionality $d$. There is no theoretically sound method available to determine the intrinsic dimensionality of the initial summary statistics. In practice, since the projection matrix simply consists of the eigenvectors of the matrix $\tilde{M}$ in (5), ordered by the magnitude of the corresponding eigenvalues, the dimensionality is just the number of eigenvectors used. In our experiments, we run several rejection ABC procedures using different $B$ on a small fixed test set, and then fix the dimensionality. Since the test set is fixed and the different projection matrices are directly accessible, this procedure is very fast. A starting point can be set by preserving 70% of the total eigenvalue magnitude, which usually works well. There is a large body of literature on how to choose the number of principal components in PCA, a problem similar to ours; see, for example, [
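The 70% starting-point heuristic can be sketched as follows; our reading of the rule (an assumption) is to keep enough leading eigenvectors to cover 70% of the total eigenvalue magnitude.

```python
import numpy as np

def choose_dimension(M_tilde, fraction=0.7):
    """Heuristic starting point: the smallest d such that the top-d eigenvalue
    magnitudes of M_tilde cover `fraction` of the total magnitude."""
    eigvals = np.sort(np.abs(np.linalg.eigvalsh(M_tilde)))[::-1]
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, fraction) + 1)
```

For a matrix with eigenvalues 4, 3, 2, 1, the first two eigenvalues already cover 70% of the total, so the heuristic returns 2.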

Analysis of population genetics is often based on the coalescent model [

100 chromosomes are sampled from a constant population ($N = 10000$). The summary statistics are defined using the spectrum of the numbers of segregating sites, $s_{sfs}$, a coarse-grained spectrum consisting of 7 bins based on the Sturges formula ($1 + \log_2 S_{seg}$). The frequencies were binned as follows: 0% - 8%, 8% - 16%, 16% - 24%, 24% - 32%, 32% - 40%, 40% - 48% and 48% - 100%. We use the uniform distribution $\theta \sim U[0, 30]$ in this study rather than the log-normal distribution in [

We test 3 typical scaled mutation rates, 5, 8 and 10, rather than random draws from the prior; the results are averaged over the 3 tests. A total of $10^6$ samples is generated, and $10^5$ samples are generated as the training sample for LGKDR and Semi-automatic ABC. Different acceptance rates are set for the different methods as discussed above. We use $s_{sfs}$ as the summary statistics for both Semi-automatic ABC and LGKDR. Local linear regression is used as the regression function for the former. In LGKDR, the dimension is set to 2.

The results are shown in the table below.

Method | mutation rate θ
---|---
ABC | 1.94
Semi-automatic ABC | 1.62
LGKDR | 1.66

Both LGKDR and Semi-automatic ABC improve over the original ABC method. LGKDR and Semi-automatic ABC achieve very similar results, suggesting that a linear construction of summary statistics is sufficient for this particular experiment.

The M/G/1 model is a stochastic queuing model following the first-come-first-served principle. The arrival of customers follows a Poisson process with intensity parameter λ. The service time for each customer follows an arbitrary distribution with fixed mean (G), and there is a single server (1). The model has an intractable likelihood function because of its iterative nature; however, a simulator with parameters (λ, μ) can easily be implemented. It has been analyzed by ABC using various dimension reduction methods, as in [

The generative model of the M/G/1 model is specified by

$$Y_n = \begin{cases} U_n & \text{if } \sum_{i=1}^{n} W_i \le \sum_{i=1}^{n-1} Y_i \\[4pt] U_n + \sum_{i=1}^{n} W_i - \sum_{i=1}^{n-1} Y_i & \text{if } \sum_{i=1}^{n} W_i > \sum_{i=1}^{n-1} Y_i \end{cases}$$

where $Y_n$ is the inter-departure time, $U_n$ is the service time of the $n$th customer, and $W_i$ is the inter-arrival time. The service time is uniformly distributed on the interval $[\theta_1, \theta_2]$, and the inter-arrival time follows an exponential distribution with rate $\theta_3$. These configurations are the same as in [
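The recursion above can be simulated directly. The sketch below (the function signature and defaults are ours) returns the inter-departure times from which the quantile summary statistics are built:

```python
import numpy as np

def simulate_mg1(theta1, theta2, theta3, n, rng):
    """Simulate n inter-departure times of the M/G/1 queue:
    service times U ~ Uniform[theta1, theta2],
    inter-arrival times W ~ Exponential(rate theta3)."""
    U = rng.uniform(theta1, theta2, size=n)
    W = rng.exponential(1.0 / theta3, size=n)
    arrivals = np.cumsum(W)            # sum_{i<=n} W_i
    Y = np.empty(n)
    dep = 0.0                          # previous departure time = sum of Y so far
    for i in range(n):
        start = max(arrivals[i], dep)  # service starts on arrival or when the server frees up
        Y[i] = start + U[i] - dep      # reproduces both cases of the recursion
        dep = start + U[i]
    return Y

rng = np.random.default_rng(0)
Y = simulate_mg1(1.0, 2.0, 0.5, 200, rng)
```

Note that every inter-departure time is at least the service-time lower bound $\theta_1$, which follows directly from the recursion.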

For the rejection ABC, we simulate a set of 30 triplets $(\theta_1, \theta_2, \theta_3)$, avoiding boundary values; they are used as the true parameters to be estimated. A total of $10^6$ samples is generated. The posterior mean is estimated by the empirical mean of the accepted samples. The simulated samples are fixed across the different methods for comparison.

We use the quantiles of the sorted inter-departure times $Y_n$ as the explanatory variables of the regression model $f(y)$, as in [4]. Local linear regression is used rather than simple linear regression for better results. For LGKDR, we use the same quantiles as the initial summary statistics for dimension reduction as in Semi-automatic ABC. The number of accepted training samples is $2 \times 10^3$ for LGKDR. The dimension is manually set to 4, the smallest value at which the performance is not degraded.

The experimental results of Rejection ABC are shown in the table below; each row reports the AMSE of θ_{1}, θ_{2} and θ_{3} for one method. Compared to ABC, Semi-automatic ABC gives substantial

Method | θ_{1} | θ_{2} | θ_{3}
---|---|---|---
ABC | 0.2584 | 0.5113 | 0.0019
Semi-automatic ABC | 0.0112 | 0.5279 | 0.0024
LGKDR | 0.0623 | 0.2259 | 0.0023
LGKDR (focus 1) | 0.0082 | 5.0656 | 0.0031
LGKDR (focus 2) | 0.3942 | 0.2514 | 0.0020
LGKDR (focus 3) | 0.2229 | 3.4958 | 0.0020

improvement on the estimation of θ_{1}; the other parameters show similar or slightly worse results. LGKDR improves over ABC on θ_{1} and θ_{2}, but its estimation of θ_{1} is not as good as that of Semi-automatic ABC. However, after applying separate estimation, θ_{1} shows a substantial improvement over Semi-automatic ABC. Separate estimations for θ_{2} and θ_{3} give no improvements. This suggests that the sufficient dimension reduction subspace for θ_{1} differs from that of the other parameters, and a separate estimation of θ_{1} is necessary.

For SMC ABC, a set of 10 parameter triplets is generated, and the results of SMC and LGKDR are reported. All other settings are the same as in the rejection ABC. We omit the results of Semi-automatic ABC since the sequential chain did not converge properly using those summary statistics and the induced errors were too large to be meaningful. Two SMC ABC experiments are reported, SMC ABC 1 and SMC ABC 2, with the number of particles set to $2 \times 10^4$ and $10^5$, respectively. In LGKDR, the number of particles is set to $2 \times 10^4$, and the training sample for the calculation of the projection matrix has size $2 \times 10^3$, accepted from a training set of size $4 \times 10^4$. The dimensionality is set to 5. Cross validation is conducted using a test set of size $2 \times 10^4$.

Results of SMC ABC are shown in the table below. LGKDR gives better estimates of θ_{1} and θ_{2}, using less time than SMC ABC 2. The estimation of θ_{3} is worse, but the difference is small (0.005). Focusing on θ_{3} produces an estimation as good as that of SMC ABC.

Chaotic ecological dynamical systems are difficult for inference due to their dynamic nature and the noise present in both the observations and the process. Wood [

Method | θ_{1} | θ_{2} | θ_{3} | Total time
---|---|---|---|---
SMC ABC 1 | 0.0404 | 0.4928 | 0.0139 | 9.6e+03
SMC ABC 2 | 0.0429 | 0.1964 | 0.0054 | 3.3e+04
LGKDR | 0.0235 | 0.1605 | 0.0110 | 2.0e+04 (7.78e+3)
LGKDR (focus 3) | 0.4854 | 0.1383 | 0.0059 | 2.1e+04 (7.85e+3)

A prototypical ecological model with the Ricker map is used as the generating model in this experiment. The time course of a population $N_t$ is described by

$$N_{t+1} = r N_t e^{-N_t + e_t}, \tag{6}$$

where $e_t$ is an independent noise term with variance $\sigma_e^2$, and $r$ is the growth rate parameter controlling the model dynamics. A Poisson observation $y_t$ is made with mean $\phi N_t$. The parameters to infer are $\theta = (\log(r), \sigma_e^2, \phi)$. The initial state is $N_0 = 1$ and the observations are $y_{51}, y_{52}, \cdots, y_{100}$.
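The generating model (6) is straightforward to simulate. The sketch below discards a burn-in of 50 steps and returns the 50 observed counts; the parameter values in the example are illustrative assumptions.

```python
import numpy as np

def simulate_ricker(log_r, sigma_e, phi, rng, n_burn=50, n_obs=50):
    """Simulate the Ricker model (6): N_{t+1} = r * N_t * exp(-N_t + e_t),
    e_t ~ N(0, sigma_e^2), with Poisson(phi * N_t) observations after burn-in.
    Here sigma_e is the noise standard deviation."""
    r = np.exp(log_r)
    N = 1.0  # initial state N_0 = 1
    y = np.empty(n_obs)
    for t in range(n_burn + n_obs):
        N = r * N * np.exp(-N + rng.normal(0.0, sigma_e))
        if t >= n_burn:
            y[t - n_burn] = rng.poisson(phi * N)
    return y

# Illustrative parameter values (our assumption, not the paper's fixed setting)
y = simulate_ricker(3.8, 0.3, 10.0, np.random.default_rng(0))
```

The returned vector corresponds to the observations $y_{51}, \cdots, y_{100}$ from which the summary statistics below are computed.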

The original summary statistics used by Wood [

$\sum_{t=51}^{100} \mathbf{1}(y_t = j)$ for $1 \le j \le 4$, the logarithm of the sample variance, $\log(\sum_{t=51}^{100} y_t^j)$ for $2 \le j \le 6$, and the auto-correlation up to lag 5. Set E2 further includes the time-ordered observations $y_t$, the magnitude-ordered observations $y_{(t)}$, $y_t^2$, $y_{(t)}^2$, $\{\log(1 + y_t)\}$, $\{\log(1 + y_{(t)})\}$, the time differences $\Delta y_t$ and the magnitude differences $\Delta y_{(t)}$. These additional statistics are carefully designed and added to explicitly capture the non-linear relationships among the original summary statistics.

In Rejection ABC, we use set E0 for ABC without dimension reduction, since the dimension of the larger sets severely degrades its performance. Sets E1 and E2 are used for Semi-automatic ABC as in [7]. We use a total sample size of $10^7$ for all the methods, with a training sample of size $10^6$ and a test sample of size $10^5$ for LGKDR and Semi-automatic ABC. The values of $\log(r)$ and $\phi$ are fixed as in [

The results are shown in the table below.

Method | log(r) | σ_e | ϕ
---|---|---|---
ABC (E0) | 0.049 | 0.217 | 0.944
Semi-automatic ABC (E2) | 0.056 | 0.246 | 0.936
Semi-automatic ABC (E1) | 0.082 | 0.279 | 1.387
LGKDR (E0) | 0.043 | 0.241 | 0.984
LGKDR (E0, focus 1) | 0.043 | 0.221 | 1.221
LGKDR (E0, focus 2) | 0.068 | 0.200 | 1.234
LGKDR (E0, focus 3) | 0.047 | 0.211 | 1.007
LGKDR (E1) | 0.047 | 0.179 | 0.895
LGKDR (E1, focus 1) | 0.048 | 0.220 | 1.38
LGKDR (E1, focus 2) | 0.059 | 0.174 | 2.694
LGKDR (E1, focus 3) | 0.054 | 0.292 | 0.829

The accuracy of Semi-automatic ABC using the bigger set E2 is similar to ABC but substantially worse with set E1, suggesting that the non-linear information is essential for an accurate estimation in this model; these features need to be explicitly designed and incorporated into the regression function for Semi-automatic ABC. LGKDR using summary statistics set E0 gives results similar to ABC. Using the larger set E1, the accuracy of $\log(r)$ is slightly worse than with set E0, but the accuracies of $\sigma_e$ and $\phi$ improve substantially. The additional gains from constructing summary statistics separately are mixed across parameters in this model: $\log(r)$ and $\phi$ show very small improvements, but $\sigma_e$ improves in both cases. Overall, we recommend separate constructions for the potential improvements if the additional computational cost is affordable.

In SMC ABC, we use set E0 for the SMC, E1 for LGKDR, and both E1 and E2 for Semi-automatic ABC. The number of particles is set to $5 \times 10^3$ for all experiments; other parameters are the same as in Rejection ABC. Only one set of parameters is used, and the simulation time is set so as to achieve a tolerance as small as possible. The reported simulation time includes the computation time of LGKDR. We show results for several settings of the dimensionality in LGKDR to illustrate the influence of this hyper-parameter. As can be observed in the results, if the dimensionality is set too high, the efficiency of the SMC chain decreases; if it is set too low, more bias is induced in the estimated posterior mean, suggesting a loss of information in the constructed summary statistics. In this experiment, dimensionality 6 is chosen by keeping 70% of the total eigenvalue magnitude, as discussed before.

The results are shown in the table below.

Method | log(r) | σ_e | ϕ | Total time
---|---|---|---|---
ABC (E0) | 0.001 | 0.003 | 0.430 | 4.0e+5
Semi-automatic ABC (E2) | 0.002 | 0.020 | 0.013 | 4.3e+5
Semi-automatic ABC (E1) | 0.031 | 0.079 | 0.019 | 1.7e+5
LGKDR (dimension 3) | 0.024 | 0.131 | 0.779 | 8.6e+4
LGKDR (dimension 6) | 0.006 | 0.018 | 0.012 | 4.5e+4
LGKDR (dimension 9) | 0.001 | 0.040 | 0.250 | 2.8e+5

We proposed the LGKDR algorithm for automatically constructing summary statistics in ABC. The proposed method assumes no explicit functional forms for the regression functions or the marginal distributions, and implicitly incorporates higher order moments up to infinity. As long as the initial summary statistics are sufficient, our method is guaranteed to find a sufficient subspace with low dimensionality. While the computation involved is more expensive than the simple linear regression used in Semi-automatic ABC, the dimension reduction is conducted as a pre-processing step, and its cost may not be dominant in comparison with a computationally demanding sampling procedure during ABC. Another advantage of LGKDR is the avoidance of manually designed features; only initial summary statistics are required. With the parameters selected by cross validation, the construction of low-dimensional summary statistics can be performed as a black box. For complex models in which initial summary statistics are hard to identify, LGKDR can be applied directly to the raw data to identify the sufficient subspace. We also confirm that constructing different summary statistics for different parameters improves the accuracy significantly.

Sincere thanks to the members of OJS for their professional work, and special thanks to managing editor Alline Xiao for her high-quality editorial support.

Zhou, J. and Fukumizu, K. (2018) Local Kernel Dimension Reduction in Approximate Bayesian Computation. Open Journal of Statistics, 8, 479-496. https://doi.org/10.4236/ojs.2018.83031