Wavelet-Based Density Estimation in Presence of Additive Noise under Various Dependence Structures

We study the following model: $Y = X + \varepsilon$. The aim is to estimate the distribution of $X$ when only $Y_1, \dots, Y_n$ are observed. In the classical model, the distribution of $\varepsilon$ is assumed to be known, and this is often considered an important drawback of this simple model. Indeed, in most practical applications, the distribution of the errors cannot be perfectly known. In this paper, we construct wavelet estimators and analyze their asymptotic mean integrated squared error for additive noise models under certain dependence conditions: the strong mixing case, the β-mixing case and the ρ-mixing case. Under mild conditions on the family of wavelets, the estimator is shown to be $L^p$ $(1 \le p \le \infty)$-consistent, and fast rates of convergence are established.


Introduction
In practical situations, direct data are not always available. One of the classical models is described as follows:
$$Y_i = X_i + \varepsilon_i, \quad i = 1, \dots, n,$$
where $X_i$ stands for the random samples with unknown density $f_X$ and $\varepsilon_i$ denotes the i.i.d. random noise with density $g$. Estimating the density $f_X$ is a deconvolution problem. Among the nonparametric methods of deconvolution, one can find estimation by model selection (e.g. Comte, Rozenholc and Taupin [1]), wavelet thresholding (e.g. [2]), kernel smoothing (e.g. Carroll and Hall [3]), and spline deconvolution or spectral cut-off (e.g. Johannes [4]); Meister [5] deals mainly with the effect of noise misspecification. However, a problem frequently encountered is that the proposed estimator is not everywhere positive, and therefore is not a valid probability density.
Sometimes, this problem can be circumvented by repeated observations of the same variable of interest, each time with an independent error. This is the model of panel data (see for example Li and Vuong [6], Delaigle, Hall and Meister [7], or Neumann [8] and references therein). On the other hand, there are many application fields where it is not possible to do repeated measurements of the same variable. In that case, information about the error distribution can be drawn from an additional experiment: a training set is used by experimenters to estimate the noise distribution. If $\varepsilon$ is thought of as a measurement error due to the measuring device, then preliminary calibration measures can be obtained in the absence of any signal $X$ (this is often called the instrument line shape of the measuring device).
In this paper, we extend Geng and Wang [21] (Theorems 4.1 and 4.2) to certain dependent settings. More precisely, we prove that the linear wavelet estimator attains the standard rate of convergence, i.e. the optimal one with additive noise, under more realistic and standard dependence conditions such as polynomial strong mixing dependence, β-mixing dependence and ρ-mixing dependence. The properties of the wavelet basis allow us to apply sharp probabilistic inequalities which improve the performance of the considered linear wavelet estimator.
The organization of the paper is as follows. Assumptions on the model are presented in Section 2. Section 3 is devoted to our linear wavelet estimator and a general result. Applications are set in Section 5, while technical proofs are collected in Section 6.

Estimation Procedure
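The Fourier transform of a function $f \in L^1(\mathbb{R})$ is denoted by $\hat{f}$. Let $N$ be a positive integer. We assume that there exist constants $c > 0$ and $\delta > 1$ such that, for any $x \in \mathbb{R}$,
$$|\hat{g}(x)| \ge \frac{c}{(1 + x^2)^{\delta/2}}. \quad (2.1)$$
One can easily find an example: the Laplace density $g(x) = \frac{1}{2} e^{-|x|}$, for which $\hat{g}(x) = \frac{1}{1 + x^2}$, satisfies (2.1) with $\delta = 2$.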
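As a quick numerical sanity check of the Laplace example, the sketch below approximates the Fourier transform of the Laplace density with a trapezoidal rule and compares it with $1/(1 + t^2)$. It assumes that condition (2.1) has the form $|\hat{g}(t)| \ge c (1 + t^2)^{-\delta/2}$ and takes $c = 1$, $\delta = 2$; the function names, the truncation interval and the grid size are illustrative choices, not part of the paper.

```python
import math


def laplace_density(x):
    """Standard Laplace density g(x) = exp(-|x|) / 2."""
    return 0.5 * math.exp(-abs(x))


def fourier_transform_even(density, t, half_width=40.0, steps=80_000):
    """Trapezoidal approximation of the integral of density(x) * cos(t * x)
    over [-half_width, half_width].

    For an even density the sine part of the transform vanishes, so this
    value does not depend on the sign convention of the Fourier transform.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * density(x) * math.cos(t * x)
    return total * h


def lower_bound(t, c=1.0, delta=2.0):
    """Assumed form of the right-hand side of (2.1): c / (1 + t^2)^(delta/2)."""
    return c / (1.0 + t * t) ** (delta / 2.0)


for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    approx = fourier_transform_even(laplace_density, t)
    exact = 1.0 / (1.0 + t * t)
    print(f"t={t:4.1f}  numeric={approx:.6f}  1/(1+t^2)={exact:.6f}  "
          f"bound={lower_bound(t):.6f}")
```

With $c = 1$ and $\delta = 2$ the bound holds with equality for the Laplace density, which is why this example is the usual illustration of an ordinary smooth noise.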
We consider an orthonormal wavelet basis generated by dilations and translations of a father Daubechies-type wavelet $\phi$ and a mother Daubechies-type wavelet $\psi$ of the family db2N (see [22]). Further details on wavelet theory can be found in Daubechies [22] and Meyer [23]. For any $j \ge 0$, we set $\Lambda_j = \{0, \dots, 2^j - 1\}$ and, for $k \in \Lambda_j$, we define
$$\phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k), \qquad \psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k).$$
With appropriate treatments at the boundaries, there exists an integer $\tau$ such that, for any integer $l \ge \tau$, a square-integrable function $f$ has the following wavelet expansion:
$$f(x) = \sum_{k \in \Lambda_l} \alpha_{l,k} \phi_{l,k}(x) + \sum_{j = l}^{\infty} \sum_{k \in \Lambda_j} \beta_{j,k} \psi_{j,k}(x),$$
where $\alpha_{l,k} = \int f(x) \phi_{l,k}(x)\,dx$ and $\beta_{j,k} = \int f(x) \psi_{j,k}(x)\,dx$.
Furthermore, we consider the following wavelet sequential definition of the Besov balls. We say that $f$ belongs to the Besov ball $B^s_{p,r}(M)$ if the corresponding sequence norm of its wavelet coefficients is bounded by a constant depending on $M$, with the usual modifications if $p = \infty$ or $r = \infty$. Note that, for particular choices of $s$, $p$ and $r$, $B^s_{p,r}(M)$ contains the classical Hölder and Sobolev balls. See, e.g., Meyer [23] and Härdle, Kerkyacharian, Picard and Tsybakov [24].
We define the linear wavelet estimator $\hat{f}$ by
$$\hat{f}(x) = \sum_{k \in \Lambda_l} \hat{\alpha}_{l,k} \phi_{l,k}(x),$$
where $\hat{\alpha}_{l,k}$ is an estimator of $\alpha_{l,k}$ built from $Y_1, \dots, Y_n$ and the Fourier transform of the noise density $g$. Such an estimator is standard in nonparametric estimation via wavelets. For a survey on wavelet linear estimators in various density models, we refer to [25]. Note that, by the Plancherel formula, $\hat{\alpha}_{l,k}$ is an unbiased estimator of $\alpha_{l,k}$.
In 1999, Pensky and Vidakovic [26] investigated Meyer wavelet estimation over Sobolev spaces and $L^2(\mathbb{R})$ risk under moderately and severely ill-posed noises. Three years later, Fan and Koo [2] extended those works to Besov spaces, but the given estimator is not computable since it depends on an integral in the frequency domain that cannot be calculated in practice. It should be pointed out that, using a different method, Lounici and Nickl [27] study optimal wavelet estimation over Besov spaces $B^s_{\infty,\infty}$ and $L^\infty$ risk under both noises. In [3], optimal wavelet estimation is provided over [...] and [...] risk under moderately ill-posed noise. Furthermore, in 2014, Li and Liu [28] considered wavelet estimation for random samples with moderately ill-posed noise.
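For readers who want explicit formulas, one standard way of writing the objects introduced in this section is sketched below. It is an assumption based on the usual wavelet deconvolution literature, not a transcription of the paper: it uses the Fourier convention $\hat{h}(t) = \int h(x) e^{-itx}\,dx$, assumes that $X_i$ and $\varepsilon_i$ are independent, and writes $M^*$ for a constant depending on $M$. Some authors add a term in the scaling coefficients $\alpha_{\tau,k}$ to the Besov condition.

```latex
% Besov ball B^s_{p,r}(M), wavelet sequence form (one common version)
\Bigg( \sum_{j \ge \tau} \Big( 2^{j(s + 1/2 - 1/p)}
  \Big( \sum_{k \in \Lambda_j} |\beta_{j,k}|^p \Big)^{1/p} \Big)^{r} \Bigg)^{1/r} \le M^*

% Empirical scaling coefficient of the linear estimator
\hat{\alpha}_{l,k} = \frac{1}{2\pi n} \sum_{i=1}^{n}
  \int \frac{\overline{\hat{\phi}_{l,k}(t)}}{\hat{g}(t)} \, e^{-i t Y_i} \, dt

% Unbiasedness via the Plancherel formula, using \hat{f}_Y = \hat{f}_X \hat{g}
E(\hat{\alpha}_{l,k})
  = \frac{1}{2\pi} \int \overline{\hat{\phi}_{l,k}(t)} \, \frac{\hat{f}_Y(t)}{\hat{g}(t)} \, dt
  = \frac{1}{2\pi} \int \hat{f}_X(t) \, \overline{\hat{\phi}_{l,k}(t)} \, dt
  = \int f_X(x) \phi_{l,k}(x) \, dx
  = \alpha_{l,k}
```

The unbiasedness step only uses the marginal distribution of each $Y_i$, so it is unaffected by dependence between observations; dependence enters through the variance of $\hat{\alpha}_{l,k}$.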
Our work is related to the paper of Geng and Wang [21], since our estimator is similar and we borrow a useful lemma from that study. Geng and Wang [21] prove that, under mild conditions on the family of wavelets, the estimators are $L^p$ $(1 \le p \le \infty)$-consistent for the additive noise model. We extend their result to a certain class of dependent observations and prove that the mean integrated squared error of the linear wavelet estimator developed by [29] attains the standard rate of convergence, i.e. the optimal one in the i.i.d. case.

Optimality Results
The main result of the paper is the upper bound for the mean integrated squared error of the wavelet estimator. We refer to [24] and [30] for a detailed coverage of wavelet theory in statistics. The asymptotic performance of our estimator is evaluated by determining an upper bound of the MISE over Besov balls. It is obtained as sharp as possible and coincides with the one related to the standard i.i.d. framework.
[...] let $h_{(Y_0, Y_m)}(x, y)$ be the joint distribution of $(Y_0, Y_m)$ [...]. Then there exists a constant [...].
Naturally, the rate of convergence in Theorem 3.1 is obtained to be as sharp as possible.
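The way such an upper bound is typically balanced can be sketched as follows. This is a generic bias-variance computation under stated assumptions, not the paper's exact statement: it writes $\hat{f}$ for the linear estimator at resolution level $l$, assumes each coefficient satisfies $E(\hat{\alpha}_{l,k} - \alpha_{l,k})^2 \le C\,2^{2\delta l}/n$, and assumes $f_X \in B^s_{p,r}(M)$ with $p \ge 2$ so that the tail of the detail coefficients is of order $2^{-2ls}$.

```latex
% Orthonormality of the wavelet basis splits the MISE into two sums
E \int \big( \hat{f}(x) - f_X(x) \big)^2 dx
  = \sum_{k \in \Lambda_l} E\big( \hat{\alpha}_{l,k} - \alpha_{l,k} \big)^2
  + \sum_{j \ge l} \sum_{k \in \Lambda_j} \beta_{j,k}^2

% |\Lambda_l| = 2^l coefficients, each with mean squared error at most C 2^{2 \delta l} / n
\le C \, \frac{2^{l(2\delta + 1)}}{n} + C \, 2^{-2 l s}

% Balancing both terms with 2^l \asymp n^{1/(2s + 2\delta + 1)}
\le C \, n^{-2s/(2s + 2\delta + 1)}
```

Under these assumptions the resulting exponent is the one usually quoted for ordinary smooth noise of order $\delta$ in the i.i.d. setting.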

Applications
The following three subsections investigate separately the strong mixing case, the ρ-mixing case and the β-mixing case, which occur in a large variety of applications.

Application to the Strong Mixing Dependence
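We define the $m$-th strong mixing coefficient of $(V_i)_{i \in \mathbb{Z}}$ by
$$\alpha_m = \sup_{(A, B) \in \mathcal{F}^V_{-\infty,0} \times \mathcal{F}^V_{m,\infty}} |P(A \cap B) - P(A) P(B)|,$$
where $\mathcal{F}^V_{-\infty,0}$ is the σ-algebra generated by $\dots, V_{-1}, V_0$ and $\mathcal{F}^V_{m,\infty}$ is the σ-algebra generated by $V_m, V_{m+1}, \dots$. We say that $(V_i)_{i \in \mathbb{Z}}$ is strong mixing if and only if $\lim_{m \to \infty} \alpha_m = 0$. Applications of strong mixing can be found in [15] [31] and [32]. Among the various mixing conditions used in the literature, α-mixing has many practical applications. Many stochastic processes and time series are known to be α-mixing. Under certain weak assumptions, autoregressive and, more generally, bilinear time series models are strongly mixing with exponential mixing coefficients. The α-mixing dependence is reasonably weak; it is satisfied by a wide variety of models including Markov chains, GARCH-type models and discretely observed diffusions.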
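To make the autoregressive example concrete, the sketch below simulates a Gaussian AR(1) sequence and computes sample autocorrelations. It is only an illustration: autocorrelations are not mixing coefficients, but for a stationary Gaussian AR(1) process with coefficient `phi` they decay like `phi ** lag`, which gives a feel for the geometric decay of dependence. The function names, the value `phi=0.5` and the seed are hypothetical choices made here.

```python
import random


def simulate_ar1(n, phi=0.5, seed=0, burn_in=500):
    """Simulate X_t = phi * X_{t-1} + e_t with standard Gaussian innovations.

    The first `burn_in` values are discarded so that the returned path is
    approximately stationary.
    """
    rng = random.Random(seed)
    x = 0.0
    path = []
    for i in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if i >= burn_in:
            path.append(x)
    return path


def sample_autocorrelation(path, lag):
    """Empirical autocorrelation of `path` at the given non-negative lag."""
    n = len(path)
    mean = sum(path) / n
    var = sum((v - mean) ** 2 for v in path) / n
    cov = sum((path[i] - mean) * (path[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


path = simulate_ar1(20_000, phi=0.5, seed=0)
for lag in (1, 2, 5, 10):
    print(f"lag={lag:2d}  sample={sample_autocorrelation(path, lag):+.3f}  "
          f"phi**lag={0.5 ** lag:.3f}")
```

A path generated this way could play the role of a dependent sequence $(X_i)$ in a small simulation of the additive noise model, by adding independent Laplace noise to each value.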
Proposition 4.1. Consider the strong mixing case as defined above. Suppose that there exist two constants ( )

Application to the ρ-Mixing Dependence
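Let $(Y_i)_{i \in \mathbb{Z}}$ be a strictly stationary random sequence. For any $m \in \mathbb{Z}$, we define the $m$-th maximal correlation coefficient of $(Y_i)_{i \in \mathbb{Z}}$ by
$$\rho_m = \sup_{(U, V) \in L^2(\mathcal{F}^Y_{-\infty,0}) \times L^2(\mathcal{F}^Y_{m,\infty})} \frac{|\mathrm{Cov}(U, V)|}{\sqrt{\mathrm{Var}(U)\,\mathrm{Var}(V)}},$$
where $\mathcal{F}^Y_{-\infty,0}$ is the σ-algebra generated by the random variables (or vectors) $\dots, Y_{-1}, Y_0$ and $\mathcal{F}^Y_{m,\infty}$ is the σ-algebra generated by the random variables (or vectors) $Y_m, Y_{m+1}, \dots$.

Application to the β-Mixing Dependence
Let $(Y_i)_{i \in \mathbb{Z}}$ be a strictly stationary random sequence. For any $m \in \mathbb{Z}$, we define the $m$-th β-mixing coefficient of $(Y_i)_{i \in \mathbb{Z}}$ by
$$\beta_m = \frac{1}{2} \sup \sum_{i=1}^{I} \sum_{j=1}^{J} |P(A_i \cap B_j) - P(A_i) P(B_j)|,$$
where the supremum is taken over all finite partitions $(A_i)_{1 \le i \le I}$ and $(B_j)_{1 \le j \le J}$ of the sample space that are, respectively, $\mathcal{F}^Y_{-\infty,0}$-measurable and $\mathcal{F}^Y_{m,\infty}$-measurable. Full details can be found in e.g. [29] [31] [33] and [34].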
Proposition 4.3. Consider the β-mixing case as defined above. Furthermore, there exist two constants $C > 0$ such that, for any integer $m$,

Proofs
In this section, we investigate the results of Section 3 under the assumptions of Section 4. Moreover, $C$ denotes any constant that does not depend on $l$, $k$ and $n$.
Proof of Theorem 3.1. Since we set [...], following the lines of Geng and Wang [21], with the Plancherel formula, it is easy to see that $\hat{\alpha}_{l,k}$ is an unbiased estimator of $\alpha_{l,k}$. On the other hand, it follows from the stationarity of $(Y_i)_{i \in \mathbb{Z}}$ that [...], where [...]. For the upper bound of $T_1$, one need only consider the change of variables $y = 2^j x - k$, and we obtain [...]. It follows from (5) that
$$T_2 \le C\,\frac{2^{2\delta l}}{n}. \quad (11)$$
Therefore, combining (7) to (11), we obtain [...]. On the other hand, as we define [...]. It follows from (13) and (14) and the assumption on $2^l$ that [...]. Now the proof of Theorem 3.1 is complete.
Proof of Proposition 5.1. We apply the Davydov inequality for strongly mixing processes (see [29]); for any [...]. Since [...], we have [...]; therefore [...].
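As a reminder, the Davydov inequality invoked in the last proof is usually stated in the following textbook form. This is given as background, not as the exact version used in [29]; $C_0$ stands for a universal constant, $\alpha_m$ for the strong mixing coefficient at lag $m$, and $U$, $V$, $w$ are generic symbols introduced here.

```latex
% U is measurable with respect to the past sigma-algebra,
% V with respect to the future sigma-algebra at lag m,
% and p, q, r \in [1, \infty] satisfy 1/p + 1/q + 1/r = 1.
|\mathrm{Cov}(U, V)| \le C_0 \, \alpha_m^{1/r} \, \|U\|_p \, \|V\|_q

% Special case p = q > 2, applied to U = w(Y_0) and V = w(Y_m)
% for a stationary sequence and a measurable function w:
|\mathrm{Cov}(w(Y_0), w(Y_m))| \le C_0 \, \alpha_m^{1 - 2/p} \, \big( E|w(Y_0)|^p \big)^{2/p}
```

In arguments of this kind, the covariance terms in the variance of the empirical coefficient are summed over the lag $m$, so the bound is useful whenever $\sum_m \alpha_m^{1 - 2/p}$ is finite, which holds for exponentially or sufficiently fast polynomially decaying coefficients.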
Consider the ρ-mixing case as defined above.Furthermore, there exist two constants ( )