Estimation of Regression Function for Nonequispaced Samples Based on Warped Wavelets

We consider the problem of estimating an unknown regression function and its derivatives in a regression setting with random design. Instead of expanding the function on a regular wavelet basis, we expand it on the warped wavelet basis $\{\phi_{j,k}(G(x))\}_{j,k}$, where $G$ is the distribution function of the design. We investigate the properties of this new basis and evaluate its asymptotic performance by deriving an upper bound on the mean integrated squared error under different dependence structures. We prove that the resulting estimator attains a sharp rate of convergence for a wide class of unknown regression functions.


Introduction
In nonparametric regression, it is often of interest to estimate functionals of a regression function, such as its derivatives. For example, in the study of growth curves, the first (speed) and second (spurt) derivatives of height as a function of age are important parameters (Müller [1]). The need for derivative estimation also arises within nonparametric regression itself. For example, when constructing interval estimates for a regression function and when selecting a kernel bandwidth (Ruppert and Wand [2]), estimators of higher-order derivatives are used to estimate the leading bias terms. Suppose we observe $n$ independent random variables $Y_1, \dots, Y_n$ with
$$Y_i = f(X_i) + \varepsilon_i, \qquad i = 1, \dots, n,$$
where $X_i$ and $\varepsilon_i$ are independent random variables. For simplicity, $\varepsilon_i$ is assumed to be normally distributed with mean zero and variance $\sigma^2 = 1$. The $X_i$'s have a density $g$, which may be known or unknown. Both $g$ and $f$ are assumed to be compactly supported on the interval $I = [a, b]$. We aim to estimate $f^{(d)}$, the $d$th derivative of $f$, for any integer $d$.
Considerable research has been devoted to this estimation problem. Most of it relies on kernel methods (see, e.g., [3]-[8]), smoothing splines, and local polynomial methods (see, e.g., [9]-[11]). One may also be interested in more traditional approaches to nonparametric regression, mainly fixed-bandwidth kernel methods, orthogonal series methods and linear spline smoothers. These methods are not adaptive: the resulting estimators may converge substantially more slowly if the smoothness of the underlying regression function is misspecified. The recent development of wavelet bases based on multiresolution analyses suggests new techniques for nonparametric function estimation. Wavelet analysis plays an important role in both pure and applied mathematics, for instance in signal processing, image compression, and numerical analysis. The application of wavelet theory to statistical function estimation was pioneered by Donoho and Johnstone. In a series of important papers (see, e.g., [12]-[15]), Donoho, Johnstone and coauthors present a coherent set of procedures that are spatially adaptive and near optimal over a range of function spaces of inhomogeneous smoothness. These procedures have excellent mean squared error properties when used to estimate functions that are only piecewise smooth.
Recently, a quite different algorithm was developed by Kerkyacharian and Picard [16]. The procedure stays very close to Donoho and Johnstone's equispaced VisuShrink procedure, and is thus very simple in form and implementation. The projection is simply done on an unusual, non-orthonormal basis called the warped wavelet basis. Assuming that g is known, but with no boundedness assumptions on it, two new estimators were introduced based on this warped wavelet basis. The basis is built from a standard wavelet basis and a warping function G related to the model, and its properties depend strongly on G. This technique has already been used successfully for nonparametric regression with random design by Kerkyacharian and Picard [16]. Further work on warped wavelet bases in nonparametric statistics can be found in [17]-[20]. To the best of our knowledge, only Cai [21] and Petsa and Sapatinas [22] have proposed wavelet estimators for $f^{(d)}$. Both work with a deterministic equidistant design, that is, $X_i = i/n$. A random design combined with a warped wavelet basis complicates the problem significantly, and no wavelet estimators for the derivatives of a regression function exist in this case. This motivates us to study this setting under two dependence structures: the strong mixing case and the ρ-mixing case. We explore the asymptotic mean integrated squared error properties of the resulting estimators of derivatives of the regression function. In each case, we prove that the warped wavelet estimator attains a fast rate of convergence. Another important advantage of warped basis estimators is that they are near optimal in the minimax sense over a large class of function spaces. This holds for a wide variety of design densities, which need not be bounded above and below as other wavelet estimators generally require. Essentially, the condition on the design comes from the theory of Muckenhoupt weights introduced by Muckenhoupt [23].
The rest of the paper is organized as follows. Section 2 describes the warped wavelet basis and the nonequispaced procedure. Optimality of the estimators is presented in Section 3, while Section 4 contains the proofs of the main results.

Assumptions
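We aim to estimate the derivative of the regression function when the observations come from a strictly stationary stochastic process $(Y_i)_{i\in\mathbb{Z}}$ defined on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$.

Condition 1. We define the $m$th strong mixing coefficient of $(Y_i)_{i\in\mathbb{Z}}$ by
$$\alpha_m = \sup_{(A,B)\in\mathcal{F}^{Y}_{-\infty,0}\times\mathcal{F}^{Y}_{m,\infty}} \left|\mathbb{P}(A\cap B) - \mathbb{P}(A)\,\mathbb{P}(B)\right|,$$
where $\mathcal{F}^{Y}_{-\infty,0}$ and $\mathcal{F}^{Y}_{m,\infty}$ are the σ-algebras generated by the past and by the future of the process, respectively.

Applications of strong mixing can be found in [24]-[26]. Among the various mixing conditions used in the literature, α-mixing has many practical applications, and many stochastic processes and time series are known to be α-mixing. Under certain weak assumptions, autoregressive and, more generally, bilinear time series models are strongly mixing with exponentially decaying mixing coefficients. The α-mixing dependence is reasonably weak. It is satisfied by a wide variety of models, including Markov chains, GARCH-type models and discretely observed diffusions.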
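Condition 2. Let $(Y_i)_{i\in\mathbb{Z}}$ be a strictly stationary random sequence. For any $m \in \mathbb{Z}$, we define the $m$th maximal correlation coefficient of $(Y_i)_{i\in\mathbb{Z}}$ by
$$\rho_m = \sup_{(U,V)\in L^2(\mathcal{F}^{Y}_{-\infty,0})\times L^2(\mathcal{F}^{Y}_{m,\infty})} \frac{|\mathrm{Cov}(U,V)|}{\sqrt{\mathrm{Var}(U)\,\mathrm{Var}(V)}},$$
where the σ-algebras are as in Condition 1 and $L^2(\mathcal{F})$ denotes the space of square-integrable, $\mathcal{F}$-measurable random variables.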
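A standard example of exponentially decaying dependence is the stationary Gaussian AR(1) process $Y_i = \varphi Y_{i-1} + e_i$, with $|\varphi| < 1$ and i.i.d. $N(0,1)$ innovations. It is Markov, and each pair $(Y_0, Y_m)$ is bivariate Gaussian, so its maximal correlation coefficient is $\rho_m = |\varphi|^m$. Since $4\alpha_m \le \rho_m$ in general, it is also strongly mixing with exponentially decaying coefficients. The Python sketch below is only an illustration; the function names and parameter values are ours. It compares the empirical lag-$m$ linear correlation on a simulated path with $\varphi^m$. It does not estimate $\rho_m$ or $\alpha_m$ themselves.

```python
import numpy as np

def simulate_ar1(n, ar_coef, rng):
    """Simulate a strictly stationary Gaussian AR(1): Y_i = ar_coef * Y_{i-1} + e_i."""
    y = np.empty(n)
    # Draw Y_0 from the stationary law N(0, 1 / (1 - ar_coef^2)) so the path is stationary.
    y[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - ar_coef**2))
    e = rng.normal(size=n)
    for i in range(1, n):
        y[i] = ar_coef * y[i - 1] + e[i]
    return y

def lag_correlation(y, m):
    """Empirical correlation between Y_i and Y_{i+m}."""
    return np.corrcoef(y[:-m], y[m:])[0, 1]

rng = np.random.default_rng(0)
ar_coef = 0.6
y = simulate_ar1(200_000, ar_coef, rng)
for m in (1, 2, 4, 8):
    # empirical lag-m correlation printed next to the theoretical value ar_coef**m
    print(m, round(lag_correlation(y, m), 3), round(ar_coef**m, 3))
```

The printed correlations should shrink geometrically in $m$, in line with the exponential decay assumed for mixing coefficients.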

Warped Basis and Estimation Framework
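Let $N$ be a positive integer. We consider an orthonormal wavelet basis generated by dilations and translations of a father Daubechies-type wavelet $\phi$ and a mother Daubechies-type wavelet $\psi$ of the family db2N (see [27]). Further details on wavelet theory can be found in Daubechies [27] and Meyer [28]. In particular, $\phi$ and $\psi$ have compact supports. For any $j \ge 0$, we set $\Lambda_j = \{0, \dots, 2^j - 1\}$ and, for $k \in \Lambda_j$, we define
$$\phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k).$$
With appropriate treatment at the boundaries, there exists an integer $\tau$ such that, for any integer $l \ge \tau$, the collection $\{\phi_{l,k},\, k\in\Lambda_l;\ \psi_{j,k},\, j \ge l,\, k \in \Lambda_j\}$ is an orthonormal basis of $L^2([0,1])$. Hence any $h \in L^2([0,1])$ has the wavelet expansion
$$h(x) = \sum_{k\in\Lambda_l}\alpha_{l,k}\,\phi_{l,k}(x) + \sum_{j\ge l}\sum_{k\in\Lambda_j}\beta_{j,k}\,\psi_{j,k}(x),$$
where $\alpha_{j,k} = \int_0^1 h(x)\phi_{j,k}(x)\,dx$ and $\beta_{j,k} = \int_0^1 h(x)\psi_{j,k}(x)\,dx$. Furthermore, we use the following sequence-based wavelet definition of Besov balls. We say that $h \in B^s_{p,r}(M)$, with $s > 0$, $p \ge 1$, $r \ge 1$ and $M > 0$, if
$$\left(\sum_{j\ge\tau}\left(2^{j(s+1/2-1/p)}\Big(\sum_{k\in\Lambda_j}|\beta_{j,k}|^p\Big)^{1/p}\right)^r\right)^{1/r} \le M,$$
with the usual modifications if $p = \infty$ or $r = \infty$.

The derivative $f^{(d)}$ admits the generalized expansion
$$f^{(d)}(x) = \sum_{k\in\Lambda_l}\alpha^{(d)}_{l,k}\,\phi_{l,k}(x) + \sum_{j\ge l}\sum_{k\in\Lambda_j}\beta^{(d)}_{j,k}\,\psi_{j,k}(x),$$
where the coefficients are $\alpha^{(d)}_{j,k} = \int f^{(d)}(x)\phi_{j,k}(x)\,dx$ and $\beta^{(d)}_{j,k} = \int f^{(d)}(x)\psi_{j,k}(x)\,dx$. We define the linear wavelet estimator
$$\hat f^{(d)}(x) = \sum_{k\in\Lambda_{j_0}}\hat\alpha^{(d)}_{j_0,k}\,\phi_{j_0,k}(x),$$
where $\hat\alpha^{(d)}_{j_0,k}$ is an estimator of $\alpha^{(d)}_{j_0,k}$ and $j_0$ is an integer to be chosen a posteriori. For more on estimating derivatives of a density function, see [30] and [31].

Kerkyacharian and Picard [16] propose a construction in which the unknown function is expanded on a warped basis instead of a regular wavelet basis. Proceeding in this way makes the estimators of the coefficients more natural. Let us briefly describe the construction. Suppose that $G(x) = \int_a^x g(t)\,dt$ is a known function, continuous and strictly monotone from $[a, b]$ to $[0, 1]$. Let $h = f \circ G^{-1}$, so that $f(x) = h(G(x))$. We consider the coefficient estimator
$$\hat\alpha_{j,k} = \frac{(-1)^d}{n}\sum_{i=1}^n Y_i\,\phi^{(d)}_{j,k}(G(X_i)),$$
which is an unbiased estimator of $(-1)^d\int_0^1 h(u)\phi^{(d)}_{j,k}(u)\,du$, and the warped estimator
$$\hat f^{(d)}(x) = \sum_{k\in\Lambda_{j_0}}\hat\alpha_{j_0,k}\,\phi_{j_0,k}(G(x)).$$
When $g$ is unknown, we replace $G$ wherever it appears in the construction by the empirical distribution function of the $X_i$'s,
$$\hat G_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}_{\{X_i \le x\}}.$$
We then define the new empirical wavelet coefficients
$$\tilde\alpha_{j,k} = \frac{(-1)^d}{n}\sum_{i=1}^n Y_i\,\phi^{(d)}_{j,k}(\hat G_n(X_i)),$$
and, consequently, the estimator
$$\tilde f^{(d)}(x) = \sum_{k\in\Lambda_{j_0}}\tilde\alpha_{j_0,k}\,\phi_{j_0,k}(\hat G_n(x)).$$
This approach was initially introduced by Rao [32] for the estimation of the derivatives of a density. Note that, in the standard case $d = 0$, this estimator has been considered and studied by Kerkyacharian and Picard [16].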
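The warped construction can be sketched in a few lines for the simplest case. The block below is a minimal illustration under assumptions of our own, not the paper's implementation:

- It swaps the Haar father wavelet $\phi = \mathbf{1}_{[0,1)}$ in for the db2N wavelets. This covers only $d = 0$, since Haar has no derivatives.
- It fixes the resolution $j$ by hand rather than choosing it a posteriori.
- It uses a hypothetical design density $g(x) = 2x$ on $[0,1]$, so $G(x) = x^2$, and a hypothetical $f(x) = \sin(2\pi x)$.

The function names (`haar_bin`, `warped_coefficients`, `warped_estimate`, `empirical_cdf`) are ours. The coefficient estimator is unbiased because, substituting $u = G(x)$, $\mathbb{E}[Y_1\phi_{j,k}(G(X_1))] = \int_a^b f(x)\phi_{j,k}(G(x))g(x)\,dx = \int_0^1 h(u)\phi_{j,k}(u)\,du$ with $h = f\circ G^{-1}$.

```python
import numpy as np

# Sketch only: the Haar father wavelet phi = 1_[0,1) stands in for the
# db2N wavelets of the text, so only d = 0 (no derivatives) is covered.

def haar_bin(u, j):
    """Index k with u in [k 2^-j, (k+1) 2^-j); u = 1 is put in the last bin."""
    return np.minimum(np.floor(np.asarray(u) * 2**j).astype(int), 2**j - 1)

def warped_coefficients(x, y, warp, j):
    """hat alpha_{j,k} = (1/n) sum_i Y_i phi_{j,k}(warp(X_i)), k = 0, ..., 2^j - 1."""
    # phi_{j,k}(u) = 2^{j/2} on bin k and 0 elsewhere, so each Y_i hits one k.
    sums = np.bincount(haar_bin(warp(x), j), weights=y, minlength=2**j)
    return 2 ** (j / 2) * sums / len(x)

def warped_estimate(t, coeffs, warp, j):
    """hat f(t) = sum_k hat alpha_{j,k} phi_{j,k}(warp(t))."""
    return 2 ** (j / 2) * coeffs[haar_bin(warp(t), j)]

def empirical_cdf(sample):
    """Return the map t -> hat G_n(t) = (1/n) #{i : X_i <= t}."""
    s = np.sort(sample)
    return lambda t: np.searchsorted(s, t, side="right") / len(s)

# Hypothetical example: design density g(x) = 2x on [0, 1], so G(x) = x^2.
rng = np.random.default_rng(1)
n, j = 200_000, 6
x = np.sqrt(rng.uniform(size=n))            # X_i has density 2x
f = lambda t: np.sin(2 * np.pi * t)         # hypothetical regression function
y = f(x) + rng.normal(size=n)               # Y_i = f(X_i) + eps_i, sigma = 1
G = lambda t: t**2                          # known warping function

grid = np.linspace(0.05, 0.95, 19)
fit_known = warped_estimate(grid, warped_coefficients(x, y, G, j), G, j)
G_hat = empirical_cdf(x)                    # g treated as unknown
fit_emp = warped_estimate(grid, warped_coefficients(x, y, G_hat, j), G_hat, j)
```

With Haar functions, each $Y_i$ contributes to a single coefficient. So $\hat f(x)$ is $2^j/n$ times the sum of the $Y_i$ whose warped point $G(X_i)$ falls in the same dyadic cell as $G(x)$. Each cell carries probability $2^{-j}$ even where $g$ is small (here $g(0) = 0$).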

Optimality Results
The main results of the paper are upper bounds for the mean integrated squared error (MISE) of the wavelet estimator $\hat f^{(d)}$, defined as usual by
$$\mathbb{E}\left(\int_a^b \big(\hat f^{(d)}(x) - f^{(d)}(x)\big)^2\,dx\right).$$
Moreover, $C$ denotes any constant that does not depend on $l$, $k$ and $n$.
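Because the MISE is the expectation of an integral, it can be approximated numerically for any estimator that can be simulated. The sketch below is our own illustration, and the names `monte_carlo_mise` and `constant_fit` are hypothetical. It approximates the integral over $[a,b]$ with a midpoint rule and the expectation with an average over independent replications. The toy estimator is deliberately trivial so that the exact value is known.

```python
import numpy as np

def monte_carlo_mise(estimator, target, a, b, n_rep, rng, n_grid=2000):
    """Approximate E int_a^b (hat f(x) - f(x))^2 dx.

    estimator(rng) must return a callable fitted on one fresh simulated sample;
    the integral uses a midpoint rule with n_grid cells and the expectation an
    average over n_rep independent replications."""
    h = (b - a) / n_grid
    mid = a + h * (np.arange(n_grid) + 0.5)
    ise = [h * np.sum((estimator(rng)(mid) - target(mid)) ** 2) for _ in range(n_rep)]
    return float(np.mean(ise))

def constant_fit(rng, n=100):
    """Hypothetical toy estimator: fit f by the constant mean(Y)."""
    x = rng.uniform(size=n)             # uniform design on [0, 1]
    y = x + rng.normal(size=n)          # f(x) = x, sigma = 1
    c = y.mean()
    return lambda t: np.full_like(t, c)

rng = np.random.default_rng(2)
mise = monte_carlo_mise(constant_fit, lambda t: t, 0.0, 1.0, 2000, rng)
```

For this toy case the exact MISE is $1/12 + (1/12 + 1)/100$. The first term is $\int_0^1 (1/2 - x)^2\,dx$, and the second is $\mathrm{Var}(\bar Y)$ with $\mathrm{Var}(Y_1) = 1/12 + 1$.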
For the upper bound of $T_1$, we use the same technique as in [19] together with the change of variables $y = 2^l x - k$. The next term involves almost the same integral as $T_1$ and is handled in the same way; the resulting bound follows from (4.2). The covariance terms are controlled using Proposition 6.1 in [33] and the Davydov inequality for strongly mixing processes (see [34]).
The rate of convergence corresponds to the one obtained in the framework of density derivative estimation; see, for example, Rao [32] and Chaubey et al. [30], [31].
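The scaling that the change of variables $y = 2^l x - k$ produces for derivatives of $\phi_{l,k}(x) = 2^{l/2}\phi(2^l x - k)$ can be written out explicitly, as below. The second part is a sketch under assumptions that are ours rather than necessarily the paper's: i.i.d. observations, $G$ the distribution function of the $X_i$'s (so that $G(X_1)$ is uniform on $[0,1]$), $\sup|f| < \infty$, $\phi^{(d)}$ square integrable, and no boundary-corrected wavelets. It shows where a variance term of order $2^{l(2d+1)}/n$ comes from for the coefficient estimator $\hat\alpha_{l,k} = \frac{(-1)^d}{n}\sum_{i=1}^n Y_i\,\phi^{(d)}_{l,k}(G(X_i))$.

```latex
\phi^{(d)}_{l,k}(x) = 2^{l/2}\,2^{ld}\,\phi^{(d)}(2^l x - k),
\qquad
\int \big(\phi^{(d)}_{l,k}(x)\big)^2\,dx
  = 2^{l}\,2^{2ld}\int \big(\phi^{(d)}(2^l x - k)\big)^2\,dx
  = 2^{2ld}\int \big(\phi^{(d)}(y)\big)^2\,dy
\quad (dx = 2^{-l}\,dy),

\mathrm{Var}(\hat\alpha_{l,k})
  \le \frac{1}{n}\,\mathbb{E}\Big(Y_1^2\,\big(\phi^{(d)}_{l,k}(G(X_1))\big)^2\Big)
  \le \frac{\|f\|_\infty^2 + 1}{n}\int_0^1 \big(\phi^{(d)}_{l,k}(u)\big)^2\,du
  \le C\,\frac{2^{2ld}}{n},
\qquad
\sum_{k\in\Lambda_l}\mathrm{Var}(\hat\alpha_{l,k}) \le C\,\frac{2^{l(2d+1)}}{n}.
```

The second inequality uses $\mathbb{E}(Y_1^2 \mid X_1) = f(X_1)^2 + 1$. The last step uses $\mathrm{Card}(\Lambda_l) = 2^l$.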

Conclusion
In this paper, we proposed a wavelet-based estimator for derivatives of a regression function under random design. The proposed estimator is built on the warped basis, which makes it simple to implement. The results show that, without imposing overly restrictive assumptions on the model, the wavelet-based estimator attains a sharp rate of convergence under both strong mixing and ρ-mixing structures.
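In Condition 1, $\mathcal{F}^{Y}_{-\infty,0}$ denotes the σ-algebra generated by the random variables (or vectors) $\dots, Y_{-1}, Y_0$, and $\mathcal{F}^{Y}_{m,\infty}$ denotes the σ-algebra generated by the random variables (or vectors) $Y_m, Y_{m+1}, \dots$. We say that $(Y_i)_{i\in\mathbb{Z}}$ is strongly mixing if and only if $\lim_{m\to\infty}\alpha_m = 0$. Furthermore, there exist two constants $c, \gamma > 0$ such that, for any integer $m \ge 1$,

For particular choices of $s$, $p$ and $r$, the Besov ball $B^s_{p,r}(M)$ contains the classical Hölder and Sobolev balls; see, e.g., Meyer [28] and Härdle et al. [29]. Now we consider the wavelet basis $\Gamma$ with $N > 5m$, where $\phi$ and $\psi$ have $d$ derivatives, which yields the generalized expansion of the derivatives.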

completes the proof.

Proposition. Suppose that the assumptions of Condition 2 hold. Using the same technique as in the proof of Proposition 4.2, we have

Remark 4.1.
First consider the i.i.d. case. Using (4.2) and (4.3) and the fact that $\mathrm{Card}(\Gamma) = 2^l$, one easily checks that the assumptions of Section 2 hold. Combining Proposition 4.2 with Theorem 4.1 shows that, under mild assumptions on the dependence of the observations, $\hat f^{(d)}$ attains a rate of convergence close to the one obtained in the i.i.d. case, i.e.,