Change-Point Detection for General Nonparametric Regression Models

A number of statistical tests are proposed for the purpose of change-point detection in a general nonparametric regression model under mild conditions. New proofs are given for the weak convergence of the underlying processes; they remove the stringent condition of bounded total variation of the regression function and need only second moments. Since many quantities, such as the regression function, the distribution of the covariates and the distribution of the errors, are unspecified, the results are not distribution-free. A weighted bootstrap approach is proposed to approximate the limiting distributions. Results of a simulation study show good performance for moderate sample sizes.


Introduction
Model building is a very important task. When considering observations $(X_i, Y_i)$ taken over time, we may wish to describe the relationship between the $X_i$ and the $Y_i$ by some regression model, either parametric or semiparametric, linear or nonlinear. However, if, after some point $i = k$, there is a change in the relationship between $X_i$ and $Y_i$, then a single regression model will be inappropriate and will fit the data poorly. For example, in Figure 1, the scatterplot on the left is that of a combined sample of 150 points. The middle scatterplot is that of the first 60 points, while the right plot is that of the remaining 90 points. Without realizing that there was a change-point, one might wrongly propose a single regression model, perhaps linear, with heteroscedastic errors. Thus, it is important to determine whether such a change has occurred before the particular form of the model is postulated. If such a change has occurred, then two regression models should be used, one for the first $k$ pairs and another for the remaining pairs. We will propose a number of tests designed to detect whether a change has occurred in a general nonparametric model.
Change-point problems arise in many areas when observations $(X_i, Y_i)$ are taken over time $i$. Examples include the height of (or discharge from) a river as a function of upstream precipitation, or concentrations of heavy metals in storm-water runoff as a function of suspended sediment size. We refer to the book by Csörgő and Horváth [1] for more applications and important references, and to the papers [2][3][4] for other change-point problems.
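We say that the pair $(X, Y)$ satisfies the general regression model $M(\phi, \psi, F, \varepsilon)$ if $$Y = \phi(X) + \psi(X)\,\varepsilon, \qquad X \sim F,$$ where $\varepsilon$ is a mean-zero, variance-one random variable, independent of $X$. We write $(X, Y) \sim M(\phi, \psi, F, \varepsilon)$.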
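For a sequence $(X_1, Y_1), \dots, (X_n, Y_n)$, consider the null hypothesis $$H_0:\ (X_i, Y_i) \sim M(\phi, \psi, F, \varepsilon_i), \quad i = 1, \dots, n, \tag{1}$$ where $\phi$, $\psi > 0$ and $F$ are unspecified, the $X_i$ are independent and the $\varepsilon_i$ are independent and identically distributed. Model (1) encompasses both linear and nonlinear regression models and the case of heteroscedastic errors. We wish to test whether there is a change in the model as the observations are taken, without specifying $\phi$, $\psi$ or $F$; that is, to test the null Hypothesis (1) versus the alternative: for some $\theta \in (0, 1)$, $$(X_i, Y_i) \sim M(\phi, \psi, F, \varepsilon_i),\ i = 1, \dots, [n\theta]; \qquad (X_i, Y_i) \sim M(\phi_1, \psi_1, F_1, \varepsilon_i),\ i = [n\theta] + 1, \dots, n, \tag{2}$$ where $[s]$ denotes the integer part of $s$, and $$\int_{(-\infty, x]} \phi_1 \, dF_1 \neq \int_{(-\infty, x]} \phi \, dF \quad \text{for some } x \in \mathbb{R}^d, \tag{3}$$ using the usual partial order in $\mathbb{R}^d$.
The alternative Hypothesis (2) includes the case where only $\phi$ changes, that is, $\psi_1 = \psi$, $F_1 = F$ and $\int_{(-\infty, x]} (\phi_1 - \phi)\, dF \neq 0$ for some $x$, and the case where only $F$ changes, that is, $\phi_1 = \phi$, $\psi_1 = \psi$ and $\int_{(-\infty, x]} \phi \, d(F_1 - F) \neq 0$ for some $x$.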
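To make the model and the hypotheses concrete, here is a minimal Python sketch that draws $(X_i, Y_i)$ from the model $Y = \phi(X) + \psi(X)\varepsilon$ with a single change at $[n\theta]$. It assumes one-dimensional covariates with $X_i \sim U(-3, 3)$ and standard normal errors (the design used in the simulation study), keeps $\psi$ and $F$ fixed across the change, and uses a hypothetical helper name, `simulate_change_point`. Setting `phi1` equal to `phi0` gives a sample from the null Hypothesis (1).

```python
import numpy as np

def simulate_change_point(n, theta, phi0, phi1, psi, rng):
    """Draw (X_i, Y_i), i = 1..n, with Y_i = phi(X_i) + psi(X_i) * eps_i.

    The first [n * theta] pairs use phi0 and the remaining pairs use phi1.
    Assumptions (illustrative only): X_i ~ U(-3, 3), eps_i ~ N(0, 1), and the
    same psi and covariate distribution F before and after the change.
    """
    x = rng.uniform(-3.0, 3.0, size=n)
    eps = rng.standard_normal(n)
    k = int(np.floor(n * theta))        # [n * theta], last pre-change index
    mean = np.empty(n)
    mean[:k] = phi0(x[:k])              # regression function before the change
    mean[k:] = phi1(x[k:])              # regression function after the change
    y = mean + psi(x) * eps
    return x, y

rng = np.random.default_rng(0)
x, y = simulate_change_point(
    150, 0.4,
    phi0=lambda t: 1 + t + 0.4 * t**2,  # quadratic with beta = 0.4
    phi1=lambda t: 1 + t + 1.0 * t**2,  # quadratic with beta = 1.0
    psi=lambda t: np.ones_like(t),      # homoscedastic errors, psi = 1
    rng=rng,
)
```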

 
We will base our tests on the process $$\beta_n(x, s) = n^{-1/2}\Big(\sum_{i=1}^{[ns]} Y_i\,\mathbf{1}\{X_i \le x\} - \frac{[ns]}{n}\sum_{i=1}^{n} Y_i\,\mathbf{1}\{X_i \le x\}\Big), \qquad x \in \mathbb{R}^d,\ 0 \le s \le 1. \tag{4}$$ We are considering the partial sums of the response variables $Y_i$ aligned according to the $X_i$-observations. For $1 \le [ns] \le n - 1$, $\beta_n$ can also be written as a scaled difference of two means, $$\beta_n(x, s) = \frac{[ns]\,(n - [ns])}{n^{3/2}}\Big(\frac{1}{[ns]}\sum_{i=1}^{[ns]} Y_i\,\mathbf{1}\{X_i \le x\} - \frac{1}{n - [ns]}\sum_{i=[ns]+1}^{n} Y_i\,\mathbf{1}\{X_i \le x\}\Big),$$ and $\beta_n(x, s) = 0$ otherwise. The advantage of $\beta_n$ is that we can use empirical process theory, which has much better convergence properties, rather than trying to estimate the function $\phi$ directly with kernel or other curve estimation methods, which require a large sample size. As will be demonstrated in the simulation study, our approach works well for moderate sample sizes.
In the next section, we state the main results related to the weak convergence of the above process as well as the convergence of various related statistics. These results do not assume the stringent condition of bounded variation for the regression function $\phi$ and the variance function $\psi$, nor the strong moment conditions on the error distribution, which were assumed in an earlier paper [5]. Section 3 states the asymptotic results for the weighted bootstrap version of our statistics. This is needed since the asymptotic distributions of our statistics depend on unspecified functions. A new simulation study is conducted in Section 4. Proofs of the results (different from those in [5,6]) are sketched in Section 5.

Main Results

Assume the following:
(A1) The sequences $\{X_i\}$ and $\{\varepsilon_i\}$ are independent, and the $\{X_i\}$ are independent and identically distributed.
Conditions (A1)-(A5) are much weaker than those in [5] and [6], where $\phi$ and $\psi$ needed to be of bounded total variation and the $\varepsilon_i$ required much higher moment conditions ($E|\varepsilon_i|^p < \infty$ with $p$ much larger than 2). While the conclusions in [5] and [6] may be stronger (weighted approximations), the requirement of bounded variation for $\phi$ and $\psi$ limits their applicability. Here we dispense with this requirement. For example, we permit the case where the $X_i$ have unbounded support and the regression relationship is polynomial (unbounded), provided $X_i$ has finite moments of a suitable degree. The first main result is:
Theorem 1. Under Assumptions (A1)-(A5) and the null Hypothesis (1), $\beta_n \xrightarrow{\mathcal{D}} \Gamma$ as $n \to \infty$ in the space of bounded functions on $\mathbb{R}^d \times [0, 1]$, equipped with the uniform norm, where $\Gamma$ is a mean-zero Gaussian process whose covariance depends on the unspecified $\phi$, $\psi$ and $F$.
Corollary 2. Under the conditions of Theorem 1 and the null Hypothesis (1), $h(\beta_n) \xrightarrow{\mathcal{D}} h(\Gamma)$ for each of the statistics $h(\beta_n)$ defined in Equation (6).
Corollary 2 follows immediately from Theorem 1.
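The following Python sketch evaluates $\beta_n$ for one-dimensional covariates, assuming the CUSUM form $\beta_n(x, s) = n^{-1/2}\big(\sum_{i \le [ns]} Y_i \mathbf{1}\{X_i \le x\} - ([ns]/n)\sum_{i \le n} Y_i \mathbf{1}\{X_i \le x\}\big)$. Because $\beta_n$ is a step function that jumps only at the observed $X_i$ and at $s = k/n$, the maximum over $x \in \{X_1, \dots, X_n\}$ and $k = 0, \dots, n$ equals the supremum over $\mathbb{R} \times [0, 1]$. The names `beta_process` and `ks_statistic` are hypothetical, and taking $\sup|\beta_n|$ as a Kolmogorov-Smirnov-type statistic is an assumption; Equation (6) may define $KS_n$ differently.

```python
import numpy as np

def beta_process(x, y):
    """Return B[j, k] = beta_n(X_(j), k/n) for the assumed CUSUM form
    beta_n(x, s) = n^{-1/2} (sum_{i<=[ns]} Y_i 1{X_i <= x}
                             - ([ns]/n) sum_{i<=n} Y_i 1{X_i <= x}),
    with d = 1, x over the sorted observations and k = 0, ..., n.
    """
    n = len(x)
    grid = np.sort(x)
    # z[j, i] = Y_i * 1{X_i <= grid_j}: responses aligned by the covariates
    z = y[None, :] * (x[None, :] <= grid[:, None])
    # partial[j, k] = sum_{i<=k} z[j, i], with a leading column for k = 0
    partial = np.concatenate([np.zeros((n, 1)), np.cumsum(z, axis=1)], axis=1)
    k = np.arange(n + 1)
    return (partial - (k / n)[None, :] * partial[:, -1:]) / np.sqrt(n)

def ks_statistic(b):
    """Kolmogorov-Smirnov-type functional: the maximum of |beta_n| over the grid."""
    return np.max(np.abs(b))
```

For data stored as NumPy arrays `x` and `y`, `ks_statistic(beta_process(x, y))` returns the observed statistic. The $n \times (n + 1)$ grid needs $O(n^2)$ memory, which is modest for the sample sizes considered here.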

Weighted Bootstrap Approximations
Since the distribution functions of the $X_i$ and $\varepsilon_i$, as well as the functional forms of $\phi$ and $\psi$, are not specified, the limiting distributions of all our test statistics will not be known. A weighted bootstrap is proposed here to approximate these asymptotic distributions. Let $\{V_i^{(j)}\}_{i \ge 1}$, $j = 1, \dots, m$, be $m$ independent sequences of independent and identically distributed random variables with mean zero and variance one, independent of the $\{\varepsilon_i\}$ and $\{X_i\}$ sequences. We will consider the weighted bootstrap processes $\beta_n^{(j)}$, $j = 1, \dots, m$, built from these weights and, like $\beta_n$, set to zero otherwise. For each $j$, the limiting process is a mean-zero Gaussian process, and the convergence is with respect to the uniform norm.
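A minimal sketch of a weighted (multiplier) bootstrap for the sup statistic follows. It assumes that each bootstrap process multiplies the centred summands $Y_i\mathbf{1}\{X_i \le x\} - n^{-1}\sum_l Y_l\mathbf{1}\{X_l \le x\}$ by the weights $V_i^{(j)}$ and then forms the same CUSUM as $\beta_n$. This is a standard multiplier construction and not necessarily the paper's exact definition of $\beta_n^{(j)}$; the weights are standard normal, as in the simulation study, covariates are one-dimensional, and `bootstrap_ks` is a hypothetical name.

```python
import numpy as np

def bootstrap_ks(x, y, m, rng):
    """Return m weighted-bootstrap replicates of sup |beta_n^{(j)}|.

    Hypothetical multiplier form: beta_n^{(j)}(x, s) is the CUSUM of
    V_i^{(j)} * (Z_i(x) - mean_l Z_l(x)), with Z_i(x) = Y_i 1{X_i <= x}
    and i.i.d. N(0, 1) weights V_i^{(j)}.
    """
    n = len(x)
    grid = np.sort(x)
    z = y[None, :] * (x[None, :] <= grid[:, None])   # Z_i(x) on the grid, shape (n, n)
    zc = z - z.mean(axis=1, keepdims=True)           # centred summands W_i(x)
    k = np.arange(n + 1)
    stats = np.empty(m)
    for j in range(m):
        v = rng.standard_normal(n)                   # weights V_i^{(j)}, one per observation
        partial = np.concatenate(
            [np.zeros((n, 1)), np.cumsum(zc * v, axis=1)], axis=1)
        b = (partial - (k / n) * partial[:, -1:]) / np.sqrt(n)
        stats[j] = np.max(np.abs(b))
    return stats

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 100)
y = 1 + x + 0.4 * x**2 + rng.standard_normal(100)
boot = bootstrap_ks(x, y, 200, rng)                  # 200 bootstrap values
```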

Simulations
The proposed tests are omnibus tests in that we do not specify the form of the regression function $\phi$. In order to examine how these tests perform, we conducted simulations for two quite different functions, $\phi(x) = 1 + x + \beta x^2$ and $\phi(x) = 2\sin(\beta x)$. The $X_i$ were generated from the uniform $U(-3, 3)$ distribution and the errors $\varepsilon_i$ were standard normal. In each case, 1000 simulations were carried out, each with $m = 1000$ weighted bootstrap samples. The level of the tests was $\alpha = 0.05$ throughout. Theorem 3 allows us to choose any mean-zero, variance-one distribution for the bootstrap weights $V_i^{(j)}$. We chose standard normal weights for the simulations since the limiting processes are Gaussian and normal bootstrap weights have been found to work well in simulations for goodness-of-fit types of statistics [7,8]. We use the percentile method: if $h(\beta_n)$ is one of the statistics in Corollary 2, we reject the null hypothesis when $h(\beta_n) > c_n(\alpha)$, where $c_n(\alpha)$ is the $(1 - \alpha)100$-th percentile of the bootstrap values $h(\beta_n^{(1)}), \dots, h(\beta_n^{(m)})$. In the first simulation (Table 1), $\phi(x) = 1 + x + \beta x^2$, $\psi$ does not depend on $X_i$, and the change-point is at $0.4n$. The statistics $KS_n$, $KC_n$ and $MC_n$ are defined by Equation (6). The first three rows correspond to the empirical level of the tests under the null hypothesis ($\beta = 0.4$) with different choices for $\psi$, the standard deviation. The breakpoint $\theta$ equals 0.4. The empirical level results are reasonably good, especially when $n > 50$; that is, they are close to the nominal level of 0.05. The empirical powers of the tests are in the subsequent rows for different $\beta$.
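The percentile method and the first simulation design can be sketched as a scaled-down Monte Carlo study. This is an illustration under stated assumptions, not the authors' code: $h = \sup|\cdot|$ over the observed grid, the multiplier bootstrap uses centred summands with standard normal weights, the pre-change coefficient is 0.4 with $\psi = 1$, and `cusum_sup`, `percentile_test` and `rejection_rate` are hypothetical names. The defaults use far fewer replications than the 1000 simulations with $m = 1000$ bootstrap samples described above, so the rates it produces are rough.

```python
import numpy as np

def cusum_sup(z):
    """max over rows j and k of |n^{-1/2} (S_k - (k/n) S_n)|, S_k = sum_{i<=k} z[j, i]."""
    n = z.shape[1]
    partial = np.concatenate([np.zeros((z.shape[0], 1)), np.cumsum(z, axis=1)], axis=1)
    k = np.arange(n + 1)
    return np.max(np.abs(partial - (k / n) * partial[:, -1:])) / np.sqrt(n)

def percentile_test(x, y, m, alpha, rng):
    """Reject H0 when h(beta_n) > c_n(alpha), the (1 - alpha)100-th percentile
    of m bootstrap values (h = sup|.| and normal multipliers are assumptions)."""
    z = y[None, :] * (x[None, :] <= np.sort(x)[:, None])   # Y_i 1{X_i <= x}
    zc = z - z.mean(axis=1, keepdims=True)                 # centred summands
    boot = [cusum_sup(zc * rng.standard_normal(len(x))) for _ in range(m)]
    c_n = np.quantile(boot, 1.0 - alpha)
    return bool(cusum_sup(z) > c_n)

def rejection_rate(n, beta, reps=200, m=199, alpha=0.05, theta=0.4, seed=0):
    """Rejection frequency for phi(x) = 1 + x + b x^2, psi = 1, X ~ U(-3, 3),
    N(0, 1) errors, with b = 0.4 up to [n * theta] and b = beta afterwards.
    beta = 0.4 (no change) estimates the empirical level; other values, the power."""
    rng = np.random.default_rng(seed)
    k = int(np.floor(n * theta))
    hits = 0
    for _ in range(reps):
        x = rng.uniform(-3.0, 3.0, n)
        b = np.where(np.arange(n) < k, 0.4, beta)
        y = 1 + x + b * x**2 + rng.standard_normal(n)
        hits += percentile_test(x, y, m, alpha, rng)
    return hits / reps
```

For example, `rejection_rate(100, 0.4)` estimates the empirical level at $n = 100$, and `rejection_rate(100, 1.0)` the power against a change to $\beta = 1$.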
As can be seen in Table 1, the power increases as $\beta$ gets farther from 0.4 (the null hypothesis). Understandably, the power increases with $n$ and decreases as $\psi$ gets larger.
In the second simulation (see Table 2), we consider $\phi$ to be a sine function. We conduct our simulations for sample sizes $n = 50, 100, 150$. The first row of Table 2 gives the empirical level of the test ($\beta = 1$, no change-point). Again, the empirical level is good when $n > 50$.
We take $\psi = 1$, but vary the location of the change-point $\theta n$. The power increases as $\beta$ gets farther from 1 (the null hypothesis), and it increases with the sample size $n$. Except in a few instances, the power is greatest when the change-point occurs in the middle ($\theta = 0.5$) and decreases as $\theta$ moves farther from 0.5. The exceptional cases are likely due to simulation error.
Considering that our tests are omnibus in nature, the power results can be seen to be quite good.

Proofs of the Main Results
A vector of the relevant statistics converges in distribution, and the associated process converges in distribution to $\Gamma_0$ with respect to the appropriate metric. We obtain $\beta_n \xrightarrow{\mathcal{D}} \Gamma$ as $n \to \infty$, where $\Gamma$ is a mean-zero Gaussian process with the covariance given in the statement of Theorem 1. Hence Theorem 1 is proven. By viewing $\alpha_{2n}(x, [ns])$ as a strong martingale and using the functional central limit theorem for strong martingales by Ivanoff [13], one can drop Assumption (A5) and obtain convergence in the Skorohod topology (see [14]).
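The covariance of $\Gamma$ can be made explicit under an assumption. If $\beta_n$ has the CUSUM form $n^{-1/2}\big(\sum_{i \le [ns]} Z_i(x) - ([ns]/n)\sum_{i \le n} Z_i(x)\big)$ with $Z_i(x) = Y_i\mathbf{1}\{X_i \le x\}$, and under the null Hypothesis (1) the $Z_i$ are i.i.d. with $E\phi^2(X) + E\psi^2(X) < \infty$, the short computation below gives the covariance of $\beta_n$ and its limit. It is a sketch for illustration, not a reproduction of the statement of Theorem 1; $x \wedge y$ denotes the componentwise minimum in $\mathbb{R}^d$.

```latex
% Assumed CUSUM form: beta_n(x,s) = n^{-1/2} \sum_{i=1}^n c_{n,i}(s) Z_i(x),
% with Z_i(x) = Y_i 1{X_i <= x} and c_{n,i}(s) = 1{i <= [ns]} - [ns]/n.
\begin{align*}
\sum_{i=1}^{n} c_{n,i}(s) &= [ns] - n\cdot\tfrac{[ns]}{n} = 0
  \;\Rightarrow\; \beta_n(x,s) = n^{-1/2}\sum_{i=1}^{n} c_{n,i}(s)\,\{Z_i(x) - E Z_1(x)\},\\
\frac{1}{n}\sum_{i=1}^{n} c_{n,i}(s)\,c_{n,i}(t)
  &= \frac{[ns]\wedge[nt]}{n} - \frac{[ns]}{n}\,\frac{[nt]}{n}
  \;\longrightarrow\; s\wedge t - st,\\
K(x,y) := \operatorname{Cov}\{Z_1(x), Z_1(y)\}
  &= \int_{(-\infty,\,x\wedge y]} (\phi^2 + \psi^2)\,dF
   - \int_{(-\infty,\,x]} \phi\,dF \int_{(-\infty,\,y]} \phi\,dF,\\
\operatorname{Cov}\{\beta_n(x,s), \beta_n(y,t)\}
  &= \Big(\frac{[ns]\wedge[nt]}{n} - \frac{[ns]\,[nt]}{n^2}\Big) K(x,y)
  \;\longrightarrow\; (s\wedge t - st)\,K(x,y).
\end{align*}
```

The line for $K$ uses $E[Y^2 \mid X] = \phi^2(X) + \psi^2(X)$, which follows from $E\varepsilon = 0$, $E\varepsilon^2 = 1$ and the independence of $\varepsilon$ and $X$. The factor $s \wedge t - st$ is the Brownian-bridge covariance in the time direction, while $K$ depends on $\phi$, $\psi$ and $F$, which is consistent with the limiting distributions not being distribution-free.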

 
By the multivariate CLT, along almost all sequences, each finite-dimensional vector of the bootstrap process is a sum of iid random variables with mean zero, and the resulting covariance matrix has entries of the corresponding conditional-covariance form. Similarly, the corresponding conditional covariance of the bootstrap process converges uniformly to that of $\Gamma$. The multiplier inequalities ([12], Lemma 3.6.7) are then applied, where $^{*}$ denotes (uniform) measurable covers with respect to all variables jointly. Using Theorem 3.6.13 in [12], $\beta_n^{(j)}(x, 1)$ is Donsker. It follows ([12], Theorem 2.12.2) that $\beta_n^{(j)}(x, s)$ is Donsker, and Theorem 4 for $\beta_n^{(j)}$ follows, as in the proof of Theorem 1.
Proof of Theorem 5: The proof is similar to that of Theorem 4, but the limiting covariance of $\beta_n^{(j)}$ is that of $\Gamma_{11}$ instead of $\Gamma$.
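To illustrate why the conditional covariance of the bootstrap process matches that of $\Gamma$ under the null hypothesis, assume the multiplier form $\beta_n^{(j)}(x, s) = n^{-1/2}\sum_i c_{n,i}(s)\,V_i^{(j)}\,W_i(x)$, where $c_{n,i}(s) = \mathbf{1}\{i \le [ns]\} - [ns]/n$, $Z_i(x) = Y_i\mathbf{1}\{X_i \le x\}$ and $W_i(x) = Z_i(x) - n^{-1}\sum_l Z_l(x)$. This definition is hypothetical and not necessarily the paper's. Writing $\operatorname{Cov}^{*}$ for the covariance given the data and $K(x, y) = \operatorname{Cov}\{Z_1(x), Z_1(y)\}$, a direct computation gives:

```latex
% Assumed multiplier form; E V_i^{(j)} V_l^{(j)} = delta_{il} since the weights
% are i.i.d. with mean zero and variance one.
\begin{align*}
\operatorname{Cov}^{*}\{\beta_n^{(j)}(x,s), \beta_n^{(j)}(y,t)\}
 &= \frac{1}{n}\sum_{i=1}^{n} c_{n,i}(s)\,c_{n,i}(t)\,W_i(x)\,W_i(y),\\
c_{n,i}(s)\,c_{n,i}(t) &\approx
 \begin{cases}
 (1-a)(1-b), & i \le [na],\\
 -a(1-b), & [na] < i \le [nb],\\
 ab, & i > [nb],
 \end{cases}
 \qquad a = s\wedge t,\ b = s\vee t,\\
\operatorname{Cov}^{*}\{\beta_n^{(j)}(x,s), \beta_n^{(j)}(y,t)\}
 &\xrightarrow{\text{a.s.}} K(x,y)\,\{a(1-a)(1-b) - (b-a)\,a(1-b) + (1-b)\,ab\}\\
 &= a(1-b)\,K(x,y) = (s\wedge t - st)\,K(x,y).
\end{align*}
```

The limit uses the strong law of large numbers on each of the three blocks of indices, which gives convergence for fixed $(x, y, s, t)$; the uniform convergence asserted in the proof requires the additional empirical-process arguments cited there.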

Concluding Remark
In this paper, we have constructed test statistics using the lower orthants. It is clear (see [12]) that other Donsker classes of functions can be used; examples are the collection of closed balls and the collection of half-spaces. The method of proof in [5,6,15] could not handle such classes. One can also obtain weak convergence of the weighted version of the process, where the weight functions satisfy the Chibisov-O'Reilly conditions (see [5,6], or [16], p. 462).
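As an illustration of swapping the index class, the sketch below computes the same CUSUM supremum over two classes of sets for one-dimensional covariates: the lower orthants $(-\infty, X_j]$ used in the paper, and closed balls, which in one dimension are closed intervals. Any interval meets the sample in a run of consecutive order statistics, so intervals with endpoints at observed $X$ values suffice. The function names are hypothetical and the example is a sketch, not the authors' construction.

```python
import numpy as np

def cusum_sup_over_sets(y, member):
    """member[j, i] = 1 if X_i lies in the j-th indexing set. Returns
    max over j, k of |n^{-1/2} (S_k(j) - (k/n) S_n(j))|, S_k(j) = sum_{i<=k} Y_i member[j, i]."""
    n = len(y)
    z = member * y[None, :]
    partial = np.concatenate([np.zeros((z.shape[0], 1)), np.cumsum(z, axis=1)], axis=1)
    k = np.arange(n + 1)
    return np.max(np.abs(partial - (k / n) * partial[:, -1:])) / np.sqrt(n)

def orthant_sets(x):
    """Lower orthants (-inf, X_j] in one dimension: row j is 1{X_i <= X_j}."""
    return (x[None, :] <= x[:, None]).astype(float)

def interval_sets(x):
    """Closed balls in one dimension, i.e. intervals [X_a, X_b] with X_a <= X_b."""
    lo, hi = np.meshgrid(x, x, indexing="ij")   # lo[a, b] = X_a, hi[a, b] = X_b
    keep = lo <= hi
    lo, hi = lo[keep], hi[keep]
    return ((x[None, :] >= lo[:, None]) & (x[None, :] <= hi[:, None])).astype(float)
```

Since every lower orthant restricted to the sample is also such an interval, the interval-based supremum is always at least as large as the orthant-based one, so its critical values would have to be obtained separately, for example with the weighted bootstrap.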
The simulations show that our tests have good power considering the omnibus nature of the procedure. By avoiding curve estimation (e.g., using kernel methods to estimate $\phi$), we are able to provide practical tests for moderate sample sizes.
Among the statistics in Equation (6) is the supremum of the squared-integral distance.
Under the alternative Hypothesis (2), when $s < \theta$, the first part of Expression (4) has mean $[ns]\int_{(-\infty, x]} \phi \, dF$, whereas for $s > \theta$ its mean also involves $\phi_1$ and $F_1$ (see inequality (3)).
For each $j$, the limit of $\beta_n^{(j)}$ is a mean-zero Gaussian process. Under the conditions of Theorem 4, the same limiting distributions as found in Corollary 2 for statistics based on $\beta_n$ are obtained for each $\beta_n^{(j)}$. It is also important for the bootstrap process to have a limiting distribution under the alternative hypothesis. We obtain:
Theorem 5. If the conditions of Theorem 4 hold for the functions $\phi_1$ and $F_1$, then under the alternative Hypothesis (2), along almost all sequences, $\beta_n^{(j)}$ converges in distribution to a mean-zero Gaussian process $\Gamma_{11}$. Under the alternative hypothesis, $c_n(\alpha) \to c_{11}(\alpha)$, with $c_{11}(\alpha)$ the $(1 - \alpha)100$-th percentile of $h(\Gamma_{11})$, and $h(\beta_n) \xrightarrow{P} \infty$ (see inequality (3)).
Strong consistency of the tests based on the statistics $KS_n$, $KC_n$ and $MC_n$ follows (see the proof of Theorem 4.1 in [5]).