Consistency of the φ-Divergence Based Change Point Estimator

This paper studies a change-point estimator based on the φ-divergence. Because a near perfect translation of reality is sought, locations of parameter change within a finite set of data have to be accounted for: the assumption of a stationary model is too restrictive, especially for long time series. The estimator is shown to be consistent through asymptotic theory, and this consistency is confirmed through simulations. The estimator is applied to the generalized Pareto distribution to estimate changes in the scale and shape parameters.


Introduction
Let $x_1, \dots, x_n$ be a sequence of observations and suppose there exists a time $\tau$ such that the statistical properties of $x_1, \dots, x_\tau$ and $x_{\tau+1}, \dots, x_n$ are different. In its simplest form, change-point detection is the name given to the problem of estimating the point at which the statistical properties of a sequence of observations change [2].
The overall behavior of observations can change over time due to internal systemic changes in distribution dynamics or due to external factors. Time series data entail changes in the dependence structure, so modelling non-stationary processes with stationary methods to capture their time-evolving dependence will most likely result in a crude approximation, as abrupt changes fail to be accounted for [3]. Each change point is an integer between 1 and n − 1 inclusive. The process X is assumed to be piecewise stationary, meaning that some characteristics of the process change abruptly at unknown points in time. The corresponding segments are then homogeneous within themselves, but each subsequent segment is heterogeneous in characteristics relative to its neighbors. For a parametric model, the parameters associated with the $i$th segment, denoted $\theta_i$, are assumed to contain the changes. Parametric tests for a change point are mainly based on likelihood ratio statistics, with estimation based on the maximum likelihood method; general results can be found in [4]. Detection of change points is critical to statistical inference, as a near perfect translation of reality is sought through model selection and parameter estimation. Parametric methods assume models for a given set of empirical data; within a parametric setting, change points can be attributed to changes in the parameters of the underlying data distribution. Generally, change point methods can be compared on characteristics and properties such as test size, power, or the rate of convergence in estimating the correct number of change points and the change-point locations. Change point problems can be classified as off-line, which deals with a fixed sample only, or on-line, which considers new information as it is observed. Off-line change point problems deal with fixed sample sizes which are observed in full before detection and estimation of change points are carried out.
[5] introduced the change point problem within the off-line setting. Since this pioneering work, methodologies for change point detection have been widely researched, with methods extending to techniques for higher order moments within time series data. Ideally, it is desired to test how many change points are present within a given set of data and to estimate the parameters associated with each segment. If τ is known, then only the two samples need to be compared. If τ is unknown, however, it has to be treated through change point analysis, which entails both detection and estimation of the change point (change time). The null hypothesis of no change is then tested against the alternative that there exists a time at which the distribution characteristics of the series changed. Stationarity in the strict sense implies time-invariance of the distribution underlying the process.
The hypotheses would be stated as
$$H_0: x_i \sim F_0,\ i = 1, \dots, n \quad \text{versus} \quad H_1: x_i \sim F_0,\ i = 1, \dots, \tau; \qquad x_i \sim F_1,\ i = \tau + 1, \dots, n,$$
where τ is unknown and needs to be estimated. If τ < n, then the process distribution has changed and τ is referred to as the change point. We assume that there exists at most one such τ, where n is the number of observations in a given data set; Hypothesis (2) can then be restated with τ = n under the null and τ < n under the alternative. At a given level of significance, if the null hypothesis is rejected, then the process X is said to be locally piecewise-stationary and can be approximated by a sequence of stationary processes that may share certain features, such as the general functional form of the distribution F. Many authors, such as [6]-[11], have considered both parametric and non-parametric methods of change point detection in time series data. Since change points cannot be assumed to be known in advance, various methods of detection and estimation are needed.
This paper is organized as follows: Section 2 gives an overview of the change point estimator based on a pseudo-distance measure. Section 3 provides key results on the consistency of the estimator. Section 4 applies the change point estimator to the shape and scale parameters of the generalized Pareto distribution. Section 5 gives an application of the estimator, where consistency is shown through simulations. Finally, Section 6 provides concluding remarks.

Change Point Estimator
The change point problem is addressed by using a "distance" function between distributions to describe the change. Given a distance function, a test statistic is constructed to quantify the distance between any two distributions based on a sample of size n. Consider a parametric model $\{f_\theta : \theta \in \Theta\}$, where Θ is the parameter space, defined on a data set of size n. Let $X_1, \dots, X_n$ be random variables with probability densities f and g, and let the φ-divergence between f and g be
$$D_\phi(f, g) = \int g(x)\, \phi\!\left(\frac{f(x)}{g(x)}\right) \mathrm{d}x,$$
where φ is a convex function with φ(1) = 0. To avoid indeterminate expressions at points where the integrand takes the form 0·φ(0/0), [12] gives assumptions on the functions φ involved in the general definition of φ-divergence statistics; these assumptions ensure the existence of the integrals. Different choices of φ yield many divergences that play important roles in statistics. Divergence measures are not distance measures, since in general they are neither symmetric nor satisfy the triangle inequality, but they quantify a difference between two probability measures, hence the term "pseudo-distance". More generally, a divergence measure is a function of two probability density (or distribution) functions which takes non-negative values and equals zero only when the two arguments (distributions) are the same. A divergence measure grows larger as the two distributions move further apart.
Hence, a large divergence implies departure from the null hypothesis.
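As an illustrative check of these properties, the φ-divergence can be evaluated numerically. The sketch below is an assumption for illustration, not the paper's exact setting: it uses the choice φ(t) = t log t, which yields the Kullback-Leibler divergence, together with two normal test densities, and confirms that the divergence is zero for identical densities and positive otherwise.

```python
import numpy as np

def trapezoid(y, x):
    # Explicit trapezoidal rule (kept self-contained for portability).
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def phi_divergence(f, g, x, phi):
    # D_phi(f, g) = integral of g(x) * phi(f(x) / g(x)) dx on the grid x.
    return trapezoid(g * phi(f / g), x)

def phi_kl(t):
    # phi(t) = t * log(t): convex with phi(1) = 0; gives Kullback-Leibler.
    return t * np.log(t)

x = np.linspace(-12.0, 12.0, 40001)
f = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)          # N(0, 1) density
g = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2.0 * np.pi)  # N(1, 1) density

print(round(phi_divergence(f, f, x, phi_kl), 6))  # identical densities -> 0.0
print(round(phi_divergence(f, g, x, phi_kl), 3))  # KL(N(0,1) || N(1,1)) = 0.5
```

The second value matches the closed form KL(N(μ₁,1) ‖ N(μ₂,1)) = (μ₁ − μ₂)²/2, so the quadrature sketch reproduces the known divergence.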
Generally, a change point problem's objective would be to propose an estimator for the possible change-point τ given a set of random variables.
Based on the divergence in Equation (5), a change point estimator can be constructed as in Equation (6), where τ = [nλ] for λ ∈ (0, 1). To test for the possibility of a change in the distribution of $x_1, \dots, x_n$, it is natural to compare the distribution function of the first τ observations to that of the last (n − τ), since the location of the change time is unknown. When τ is near the boundary points, say near 1 or near n, we are required to compare an estimate calculated from a comparatively large number of observations (n − τ) to an estimate from a small number of observations τ. This may result in erratic behavior of the test statistic [7] due to instability of the parameter estimators. If λ is not bounded away from zero and one, then the test statistic does not converge in distribution, i.e., the critical values of the test statistic diverge to infinity as n → ∞ if a sequence of level α tests is to be obtained [13]. However, fixed critical values can be obtained for increasing sample sizes when λ is bounded away from zero and one, and this yields significant power gains when the change point lies in Λ.
Let ε > 0 be small enough that Λ = [ε, 1 − ε] ⊂ (0, 1). Suppose that λ maximizes the test statistic over [0, 1]; the limiting behavior of the statistic under the null hypothesis is then given in [13]. By this result, the change-point estimator $\hat{\tau}$ of a change point τ is the point at which there is maximal sample evidence for a change in the distributional parameters, characterized by maximum divergence. It is estimated by the least value of τ that maximizes the test statistic in Equation (9).
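The estimation rule above, restricting λ = τ/n to [ε, 1 − ε] and taking the least maximizing τ, can be sketched as follows. The Gaussian segments and the closed-form Kullback-Leibler divergence between fitted normals are illustrative assumptions standing in for the paper's φ-divergence statistic.

```python
import numpy as np

def kl_normal(m1, s1, m2, s2):
    # Closed-form KL divergence KL(N(m1, s1^2) || N(m2, s2^2)).
    return np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

def change_point(x, eps=0.1):
    # Scan tau with lambda = tau/n restricted to [eps, 1 - eps], so both
    # segment estimates are computed from enough observations.
    n = len(x)
    taus = list(range(int(np.ceil(n * eps)), int(np.floor(n * (1 - eps))) + 1))
    stats = [kl_normal(x[:t].mean(), x[:t].std() + 1e-12,
                       x[t:].mean(), x[t:].std() + 1e-12) for t in taus]
    # np.argmax returns the first index attaining the maximum,
    # i.e. the least maximizing tau.
    return taus[int(np.argmax(stats))]

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
print(change_point(x))  # estimated change point (true value: 100)
```

Restricting the scan to [nε, n(1 − ε)] reflects the discussion above: without it, segments of one or two points would make the fitted parameters, and hence the statistic, unstable near the boundaries.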

Consistency of the Change Point Estimator
A minimal requirement of a good statistical decision rule is its increasing reliability with increasing sample size [14]. Let $x_1, \dots, x_n$ be an iid sample of fixed size n from a density $f(x; \theta)$, where θ is the vector of parameters governing the pdf, and let $L(x; \theta)$ denote the likelihood function. It can be shown by Taylor's theorem that, under the null hypothesis, the φ-divergence based estimator reduces to a two-sample Wald-type test statistic. The likelihood function can be expressed as
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta),$$
and it is more convenient to work with its logarithm,
$$\ell(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta).$$
Since the logarithm is a monotone increasing function, maximizing the likelihood function is equivalent to maximizing the log-likelihood function. With the notation introduced next, the following equalities hold as n → ∞.
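As a minimal illustration of a two-sample Wald-type statistic of the kind the φ-divergence reduces to, the sketch below computes the scalar special case for a change in mean; the paper's statistic involves the full parameter vector and information matrix, so this simplification is an assumption for illustration only.

```python
import numpy as np

def wald_two_sample(x, tau):
    # Scalar two-sample Wald-type statistic for a change in mean at tau:
    # W = (m1 - m2)^2 / (s1^2 / tau + s2^2 / (n - tau)).
    a, b = x[:tau], x[tau:]
    return (a.mean() - b.mean()) ** 2 / (a.var(ddof=1) / len(a)
                                         + b.var(ddof=1) / len(b))

rng = np.random.default_rng(0)
same = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(0.0, 1.0, 100)])
shift = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
# A genuine parameter change produces a much larger statistic.
print(wald_two_sample(shift, 100) > wald_two_sample(same, 100))  # True
```

Like the divergence statistic, W is non-negative and is large exactly when the two segment estimates disagree, which is what makes the two formulations asymptotically interchangeable.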
For the proofs of Theorems 3.1 and 3.2, see [15]. Theorem 3.3 follows from these results; hence the proof.
Assume that within a finite set of data a change point τ exists, and let n → ∞ such that τ/n → λ ∈ (0, 1). Consider $U_n(\tau, \theta)$. By Taylor's theorem, and since by the principle of maximum likelihood estimation the score vanishes at the maximum likelihood estimate, the result follows by Inequality (36).

Change Point Analysis in the Generalized Pareto Distribution
Definition 4.1. The generalized Pareto distribution function is defined by
$$F(x) = \begin{cases} 1 - \left(1 + \dfrac{\xi x}{\sigma}\right)^{-1/\xi}, & \xi \neq 0, \\[2mm] 1 - \exp\!\left(-\dfrac{x}{\sigma}\right), & \xi = 0, \end{cases}$$
where σ > 0 is the scale parameter, which characterizes the spread of the distribution, and ξ, referred to as the tail index or shape parameter, determines the tail thickness.
More specifically, given that ξ > 0 and x ≥ 0, the probability density function is
$$f(x) = \frac{1}{\sigma}\left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi - 1}.$$
For any given finite set of data, at least one of the following is likely at any given change point: a change in the scale parameter, a change in the shape parameter, or a change in both. Since change points are unknown in advance, any of the three hypothesis formulations is possible. Without knowledge of the types of changes contained in the time series, the question arises as to which testing procedure to use. In most instances Hypothesis (46) is tested, since it is assumed that both distributional parameters change. Figure 1 shows different GP density plots with a constant scale parameter but varying shape parameters; Figure 2 shows different GP density plots with both the scale and shape parameters varying. If any of the parameters were to change at a given point in time, then the thickness of the general tail distribution would change, which would in turn affect the intensity of the extreme values observed.
Assume that $X_1, \dots, X_n$ are independently and identically distributed random variables drawn from the generalized Pareto distribution, and consider a sample data set of size n. We restrict attention to the case ξ > 0, i.e., heavy-tailed distributions, thereby considering only the first part of the density function, with support x ≥ 0. From the divergence in Equation (5), let f and g be generalized Pareto densities with parameters (σ₁, ξ₁) and (σ₂, ξ₂) respectively. Applying properties of the generalized Pareto distribution [16], numerical computations, and methods of integration, the divergence between the two generalized Pareto densities is obtained as a function of the parameters of the two densities.
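The behavior of such a divergence between two GPD densities can also be checked by direct quadrature. The sketch below is an illustrative stand-in for the closed form: the Kullback-Leibler integrand, the truncation point, and the grid size are all assumptions, chosen only to verify that the divergence vanishes for identical parameters and is positive under a scale change.

```python
import numpy as np

def gpd_pdf(x, sigma, xi):
    # GPD density for xi > 0 and x >= 0:
    # f(x) = (1/sigma) * (1 + xi * x / sigma)^(-1/xi - 1).
    return (1.0 / sigma) * (1.0 + xi * x / sigma) ** (-1.0 / xi - 1.0)

def kl_quad(sigma1, xi1, sigma2, xi2, upper=1.0e4, m=200001):
    # Kullback-Leibler divergence between two GPD densities via the
    # trapezoidal rule on [0, upper]; truncation error is negligible
    # here because both tails decay polynomially fast for these xi.
    x = np.linspace(0.0, upper, m)
    f = gpd_pdf(x, sigma1, xi1)
    g = gpd_pdf(x, sigma2, xi2)
    y = f * np.log(f / g)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

print(round(kl_quad(1.0, 0.25, 1.0, 0.25), 6))  # identical parameters -> 0.0
print(kl_quad(1.0, 0.25, 3.0, 0.25) > 0.0)      # scale change -> True
```

This matches the role the divergence plays in the estimator: it is zero when the two segments share parameters and grows with the parameter change.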

Simulation Study
The performance of the estimator is examined by considering the effect of changes in the sample size. The single change-point estimation problem is considered, with the change point τ fixed at n/2 for n = 200, 500, 1000. To check consistency of the estimator, data are simulated from the GP density with parameters (1, ξ₁) before the change point and (3, ξ₂) after it, for the scale and shape respectively. 1000 simulations are carried out to estimate the change point, and the results are given in Table 1 and Table 2.

Conclusion
In this paper, a divergence (pseudo-distance) based estimator is used to detect change points within a parametric framework, focusing on the generalized Pareto distribution. Simulation studies also show that the change point estimator is consistent.
A Taylor expansion of $D_\phi(\theta_1, \theta_2)$ about the true parameter values $\theta_1, \theta_2$ shows that its behavior is similar to that of $W_{n,\tau}$.