Limit Distribution of the φ-Divergence Based Change Point Estimator
1. Introduction
Many real-world data sets are made up of consecutive regimes that are separated by abrupt changes [1]. Statistical research has shown that, over time, the underlying data generating processes undergo occasional sudden changes [2]. As a result, the assumption of stationarity is often too strong and is frequently violated. Stationarity in the strict sense implies time-invariance of the distribution underlying the process. The overall behavior of observations can change over time due to internal systemic changes in distribution dynamics or due to external factors. Modeling time series processes using stationary methods to capture their time-evolving dependence will most likely result in a crude approximation, as abrupt changes fail to be accounted for [3]. The reviewed literature reveals that a single model may not be appropriate for a non-stationary series, and as such various parametric and non-parametric change point estimation methods have been proposed [4] [5] [6] [7] [8]. However, they are limited in different ways, and their suitability depends on the underlying assumptions. A point of interest in all aspects of life is to detect and estimate these changes, as their implications are crucial. Time series data containing change points are assumed to be piecewise stationary, implying that some characteristics of the process change abruptly at unknown points in time. Parametric tests for a change point are mainly based on likelihood ratio statistics, with estimation based on the maximum likelihood method, whose general results can be found in [4]. In its simplest form, change point detection is the name given to the problem of estimating the point at which the statistical properties of a sequence of observations change [9]. Change point problems can be classified as off-line, which deal with a fixed sample only, or on-line, which consider new information as it is observed.
Off-line change point problems deal with fixed sample sizes that are first observed in full, after which detection and estimation of change points are carried out. [10] introduced the change point problem within the off-line setting. Since this pioneering work, change point detection methodologies have been widely researched, with methods extending to techniques for higher order moments within time series data. Change point analysis methods are applicable in a wide range of fields including, but not limited to, climate (climate change), quality management, medicine, finance, and genetics.
For a given set of data X_1, X_2, ⋯, X_n, a change point is said to occur when there exists a time τ, 1 ≤ τ < n, such that the statistical properties of X_1, ⋯, X_τ and X_{τ+1}, ⋯, X_n are different. If τ is known, then the two samples only need to be compared. However, if τ is unknown, then it has to be analyzed through change point analysis, which entails both detection and estimation of the change point/change time. The null hypothesis of no change is tested against the alternative that there exists a time at which the distribution characteristics of the series changed. Considering a change in model parameters, the problem can be stated as

H_0: θ_1 = θ_2 against H_1: θ_1 ≠ θ_2 (1)

where θ_1 and θ_2 are the model parameters before and after the time τ, which is unknown and needs to be estimated. If θ_1 ≠ θ_2, then the process distribution has changed and τ is referred to as the change point. Assume that there exists λ ∈ (0, 1) such that τ satisfies

τ = ⌊nλ⌋ (2)

i.e. λ is a fraction that divides the data process at the change point and n is the number of observations in a given data set. Then hypothesis (1) can be restated as

H_0: λ = 1 (no change) against H_1: λ ∈ (0, 1) (3)
At a given level of significance, if the null hypothesis is rejected, then the process X is said to be locally piecewise-stationary and can be approximated by a sequence of stationary processes that may share certain features, such as the general functional form of the distribution. Under the assumption that the change time is unknown, [4] gives eight limiting conditions that yield the null distribution of the likelihood ratio test statistic as the supremum of a standardized Brownian bridge. [11] applied these results within a non-parametric framework and obtained similar results. [12] apply the likelihood ratio test within a parametric framework under the assumption that the data are drawn from extreme value distributions. Through the assumption of the von Mises condition, their test statistic converges weakly in distribution to the supremum of a squared standardized Brownian bridge.
The rest of this paper is organized as follows: Section 2 gives an overview of the change point estimator based on the φ-divergence. Section 3 provides key results for the limit distribution of the divergence based change point estimator, Section 4 gives some simulation results, and finally Section 5 gives the conclusion.
2. Single Change Point Detection and Estimation
The change point problem is addressed by using a ‘distance’ function between distributions. Given a distance function, a test statistic is constructed that guarantees a non-negative distance between any two distributions based on a sample of size n. Consider a given parametric model {F(x; θ): θ ∈ Θ}, where Θ ⊆ ℝ^d is the parameter space, defined on a data set of size n. Let X_1, ⋯, X_n be random variables with probability densities f(x; θ) with respect to a σ-finite measure μ, with θ_1 and θ_2 generating distinct measures if θ_1 ≠ θ_2.
Definition 2.1 (φ-divergence). Let P and Q be two probability distributions with densities p and q with respect to a σ-finite measure μ. Define the φ-divergence between the two distributions as

D_φ(P, Q) = ∫ φ(p(x)/q(x)) q(x) dμ(x)

The broader family of f-divergences (φ-divergences) takes the general form

D_φ(P, Q) = ∫ q(x) φ(p(x)/q(x)) dμ(x), φ ∈ Φ* (4)

where Φ* is the class of all convex functions φ(t), t > 0, satisfying φ(1) = 0. To avoid indeterminate expressions at any point x, the following conventions in relation to the functions φ involved in the general definition of the φ-divergence statistics are given in [13]:

0·φ(0/0) = 0, 0·φ(p/0) = p·lim_{t→∞} φ(t)/t (5)
Assumption 1. The function φ ∈ Φ* is convex and continuous. Its restriction to (0, ∞) is finite and twice continuously differentiable, with φ(1) = 0 and φ″(1) > 0.

Different choices of φ result in many divergences that play important roles in statistics. In general D_φ(P, Q) ≥ 0, with equality only when P = Q, but symmetry and the triangle inequality need not hold; hence divergence measures are not distance measures but give some difference between two probability measures, hence the term “pseudo-distance”. More generally, a divergence measure is a function of two probability density (or distribution) functions which has non-negative values and takes the value zero only when the two arguments (distributions) are the same. A divergence measure grows larger as the two distributions move further apart. Hence, a large divergence implies departure from the null hypothesis.
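As a concrete illustration (not part of the original derivation), taking φ(t) = t log t in (4) recovers the Kullback-Leibler divergence. A minimal numerical sketch, assuming two normal densities and a simple grid approximation of the integral:

```python
import numpy as np

def phi_kl(t):
    # Convex generator of the KL divergence: phi(t) = t*log(t), with phi(1) = 0
    return t * np.log(t)

def phi_divergence(p, q, x, phi):
    # Grid approximation of D_phi(P, Q) = integral of phi(p/q) * q over x
    dx = x[1] - x[0]
    return float(np.sum(phi(p(x) / q(x)) * q(x)) * dx)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-10.0, 10.0, 20001)
p = lambda t: normal_pdf(t, 0.0, 1.0)
q = lambda t: normal_pdf(t, 1.0, 1.0)

# KL(N(0,1) || N(1,1)) has the closed form (mu1 - mu2)^2 / 2 = 0.5
print(phi_divergence(p, q, x, phi_kl))
```

The same routine yields other members of the family by swapping the generator, e.g. φ(t) = (√t − 1)² gives (up to a constant) the squared Hellinger distance.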
Based on the divergence (4), a change point estimator can be constructed as
(6)
To test for the possibility of a change in the distribution of X, it is natural to compare the distribution function of the first τ observations to that of the last n − τ observations, since the location of the change time is unknown. When τ is near the boundary points, an estimate calculated from a large number of observations on one side is compared to an estimate from a small number of observations on the other. This may result in erratic behavior of the test statistic due to instability of the estimators of the parameters [6]. [14] provides the following result.
Theorem 1. Suppose that τ̂ maximizes the test statistic over all candidate change fractions; then, under the null hypothesis,
(7)
For the proof of Theorem 1, see [14].
If λ is not bounded away from zero and one, the test statistic does not converge in distribution. However, fixed critical values can be obtained for increasing sample sizes when λ is bounded away from zero and one, and this yields significant power gains when the change point lies in the trimmed interval. Let the trimming parameter be small enough that the change fraction is contained in the trimmed interval.
Suppose there exist constants such that the unique maximum likelihood estimates exist for all candidate split points in the trimmed region. Then the test statistic is maximized over this region such that
(8)
where the boundaries of the trimmed region are bounded away from zero and one. The trimming parameter is usually taken to satisfy the rate condition given in [15].
Let K_n be the set of all values over which the test statistic (8) is maximized. A change time τ is estimated by the least value of k ∈ K_n that maximizes the test statistic (8):
(9)
with θ̂_1 and θ̂_2 being the parameter estimates of the pre-change and post-change samples respectively, both dependent on the candidate change point k. θ̂_1 represents the parameter estimates before the change point and θ̂_2 gives the parameter estimates after the change point. The difference between the two estimators θ̂_1 and θ̂_2 gives an idea of the difference between the two samples, and hence of departure from the null hypothesis.
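To make the estimator concrete, the following sketch (an illustration with an assumed normal mean-change model, not the paper's generalized Pareto setting) scans the trimmed candidate range, estimates the parameters on each side of every split, and returns the first index maximizing a weighted divergence between the two estimates; for N(μ, 1) laws, the KL divergence is simply (μ̂_1 − μ̂_2)²/2:

```python
import numpy as np

def change_point_estimate(x, beta=0.1):
    """Scan candidate splits k in the trimmed range [n*beta, n*(1-beta)),
    compute a weighted divergence between the two fitted segments, and
    return the least maximizing index together with the statistic path."""
    n = len(x)
    stats = np.full(n, -np.inf)
    for k in range(int(n * beta), int(n * (1 - beta))):
        mu1, mu2 = x[:k].mean(), x[k:].mean()   # segment estimates of theta_1, theta_2
        kl = 0.5 * (mu1 - mu2) ** 2             # KL(N(mu1,1) || N(mu2,1))
        stats[k] = 2.0 * k * (n - k) / n * kl   # weighting tempers boundary instability
    return int(np.argmax(stats)), stats          # argmax returns the first maximizer

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
tau_hat, _ = change_point_estimate(x)
print(tau_hat)  # expected near the true change point 100
```

Returning the first maximizer matches the convention of taking the least value of k that maximizes the statistic.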
3. Main Result
Consider a second order Taylor expansion of the divergence measure about the true parameter values.
(10)
(11)
This follows from Assumption 1 and the fact that f(x; θ) is a probability density function.
(12)
(13)
(14)
By definition of the Fisher information matrix,
(15)
Equation (10) reduces to
(16)
Further,
(17)
From Equation (17) we obtain
(18)
From (8) and Equations (10)-(18), the test statistic can be expressed as
(19)
Let,
Then,
(20)
Consider the second and third terms on the RHS.
(21)
for
(22)
The limiting distribution of the test statistic is therefore that of its leading term, since the second and third terms of (20) are asymptotically negligible.
The change point estimator thus reduces to a trimmed maximal Wald-type test statistic. Consider the following conditions [16] [17].
(C1) Regularity: interchanges of derivative and integral operations are valid so that,
(23)
(24)
(C2) For all θ ∈ Θ, the first and second partial derivatives of f(x; θ) with respect to θ exist almost everywhere and are bounded by functions with finite integrals.
(C3) The second-order partial derivatives of log f(x; θ) exist almost everywhere and are such that the information matrix I(θ) exists, is positive definite throughout Θ, and is continuous in θ. A positive definite matrix is always non-singular (its determinant is positive), implying that the matrix is invertible, i.e. I(θ)⁻¹ exists (the variance-covariance matrix).
(C4) There are constants such that a unique maximum likelihood estimate can be found for each candidate split point in the trimmed region.
(C5) There is an open subset of Θ containing the true parameter value such that the third-order partial derivatives of f(x; θ) exist and are continuous in θ for all x.
(C6) As n → ∞,
(25)
Theorem 2. Under the null hypothesis, and provided that conditions C1-C6 hold, the asymptotic distribution of the test statistic is given by
(26)
as n → ∞, where the limiting process is expressed in terms of a p-dimensional Brownian bridge process over the trimmed interval.
The following result holds for approximating the distribution function using the inverted Laplace transform:
(27)
For the proof of this, see [18]. Rather than considering a fixed trimming value for all sample sizes, the approach of [4] [11] is followed, such that the trimming parameter is a function of the sample size n.
Critical Values
At any given level of significance, the asymptotic critical values of the test can be estimated; they depend on the dimension d of the parameter space. For a bivariate parameter space, i.e. d = 2, the asymptotic critical values are presented in Table 1 for different sample sizes, with the trimming parameter chosen as a function of n.
Table 1. Asymptotic critical values.
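Critical values of this kind can be approximated by Monte Carlo simulation of the limit process. The sketch below is illustrative only: the dimension d = 2, the trimming interval [0.1, 0.9], and the grid and replication sizes are assumptions, not the paper's exact settings.

```python
import numpy as np

def sup_bridge_quantile(level, d=2, l=0.1, u=0.9, m=500, reps=1000, seed=1):
    """Approximate the `level` quantile of sup over t in [l, u] of
    sum_j B_j(t)^2 / (t (1 - t)) for d independent Brownian bridges B_j."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, m + 1) / m                    # grid on (0, 1]
    mask = (t >= l) & (t <= u)                     # trimmed interval
    z = rng.standard_normal((reps, d, m)) / np.sqrt(m)
    w = np.cumsum(z, axis=2)                       # Brownian motion paths W(t)
    b = w - w[:, :, -1:] * t                       # bridge B(t) = W(t) - t*W(1)
    stat = (b[:, :, mask] ** 2).sum(axis=1) / (t[mask] * (1.0 - t[mask]))
    return float(np.quantile(stat.max(axis=1), level))

# e.g. an approximate 95% critical value for d = 2 on [0.1, 0.9]
print(sup_bridge_quantile(0.95))
```

Larger grids and replication counts sharpen the approximation at the cost of memory and time.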
4. Simulation Study
In this section, change point estimation for the generalized Pareto (GP) distribution is considered. For any given finite set of data, at least one of the following is possible at any given change point τ: ξ changes by a non-zero quantity; σ changes by a non-zero quantity; or both ξ and σ change by non-zero quantities. A simple change point problem can be formulated in the following way:

X_t ∼ GP(ξ_1, σ_1), t = 1, ⋯, τ and X_t ∼ GP(ξ_2, σ_2), t = τ + 1, ⋯, n (28)

where (ξ_1, σ_1) ≠ (ξ_2, σ_2). Assumption: this work assumes that both parameters of the GP distribution (shape ξ and scale σ) change at the same time.
Let φ(t) = t log t. The resulting divergence is the Kullback-Leibler (KL) divergence. The KL divergence between two GP distributions is given as
(29)
The KL divergence is a function of the parameters of the two densities (before and after the change point). The change point estimator thus becomes
(30)
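Since the closed-form expression in (29) is not reproduced here, the KL divergence between two GP laws can instead be approximated by Monte Carlo as a sketch, using `scipy.stats.genpareto` (its shape parameter `c` plays the role of ξ); the parameter values below are illustrative only:

```python
import numpy as np
from scipy.stats import genpareto

def kl_gpd(xi1, sigma1, xi2, sigma2, n=200_000, seed=2):
    # KL(P || Q) = E_P[log p(X) - log q(X)], approximated with samples X ~ P
    rng = np.random.default_rng(seed)
    x = genpareto.rvs(c=xi1, scale=sigma1, size=n, random_state=rng)
    return float(np.mean(genpareto.logpdf(x, c=xi1, scale=sigma1)
                         - genpareto.logpdf(x, c=xi2, scale=sigma2)))

print(kl_gpd(0.1, 1.0, 0.1, 1.0))  # 0 when the parameters coincide
print(kl_gpd(0.1, 1.0, 0.3, 2.0))  # positive when they differ
```

For ξ ≥ 0 both supports are [0, ∞), so the log-ratio is well defined; with a negative shape in the second law, observations outside its bounded support would make the divergence infinite.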
For the simulation study, the following model is considered under the alternative hypothesis:
(31)
The change point τ is fixed at n/2 for n = 200, 500 and 1000. For a 5% level of significance, the change point is estimated and the results are presented in Figures 1-3. The change point process takes a hill shape, with the peak observed at the point where the detected change point lies. To estimate the change time, the estimated critical value is superimposed on the graph and the change point is taken as the maximum value exceeding the critical bound (the critical value at the chosen level of significance). The estimated time of change is also superimposed on the respective plots of the time series data.
Figure 1 shows the evolution of the change point test statistic on the left panel and the time series data on the right panel. The test statistic exceeds the estimated critical value; H0 is therefore rejected and it is concluded that a change point exists in the time series data. The largest divergence gives the estimated change point, which is compared against the actual true change point τ = n/2 = 100. In Figure 2, the test statistic again exceeds the critical value; H0 is therefore rejected and a change point is declared in the time series. The largest divergence gives the estimated change point, compared against the actual true change point τ = n/2 = 250. Figure 3 shows the evolution of the change point process on the top panel and the time series data on the bottom panel. The test statistic rejects H0 at the 5% level of significance; the estimated change point is compared against the true change point τ = n/2 = 500.
Figure 1. Change point test process (left panel) and time series data (right panel) for n = 200 and τ = n/2.
Figure 2. Change point test process (left panel) and time series data (right panel) for n = 500 and τ = n/2.
Figure 3. Change point test process (left panel) and time series data (right panel) for n = 1000 and τ = n/2.
The change point estimator is next examined when the true change point is no longer located in the middle of the sample but is fixed towards the boundary points at n/3 for n = 200, 500, 1000. The results are shown in Figures 4-6.
Figure 4 shows the evolution of the change point test statistic on the left panel and the time series data on the right panel. The test statistic exceeds the critical value; H0 is therefore rejected and it is concluded that a change point exists at the respective level of significance. The largest divergence gives the estimated change point, which is compared against the actual true change point τ = n/3. In Figure 5, the test statistic exceeds the critical value; H0 is therefore rejected and a change point is declared. The largest divergence gives the estimated change point, compared against the actual true change point τ = n/3. In Figure 6, the test statistic rejects H0 at the 5% level of significance; the estimated change point is compared against the true change point τ = n/3.
The asymptotic power of the change point test is examined. The most commonly used criterion for checking the optimality of a statistical test involves fixing the false alarm probability (type I error) and maximizing the detection probability (minimizing the type II error). The power of the test at a given level against a particular alternative is defined as the probability of rejecting the null hypothesis when the alternative is actually true:

Power = P(reject H_0 | H_1 is true) = 1 − P(type II error) (32)

For power or sample-size computation, not only the distribution of the test statistic under the null hypothesis needs to be obtained, but also its distribution under the alternative hypothesis. This is beyond the scope of this work, hence the reliance on simulation results. The power is estimated from 1000 replicates of simulated data for a fixed sample size of n = 500. The behavior of the test as the change point approaches the data boundary points is analyzed using the power function such that
(33)
Figure 4. Change point test process (left panel) and time series data (right panel) for n = 200 and τ = n/3.
Figure 5. Change point test process (left panel) and time series data (right panel) for n = 500 and τ = n/3.
Figure 6. Change point test process (left panel) and time series data (right panel) for n = 1000 and τ = n/3.
The results in Table 2 indicate that the change point test is most powerful when the change point is located at the middle of the data set and less powerful when the change point is located towards the boundary points. This behavior is attributed to comparing an estimate computed from a small sample (say the first τ observations) to one computed from a larger sample (say the last n − τ observations). Small sample sizes result in erratic behavior of the test statistic due to instability of the parameter estimates. This implies that the test is more likely to miss a change point located towards the boundary points of any given data set, as shown by the power function graph in Figure 7.
Figure 7. The 95% power function.
Table 2. Change point power estimates of the test with n = 500.
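This boundary effect can be checked with a small Monte Carlo experiment. The sketch below uses an assumed normal mean-change model with a simple closed-form statistic (not the GP model) and far fewer replicates than the 1000 used in this work: it estimates the null 95% cut-off from no-change replicates, then counts rejections under an alternative placed mid-sample and near the boundary.

```python
import numpy as np

def max_stat(x, beta=0.1):
    # Trimmed maximal weighted divergence for a N(mu, 1) mean-change model
    n = len(x)
    best = 0.0
    for k in range(int(n * beta), int(n * (1 - beta))):
        kl = 0.5 * (x[:k].mean() - x[k:].mean()) ** 2
        best = max(best, 2.0 * k * (n - k) / n * kl)
    return best

def power_estimate(delta, tau_frac=0.5, n=100, reps=200, seed=4):
    rng = np.random.default_rng(seed)
    null = [max_stat(rng.normal(0.0, 1.0, n)) for _ in range(reps)]
    crit = float(np.quantile(null, 0.95))          # simulated null 95% cut-off
    tau = int(n * tau_frac)
    hits = 0
    for _ in range(reps):
        x = np.concatenate([rng.normal(0.0, 1.0, tau),
                            rng.normal(delta, 1.0, n - tau)])
        hits += max_stat(x) > crit
    return hits / reps

print(power_estimate(0.8, tau_frac=0.5))   # higher power for a mid-sample change
print(power_estimate(0.8, tau_frac=0.15))  # lower power near the boundary
```

The mid-sample change receives the largest weight k(n − k)/n and the most stable segment estimates, which is exactly the pattern reported in Table 2.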
5. Conclusion
In this work, a divergence-based (pseudo-distance) estimator has been used for estimating change in the parameters of a given parametric distribution. The change point estimator is the first point at which there is maximal sample evidence of a change, characterized by the maximum divergence exceeding a critical bound. By application of the standard likelihood regularity conditions, the distribution of the pseudo-metric based estimator is found to converge to that of a Brownian bridge process on a given interval. The distribution of the pseudo-distance based change point process is found to be similar to that of a maximal trimmed Wald-type test statistic under the null hypothesis of no change point. The distribution does not depend on the choice of the function φ and is therefore applicable within a parametric framework when using other choices of the function φ for other statistical divergences. Further work can be done on the theoretical power properties of this particular change point estimator.
Acknowledgements
The first author thanks the Pan-African University Institute of Basic Sciences, Technology and Innovation (PAUSTI) for funding this research.