TITLE:
A Likelihood-Based Multiple Change Point Algorithm for Count Data with Allowance for Over-Dispersion
AUTHORS:
Shalyne Nyambura, Anthony Waititu, Antony Wanjoya, Herbert Imboga
KEYWORDS:
Over-Dispersion, Multiple Changepoint, Binary Segmentation, Likelihood Ratio Test
JOURNAL NAME:
Open Journal of Statistics, Vol. 14 No. 5, October 30, 2024
ABSTRACT: Count data are almost always over-dispersed, that is, their variance exceeds their mean. Researchers have proposed several count data models, but over-dispersion remains an unresolved problem, especially in change point analysis. Discrete change point procedures discussed in the literature work well for equi-dispersed data. This study develops a likelihood-based algorithm that detects and estimates multiple change points in count data assumed to follow the Negative Binomial distribution. The new algorithm produces reliable change point estimates for both equi-dispersed and over-dispersed count data, which gives it an advantage over other count data change point techniques. The Negative Binomial Multiple Change Point Algorithm was tested on simulated data with different sample sizes and change point positions. Changes in the distribution parameters were detected and estimated by conducting a likelihood ratio test on partitions of the data obtained through stepwise recursive binary segmentation. Critical values for the likelihood ratio test were developed and used to assess the significance of the maximum likelihood estimates of the change points. The algorithm performed best on large datasets, though it also performed well on small and medium-sized datasets, with little to no error in the estimated change point locations. It detects changes when they are present and does not report changes when none exist. Power analysis of the likelihood ratio test was performed through Monte Carlo simulation in the single change point setting. Sensitivity analysis of the test power showed that the likelihood ratio test is most powerful when the simulated change point lies midway through the sample and less powerful when it lies near either end. Furthermore, the test is more powerful when the change lies three-quarters of the way through the sample than when it lies a quarter of the way through, closer to the first observation.
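The abstract describes the search only at a high level: a likelihood ratio test applied to partitions produced by recursive binary segmentation. The Python sketch below illustrates that idea under stated assumptions, and is not the authors' implementation. Each segment is modelled as Negative Binomial with mean mu and size r, so Var(Y) = mu + mu^2/r, and both parameters are re-estimated in every segment by profile maximum likelihood. The statistic for a split at tau is 2[l(left) + l(right) - l(whole)], and a split is accepted when the largest statistic exceeds a threshold. The function names, the minimum segment length of 10, and the threshold of 30 are hypothetical choices. They are not the paper's tabulated critical values, which in practice would be calibrated, for example, by simulating the statistic under the no-change null.

```python
import math
import random
from collections import Counter


def nb_loglik(counts, mu, r):
    """NB log-likelihood of a segment given as value -> frequency counts.

    Mean/size parameterization: Var(Y) = mu + mu**2 / r, and r -> infinity
    recovers the Poisson (equi-dispersed) case.
    """
    if mu == 0:
        return 0.0  # all observations are 0, which has probability 1
    log_p = math.log1p(-mu / (r + mu))  # log(r / (r + mu)), accurate for large r
    log_q = math.log(mu / (r + mu))
    lg_r = math.lgamma(r)
    return sum(
        k * (math.lgamma(v + r) - lg_r - math.lgamma(v + 1) + r * log_p + v * log_q)
        for v, k in counts.items() if k
    )


def nb_max_loglik(counts, lo=-7.0, hi=14.0, iters=40):
    """Maximized NB log-likelihood of one segment (profile MLE).

    mu_hat is the sample mean; log(r) is found by golden-section search on
    [lo, hi]. Without over-dispersion the search runs to `hi` (near-Poisson).
    """
    n = sum(counts.values())
    mu = sum(v * k for v, k in counts.items()) / n

    def f(log_r):
        return nb_loglik(counts, mu, math.exp(log_r))

    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return max(fc, fd)


def best_split(y, min_size):
    """Single-change LR test: return (tau, max statistic); y[:tau] is the first segment."""
    n = len(y)
    if n < 2 * min_size:
        return None, 0.0
    full = nb_max_loglik(Counter(y))
    left, right = Counter(y[:min_size]), Counter(y[min_size:])
    best_tau, best_stat = None, -math.inf
    for tau in range(min_size, n - min_size + 1):
        if tau > min_size:  # move y[tau - 1] from the right segment to the left
            left[y[tau - 1]] += 1
            right[y[tau - 1]] -= 1
        stat = 2 * (nb_max_loglik(left) + nb_max_loglik(right) - full)
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau, best_stat


def binary_segmentation(y, threshold, min_size=10, offset=0):
    """Recursive binary segmentation; returns sorted start indices of new segments."""
    tau, stat = best_split(y, min_size)
    if tau is None or stat <= threshold:
        return []
    return (binary_segmentation(y[:tau], threshold, min_size, offset)
            + [offset + tau]
            + binary_segmentation(y[tau:], threshold, min_size, offset + tau))


def rnbinom(mu, r, rng):
    """One NB(mu, r) draw as a Gamma-Poisson mixture (Knuth's Poisson sampler)."""
    lam = rng.gammavariate(r, mu / r)
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


# Simulated series with changes in mean at indices 100 and 200 (size r = 5 throughout).
rng = random.Random(1)
y = ([rnbinom(2, 5, rng) for _ in range(100)]
     + [rnbinom(15, 5, rng) for _ in range(100)]
     + [rnbinom(4, 5, rng) for _ in range(100)])
cps = binary_segmentation(y, threshold=30.0)  # 30.0 is a hypothetical threshold
print(cps)
```

Re-estimating r in every segment lets the same test cover equi-dispersed data: when a segment's variance does not exceed its mean, the search for r runs to its upper bound and the fitted model approaches a Poisson distribution.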