TITLE:
Interrater Reliability Estimation via Maximum Likelihood for Gwet’s Chance Agreement Model
AUTHORS:
Alek M. Westover, Tara M. Westover, M. Brandon Westover
KEYWORDS:
Interrater Reliability, Agreement, Reliability, Kappa
JOURNAL NAME:
Open Journal of Statistics, Vol. 14, No. 5, October 28, 2024
ABSTRACT: Interrater reliability (IRR) statistics, such as Cohen’s kappa, measure agreement between raters beyond what is expected by chance when classifying items into categories. Although Cohen’s kappa is widely used, it has several limitations, which prompted the development of Gwet’s agreement statistic, an alternative “kappa” statistic that models chance agreement via an “occasional guessing” model. However, we show that Gwet’s formula for estimating the proportion of agreement due to chance is itself biased at intermediate levels of agreement, even though it overcomes the limitations of Cohen’s kappa at high and low agreement levels. We derive a maximum likelihood estimator for the occasional guessing model that yields an unbiased estimator of the IRR, which we call the maximum likelihood kappa (κ_ML). The key result is that the chance agreement probability under the occasional guessing model is simply equal to the observed rate of disagreement between raters. The κ_ML statistic provides a theoretically principled approach to quantifying IRR that addresses the limitations of previous κ coefficients. Given the widespread use of IRR measures, having an unbiased estimator is important for reliable inference across domains where rater judgments are analyzed.
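A minimal sketch of how the abstract’s key result could translate into an estimator, assuming κ_ML takes the standard chance-corrected form (p_a − p_e)/(1 − p_e) with the chance-agreement probability p_e set to the observed disagreement rate 1 − p_a; the function name kappa_ml and the two-rater, list-of-labels interface are illustrative assumptions, not the paper’s reference implementation.

def kappa_ml(ratings_a, ratings_b):
    """Estimate IRR under the occasional guessing model (sketch).

    ratings_a, ratings_b: equal-length sequences of category labels
    assigned by two raters to the same items.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two equal-length, non-empty rating sequences")

    # Observed proportion of items on which the two raters agree.
    n_items = len(ratings_a)
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n_items

    # Key result stated in the abstract: the chance agreement probability
    # equals the observed disagreement rate.
    p_e = 1.0 - p_a

    # Standard chance-corrected form; here it simplifies to (2*p_a - 1)/p_a.
    return (p_a - p_e) / (1.0 - p_e)


# Usage example: 8 of 10 items rated identically gives p_a = 0.8, so
# κ_ML = (0.8 - 0.2) / (1 - 0.2) = 0.75.
print(kappa_ml([1, 1, 0, 0, 1, 2, 2, 0, 1, 0],
               [1, 1, 0, 1, 1, 2, 2, 0, 2, 0]))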