TITLE:
Enhancing BERTopic with Pre-Clustered Knowledge: Reducing Feature Sparsity in Short Text Topic Modeling
AUTHORS:
Qian Wang, Biao Ma
KEYWORDS:
Topic Model, BERTopic, Short Text, Feature Sparsity, Cluster
JOURNAL NAME:
Journal of Data Analysis and Information Processing, Vol.12 No.4, November 21, 2024
ABSTRACT: Modeling topics in short texts presents significant challenges due to feature sparsity, particularly when analyzing content generated by large-scale online users. This sparsity can substantially impair semantic capture accuracy. We propose a novel approach that incorporates pre-clustered knowledge into the BERTopic model while reducing the L2 norm for low-frequency words. Our method effectively mitigates feature sparsity during cluster mapping. Empirical evaluation on the StackOverflow dataset demonstrates that our approach outperforms baseline models, achieving superior Macro-F1 scores. These results validate the effectiveness of our proposed feature sparsity reduction technique for short-text topic modeling.
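For readers unfamiliar with the evaluation metric named in the abstract, Macro-F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch of that standard definition only. It is not the authors' evaluation code; the function name `macro_f1` and the toy labels are hypothetical, and it assumes predicted clusters have already been mapped to class labels.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("need at least one label")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, purely for illustration (not data from the paper).
y_true = ["java", "java", "python", "sql"]
y_pred = ["java", "python", "python", "sql"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because each class contributes equally to the average, a model that ignores small topics is penalized more under Macro-F1 than under plain accuracy.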