Open Journal of Statistics

Volume 4, Issue 11 (December 2014)

ISSN Print: 2161-718X   ISSN Online: 2161-7198

Probit Normal Correlated Topic Model

PP. 879-888
DOI: 10.4236/ojs.2014.411083

ABSTRACT

The logistic normal distribution has recently been adapted, via the transformation of multivariate Gaussian variables, to model the topical distribution of documents in the presence of correlations among topics. In this paper, we propose a probit normal alternative for modeling correlated topical structures. Our use of the probit model in the context of topic discovery is novel, as many authors have so far concentrated solely on the logistic model, partly because of the formidable inefficiency of the multinomial probit model even for very small topical spaces. We circumvent this inefficiency by adapting the diagonal orthant multinomial probit to the topic model context, which allows our topic modeling scheme to handle corpora with a large number of latent topics. An additional and very important benefit of our method is that, unlike the logistic normal model, whose non-conjugacy calls for sophisticated sampling schemes, our approach exploits the natural conjugacy inherent in the auxiliary formulation of the probit model to achieve greater simplicity. Applying our proposed scheme to a well-known Associated Press corpus not only uncovers a large number of meaningful topics but also captures compellingly intuitive correlations among certain topics. Moreover, our approach lends itself to further scalability thanks to existing high-performance algorithms and architectures capable of handling millions of documents.
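
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of two ways to turn a Gaussian vector into topic proportions. It assumes the standard diagonal orthant construction: each topic k gets an independent auxiliary variable z_k ~ N(eta_k, 1), and topic k is selected when z_k is the only positive coordinate. The function names and the example vector `eta` are hypothetical, and the exact parameterization used in the paper may differ.

```python
import math
import random


def std_normal_cdf(x):
    # Phi(x) via erfc, which stays accurate in the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def logistic_normal_proportions(eta):
    # Softmax of the Gaussian vector (the logistic normal map).
    m = max(eta)
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


def diagonal_orthant_proportions(eta):
    # Assumed diagonal orthant form: P(topic k) is proportional to
    # Phi(eta_k) / Phi(-eta_k), i.e. normalized "probit odds".
    odds = [std_normal_cdf(e) / std_normal_cdf(-e) for e in eta]
    total = sum(odds)
    return [o / total for o in odds]


def sample_topic_diagonal_orthant(eta, rng):
    # Auxiliary-variable view: draw z_k ~ N(eta_k, 1) independently and
    # accept only when exactly one coordinate is positive.
    while True:
        z = [rng.gauss(e, 1.0) for e in eta]
        positive = [k for k, value in enumerate(z) if value > 0.0]
        if len(positive) == 1:
            return positive[0]


if __name__ == "__main__":
    eta = [0.5, -0.2, -1.0]  # hypothetical Gaussian draw for one document
    print(logistic_normal_proportions(eta))
    print(diagonal_orthant_proportions(eta))
    rng = random.Random(0)
    print(sample_topic_diagonal_orthant(eta, rng))
```

The rejection sampler is only there to illustrate the auxiliary-variable view. In a Gibbs sampler, the same independent Gaussian auxiliaries are what make conjugate updates possible, which is the simplicity the abstract refers to.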

Cite as:

Yu, X. and Fokoué, E. (2014) Probit Normal Correlated Topic Model. Open Journal of Statistics, 4, 879-888. doi: 10.4236/ojs.2014.411083.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.