Short Text Classification Based on Improved ITC


Long text classification has achieved considerable success, but short text classification still needs improvement. In this paper, we first explain why we chose the ITC feature selection algorithm rather than the conventional TFIDF, and describe the advantages of ITC over TFIDF. We then summarize the flaws of the conventional ITC algorithm and present an improved ITC feature selection algorithm, tailored to the characteristics of short text classification, that combines the concepts of Documents Distribution Entropy and Position Distribution Weight. The improved ITC algorithm better reflects the practical conditions of short text classification. Experimental results show that classification performance with the new algorithm was much better than with the traditional TFIDF and ITC.
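The abstract does not give formulas, so the sketch below makes an assumption. It uses one formulation of conventional TFIDF and ITC weighting that is common in the text-classification literature, not the authors' improved algorithm. In this formulation both schemes multiply a term-frequency factor by log(N/n_t + 0.01), where N is the number of documents and n_t the number of documents containing term t, and then cosine-normalize each document vector. The only difference is that ITC damps raw term frequency with log(tf + 1). The function name `term_weights` and the `scheme` parameter are hypothetical.

```python
import math
from collections import Counter


def term_weights(docs, scheme="itc"):
    """Hypothetical sketch: cosine-normalized TFIDF or ITC weights per document.

    Assumes the common literature form (not confirmed by this paper):
      TFIDF: tf * log(N / n_t + 0.01)
      ITC:   log(tf + 1) * log(N / n_t + 0.01)
    """
    n_docs = len(docs)
    # n_t: number of documents in which each term occurs at least once
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    vectors = []
    for doc in docs:
        raw = {}
        for term, tf in Counter(doc).items():
            idf = math.log(n_docs / doc_freq[term] + 0.01)
            if scheme == "tfidf":
                raw[term] = tf * idf
            else:  # "itc": logarithm damps repeated occurrences
                raw[term] = math.log(tf + 1.0) * idf
        # Cosine normalization so document length does not dominate
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return vectors


if __name__ == "__main__":
    corpus = [["price", "price", "phone"], ["phone", "screen"]]
    print(term_weights(corpus, "tfidf")[0])
    print(term_weights(corpus, "itc")[0])
```

For example, when a term repeats inside a document, ITC gives it a smaller weight relative to its neighbours than TFIDF does. The Documents Distribution Entropy and Position Distribution Weight refinements described in the paper are not modelled here.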

Li, L. and Qu, S. (2013) Short Text Classification Based on Improved ITC. Journal of Computer and Communications, 1, 22-27. doi: 10.4236/jcc.2013.14004.

Conflicts of Interest

The authors declare no conflicts of interest.


