The Role of Rare Terms in Enhancing the Performance of Polynomial Networks Based Text Categorization

Abstract

This paper investigates the role of rare, or infrequent, terms in enhancing the accuracy of English text categorization using Polynomial Networks (PNs). To study their impact on PN-based text categorization, several term reduction criteria and several term weighting schemes were tested with PNs on the Reuters Corpus. Each term weighting scheme was applied to each reduced term set twice: once keeping the rare terms and once removing them. All the experiments conducted in this research show that keeping rare terms substantially improves the performance of Polynomial Networks in text categorization, regardless of the term reduction method, the number of terms used in classification, or the term weighting scheme adopted.
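The keep-versus-remove protocol described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's pipeline. The abstract does not define "rare", so the document-frequency cutoff `RARE_DF_THRESHOLD` is a hypothetical choice. The helper names are also hypothetical, and plain tf-idf stands in for whichever weighting schemes the paper actually compares. The Polynomial Network classifier itself is not shown.

```python
import math
from collections import Counter

# Hypothetical cutoff (assumption, not from the paper): a term is "rare" if it
# appears in fewer than RARE_DF_THRESHOLD training documents.
RARE_DF_THRESHOLD = 2


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def build_vocabulary(docs, keep_rare):
    """Return a sorted vocabulary; drop rare terms unless keep_rare is True."""
    df = document_frequencies(docs)
    return sorted(t for t, n in df.items() if keep_rare or n >= RARE_DF_THRESHOLD)


def tfidf_vectors(docs, vocab):
    """Weight each document with plain tf-idf over the given vocabulary.

    tf-idf is only a stand-in for the term weighting schemes compared in the
    paper. Every vocabulary term comes from docs, so df[t] >= 1 here.
    """
    df = document_frequencies(docs)
    n_docs = len(docs)
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # missing terms count as 0
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vectors


if __name__ == "__main__":
    docs = [
        ["oil", "price", "rise"],
        ["oil", "export", "opec"],
        ["wheat", "export", "price"],
    ]
    # Run the same weighting twice: once with rare terms kept, once removed.
    for keep_rare in (True, False):
        vocab = build_vocabulary(docs, keep_rare)
        X = tfidf_vectors(docs, vocab)
        print(keep_rare, vocab)
    # prints:
    # True ['export', 'oil', 'opec', 'price', 'rise', 'wheat']
    # False ['export', 'oil', 'price']
```

In a full experiment, the two resulting feature sets would each be passed to the same classifier, so that any difference in accuracy can be attributed to the presence or absence of the rare terms.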

Cite as:

M. Al-Tahrawi, "The Role of Rare Terms in Enhancing the Performance of Polynomial Networks Based Text Categorization," Journal of Intelligent Learning Systems and Applications, Vol. 5, No. 2, 2013, pp. 84-89. doi: 10.4236/jilsa.2013.52009.

Conflicts of Interest

The author declares no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.