The Enhancement of Arabic Stemming by Using Light Stemming and Dictionary-Based Stemming
Yasir Alhanini, Mohd Juzaiddin Ab Aziz
DOI: 10.4236/jsea.2011.49060

Abstract

Word stemming is one of the most important factors affecting the performance of many natural language processing applications, such as part-of-speech tagging, syntactic parsing, machine translation, and information retrieval. Computational stemming is an urgent problem for Arabic natural language processing because Arabic is a highly inflected language. Existing stemmers ignore the handling of multiword expressions and the identification of Arabic names. We use an enhanced stemmer that extracts the stems of Arabic words by combining light stemming with dictionary-based stemming. The enhanced stemmer also handles multiword expressions and performs named entity recognition. We evaluated it on an Arabic corpus of ten documents and report the accuracy of the enhanced stemmer, the light stemmer, and the dictionary-based stemmer on each document. The enhanced stemmer achieves an average accuracy of 96.29% on the corpus. The experimental results show that the enhanced stemmer outperforms both the light stemmer and the dictionary-based stemmer, achieving the highest accuracy values.
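
The abstract does not give the stemmer's rules, so the following is only a minimal sketch of how a light stemmer and a dictionary lookup might be combined, with a per-document accuracy measure. The affix lists, the dictionary entry, the name list, the order of the stages, and all function names are illustrative assumptions, not the authors' implementation. Multiword expression handling is omitted.

```python
# Illustrative sketch only. The affix lists, dictionary entries, name list, and
# the order of the stages are assumptions, not the rules published by the authors.

PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")  # assumed sample prefixes
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")             # assumed sample suffixes

# Hypothetical stem dictionary: maps a form to its dictionary stem.
STEM_DICTIONARY = {"كتب": "كتاب"}  # broken plural "books" -> "book"

# Hypothetical list of names that must not be stemmed.
KNOWN_NAMES = {"محمد"}


def light_stem(word: str, min_len: int = 3) -> str:
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def enhanced_stem(word: str) -> str:
    """Leave known names untouched, then light-stem and consult the dictionary."""
    if word in KNOWN_NAMES:
        return word
    candidate = light_stem(word)
    return STEM_DICTIONARY.get(candidate, candidate)


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Percentage of words whose predicted stem matches the reference stem."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)
```

In this sketch the dictionary is consulted after affix removal, so a form such as الكتب first loses its definite article and is then mapped to its dictionary stem. A real system would need far larger affix and dictionary resources.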

Share and Cite:

Y. Alhanini and M. Aziz, "The Enhancement of Arabic Stemming by Using Light Stemming and Dictionary-Based Stemming," Journal of Software Engineering and Applications, Vol. 4 No. 9, 2011, pp. 522-526. doi: 10.4236/jsea.2011.49060.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.