Named Entity Recognition for Nepali Text Using Support Vector Machines

Abstract

Named Entity Recognition (NER) aims to identify rigid designators in text, such as proper names, biological species, and temporal expressions, and to classify them into predefined categories. Interest in this field of research has grown since the early 1990s. NER plays a vital role in many areas of natural language processing, including machine translation, information extraction, and question answering. This paper presents Named Entity Recognition for Nepali text based on the Support Vector Machine (SVM), a machine learning approach to classification. A set of features is extracted from the training data set. The accuracy and efficiency of the SVM classifier are analyzed for three different sizes of training data set, and the recognition systems are tested on ten datasets of Nepali text. The strengths of this work are its efficient feature extraction and its comprehensive recognition techniques. The SVM-based recognizer is restricted to a fixed set of features and uses a small dictionary, which affects its performance. The learning performance of the recognition system is also observed: the system learns well from a small set of training data, and its rate of learning increases as the training size grows.
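
As a rough illustration of the kind of per-token feature extraction the abstract refers to, the sketch below builds a feature dictionary for each token from its surface form, its neighbours, and a dictionary lookup. The abstract does not list the paper's actual features, so every feature name here, the `GAZETTEER` contents, and the helper names `token_features` and `sentence_features` are assumptions made for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical dictionary of known entity words. The abstract only says that
# "a small dictionary" is used; these entries are illustrative assumptions.
GAZETTEER: Set[str] = {"काठमाडौं", "नेपाल"}


def token_features(tokens: List[str], i: int,
                   gazetteer: Set[str] = GAZETTEER) -> Dict[str, object]:
    """Return an assumed feature set for the token at position i."""
    word = tokens[i]
    return {
        "word": word,
        # Context window of one token on each side, with boundary markers.
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
        # Prefix and suffix are sliced by Unicode code point, not by syllable.
        "prefix2": word[:2],
        "suffix2": word[-2:],
        # str.isdigit() also accepts Devanagari digits.
        "is_digit": word.isdigit(),
        "in_gazetteer": word in gazetteer,
        "is_first": i == 0,
    }


def sentence_features(tokens: List[str],
                      gazetteer: Set[str] = GAZETTEER) -> List[Dict[str, object]]:
    """Extract features for every token of one tokenized sentence."""
    return [token_features(tokens, i, gazetteer) for i in range(len(tokens))]


if __name__ == "__main__":
    # Illustrative token list, not taken from the paper's data.
    for feats in sentence_features(["राम", "काठमाडौं", "गए"]):
        print(feats)
```

Feature dictionaries of this shape would typically be converted to sparse vectors and passed to an SVM classifier together with the entity label of each token; which encoding and SVM implementation the authors used is not stated in the abstract.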

Cite as:

Bam, S. and Shahi, T. (2014) Named Entity Recognition for Nepali Text Using Support Vector Machines. Intelligent Information Management, 6, 21-29. doi: 10.4236/iim.2014.62004.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.