Tagging Accuracy Analysis on Part-of-Speech Taggers

Semih Yumusak; Erdogan Dogdu; Halife Kodaz

doi:10.4236/jcc.2014.24021

Journal of Computer and Communications > Vol.2 No.4, March 2014

Computer Engineering Department, KTO Karatay University, Konya, Turkey.
Computer Engineering Department, Selcuk University, Konya, Turkey.
Computer Engineering Department, TOBB University of Economics and Technology, Ankara, Turkey.

Abstract

Part-of-speech (POS) tagging can be performed with several tools and in several programming languages. This work focuses on the Natural Language Toolkit (NLTK) library in the Python environment and the gold-standard corpora that can be installed with it. The corpora and tagging methods are analyzed and compared using the Python language. Different taggers are analyzed according to their tagging accuracies on data from three corpora: Brown, Penn Treebank, and NPS Chat. The taggers used in the analysis are the default tagger, the regex tagger, and n-gram taggers. We applied all taggers to these three corpora and showed that, whereas the unigram tagger does the best tagging of the individual taggers in all corpora, a combination of taggers does better if it is correctly ordered. Additionally, the NPS Chat Corpus gives different accuracy results than the other two corpora.
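
The effect of ordering in a combined tagger can be illustrated with a small, dependency-free sketch. It is not the authors' experiment and does not use NLTK itself: the three classes below are simplified stand-ins for a default tagger, a regex tagger, and a unigram tagger chained through a backoff argument (NLTK's own tagger classes accept a `backoff` parameter in a similar way). The training and test sentences, the single regex pattern, and all names in the code are assumptions made up for illustration; the scores say nothing about Brown, Penn Treebank, or NPS Chat.

```python
import re
from collections import Counter, defaultdict

# Toy hand-made data (hypothetical; not drawn from any of the three corpora).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("bus", "NN"), ("stops", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
TEST = [
    [("the", "DT"), ("bus", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("bird", "NN"), ("sings", "VBZ")],
]


class DefaultTagger:
    """Assigns the same tag to every word."""

    def __init__(self, tag):
        self.default_tag = tag

    def tag_word(self, word):
        return self.default_tag


class RegexTagger:
    """Assigns the tag of the first matching pattern, else asks the backoff."""

    def __init__(self, patterns, backoff=None):
        self.patterns = [(re.compile(p), t) for p, t in patterns]
        self.backoff = backoff

    def tag_word(self, word):
        for regex, tag in self.patterns:
            if regex.match(word):
                return tag
        return self.backoff.tag_word(word) if self.backoff else None


class UnigramTagger:
    """Assigns each known word its most frequent training tag, else asks the backoff."""

    def __init__(self, train, backoff=None):
        counts = defaultdict(Counter)
        for sentence in train:
            for word, tag in sentence:
                counts[word][tag] += 1
        self.table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self.backoff = backoff

    def tag_word(self, word):
        if word in self.table:
            return self.table[word]
        return self.backoff.tag_word(word) if self.backoff else None


def accuracy(tagger, gold):
    """Fraction of gold tokens whose predicted tag equals the gold tag."""
    pairs = [(w, t) for sentence in gold for w, t in sentence]
    return sum(tagger.tag_word(w) == t for w, t in pairs) / len(pairs)


PATTERNS = [(r".*s$", "VBZ")]  # crude assumption: words ending in "s" are verbs

default = DefaultTagger("NN")
scores = {
    "default": accuracy(default, TEST),
    "regex -> default": accuracy(RegexTagger(PATTERNS, backoff=default), TEST),
    "unigram only": accuracy(UnigramTagger(TRAIN), TEST),
    # Specific evidence first, crude guesses last.
    "unigram -> regex -> default": accuracy(
        UnigramTagger(TRAIN, backoff=RegexTagger(PATTERNS, backoff=default)), TEST
    ),
    # Same components, but the crude regex now overrides the learned table.
    "regex -> unigram -> default": accuracy(
        RegexTagger(PATTERNS, backoff=UnigramTagger(TRAIN, backoff=default)), TEST
    ),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

On this toy data the chain that consults the unigram table first scores highest, while the chain that lets the regex answer first mis-tags "bus" as a verb even though the training data contains it as a noun. This is only meant to show why the order of a backoff chain can matter, not to reproduce any figure from the paper.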

Keywords

POS Tagger; Brown Corpus; Penn Treebank Corpus; NPS Chat Corpus

Share and Cite:

Yumusak, S., Dogdu, E. and Kodaz, H. (2014) Tagging Accuracy Analysis on Part-of-Speech Taggers. Journal of Computer and Communications, 2, 157-162. doi: 10.4236/jcc.2014.24021.

Conflicts of Interest

The authors declare no conflicts of interest.
