TITLE:
Tagging Accuracy Analysis on Part-of-Speech Taggers
AUTHORS:
Semih Yumusak, Erdogan Dogdu, Halife Kodaz
KEYWORDS:
POS Tagger; Brown Corpus; Penn Treebank Corpus; NPS Chat Corpus
JOURNAL NAME:
Journal of Computer and Communications, Vol. 2 No. 4, March 18, 2014
ABSTRACT:
Part-of-speech (POS) tagging can be performed with a variety of tools and
programming languages. This work focuses on the Natural Language Toolkit (NLTK)
library in the Python environment and the gold standard corpora that can be
installed with it. The corpora and tagging methods are analyzed and compared
using the Python language. Different taggers are evaluated according to their
tagging accuracy on data from three corpora: Brown, Penn Treebank, and NPS Chat.
The taggers used in the analysis are the default tagger, the regex tagger, and
n-gram taggers. We applied every tagger to all three corpora. The results show
that, although the unigram tagger gives the best tagging of the individual
taggers on all corpora, a combination of taggers does better when the taggers
are correctly ordered. Additionally, the NPS Chat Corpus yields different
accuracy results from the other two corpora.
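
The idea of ordering taggers so that each one falls back to the next can be
illustrated with a minimal sketch. It is written in plain Python and does not
reproduce the study's experiments: the toy sentences, the single regex pattern,
and the helper names (default_tagger, regex_tagger, unigram_tagger, accuracy)
are all assumptions made for illustration. NLTK offers comparable classes
(DefaultTagger, RegexpTagger, UnigramTagger) that accept a backoff tagger.

```python
import re
from collections import Counter, defaultdict

# Toy tagged sentences: hypothetical data, NOT taken from Brown, Penn Treebank
# or NPS Chat.
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("barks", "VBZ")],
]
TEST = [
    [("the", "DT"), ("cat", "NN"), ("jumps", "VBZ")],
    [("a", "DT"), ("bird", "NN"), ("sings", "VBZ")],
]


def default_tagger(tag):
    """Assign the same tag to every word."""
    return lambda word: tag


def regex_tagger(patterns, backoff):
    """Tag by the first matching pattern; otherwise ask the backoff tagger."""
    compiled = [(re.compile(pattern), tag) for pattern, tag in patterns]

    def tag_word(word):
        for regex, tag in compiled:
            if regex.match(word):
                return tag
        return backoff(word)

    return tag_word


def unigram_tagger(train, backoff):
    """Tag each known word with its most frequent training tag."""
    counts = defaultdict(Counter)
    for sentence in train:
        for word, tag in sentence:
            counts[word][tag] += 1
    best = {word: c.most_common(1)[0][0] for word, c in counts.items()}

    def tag_word(word):
        # Unseen words are delegated to the backoff tagger.
        return best[word] if word in best else backoff(word)

    return tag_word


def accuracy(tagger, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    pairs = [(word, tag) for sentence in gold for word, tag in sentence]
    correct = sum(1 for word, tag in pairs if tagger(word) == tag)
    return correct / len(pairs)


# Chain order: unigram first, then regex, then the default tag as last resort.
default = default_tagger("NN")
regex = regex_tagger([(r".*s$", "VBZ")], backoff=default)
combined = unigram_tagger(TRAIN, backoff=regex)

for name, tagger in [("default", default), ("regex", regex), ("combined", combined)]:
    print(name, round(accuracy(tagger, TEST), 3))
```

On this toy data the chained tagger scores higher than the default or regex
tagger alone, because words missing from the training set are passed to a more
general rule instead of being left with a wrong guess. The numbers depend
entirely on the invented data and say nothing about the three corpora studied.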