TITLE:
A Hybrid Algorithm for Stemming of Nepali Text
AUTHORS:
Chiranjibi Sitaula
KEYWORDS:
String Similarity; Information Retrieval; Text Mining; Natural Language Processing; Dynamic Programming
JOURNAL NAME:
Intelligent Information Management, Vol.5 No.4, July 17, 2013
ABSTRACT:
In this paper, a new context-free stemmer is proposed that combines a
traditional rule-based system with a string similarity approach. The resulting
algorithm can be called a hybrid algorithm, and it is language dependent. A
context-free stemmer is one that stems a word without regard to its context,
i.e., the same rules are applied in every context. Stripping words with the
traditional context-free rule-based approach may over-stem or under-stem
inflected words; this is overcome by applying a string similarity function
computed with dynamic programming. Edit distance is used as the string
similarity measure. The stripped inflected word is compared with the words
stored in an available text database, and the word with the minimum distance is
taken as the substitute for the stripped inflected word, which leads to its
stem. The approach thus draws heavily on both the traditional rule-based system
and the corpus-based approach.
The algorithm is tested on the Nepali language, which is written in the
Devanagari script. The approach gives better results than the traditional
rule-based system, specifically for the Nepali language. The total accuracy of
the hybrid algorithm is 70.10%, whereas the total accuracy of the traditional
rule-based system is 68.43%.
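
The two-stage idea described in the abstract (rule-based suffix stripping followed by a minimum edit distance lookup) can be illustrated with a minimal sketch. The abstract does not give the paper's actual rules or text database, so the suffix list `SUFFIXES`, the word list `LEXICON`, and the function names `edit_distance`, `rule_based_strip`, and `hybrid_stem` are assumptions made only for illustration. Strings are compared by Unicode code point, which is also an assumption about how Devanagari text is handled.

```python
# Hypothetical suffix list and lexicon, chosen only for illustration;
# the paper's actual rules and text database are not given in the abstract.
SUFFIXES = ["हरू", "मा", "ले", "को"]
LEXICON = ["घर", "किताब", "आमा"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping two rows."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rule_based_strip(word: str, suffixes=SUFFIXES) -> str:
    """Context-free step: remove the longest matching suffix, once."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def hybrid_stem(word: str, lexicon=LEXICON) -> str:
    """Strip by rule, then return the lexicon entry nearest to the result."""
    stripped = rule_based_strip(word)
    return min(lexicon, key=lambda entry: edit_distance(stripped, entry))
```

With these assumed inputs, `hybrid_stem("घरहरूमा")` shows the correction step at work: the single rule pass removes only "मा" and leaves the under-stemmed form "घरहरू", and the edit distance lookup then maps it to the lexicon entry "घर". A real system would need a tie-breaking policy and a much larger word list.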