TITLE:
Advantages of Using a Spell Checker in Text Mining Pre-Processes
AUTHORS:
Jhonathan Quillo-Espino, Rosa María Romero-González, Alberto Lara-Guevara
KEYWORDS:
Spell Checker, Text Mining, Stemming, Tokenization, Porter Algorithm, Snowball Algorithm
JOURNAL NAME:
Journal of Computer and Communications, Vol.6 No.11, November 14, 2018
ABSTRACT: The aim of this work was to analyze the behavior of the text mining process when a spell checker is integrated as an extra pre-process during its first stage. Different models were analyzed, and the most complete one was chosen, taking the pre-processes as the initial part of the text mining process. Algorithms were developed and adapted for the Spanish language, and the methodology was tested through the analysis of 2363 words. A notation capable of removing special and unwanted characters was created. The execution time of each algorithm was measured to test the efficiency of the text mining pre-process with and without orthographic revision. The total time was shorter with the spell checker than without it. The key difference between this work and existing related studies is that this is the first time a spell checker has been used in the text mining pre-processes.
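The pre-processing stage summarized in the abstract (character removal, tokenization, optional spell checking, and timing) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `UNWANTED` pattern, the tiny `VOCABULARY` list, the similarity cutoff, and all function names are assumptions made for illustration, and the spell checker is a simple closest-match lookup from the standard library rather than the algorithm used in the paper. Stemming (Porter/Snowball) is omitted to keep the example self-contained.

```python
import re
import time
from difflib import get_close_matches

# Hypothetical vocabulary; a real checker would load a full Spanish dictionary.
VOCABULARY = ["minería", "texto", "corrector", "ortográfico", "análisis", "de", "el"]

# Assumed cleaning pattern: keep lowercase letters (including Spanish accented
# vowels, ü and ñ) and whitespace; every other character is replaced by a space.
UNWANTED = re.compile(r"[^a-záéíóúüñ\s]")


def clean(text):
    """Lowercase the text and replace special or unwanted characters with spaces."""
    return UNWANTED.sub(" ", text.lower())


def tokenize(text):
    """Split cleaned text into tokens on whitespace."""
    return text.split()


def correct(token, vocabulary=VOCABULARY):
    """Return the closest vocabulary word, or the token itself if none is close enough."""
    if token in vocabulary:
        return token
    matches = get_close_matches(token, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else token


def preprocess(text, spell_check=True):
    """Run cleaning and tokenization, with spell checking as an optional extra step."""
    tokens = tokenize(clean(text))
    if spell_check:
        tokens = [correct(t) for t in tokens]
    return tokens


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) so that runs can be compared."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Calling `timed(preprocess, text, spell_check=True)` and `timed(preprocess, text, spell_check=False)` on the same input gives a pair of timings to compare. In this sketch the spell-checked run does strictly more work, so it says nothing about the paper's result that the total time was shorter with the spell checker; that result refers to the authors' full pre-processing chain.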