A HMM-Based System To Diacritize Arabic Text

M. S. Khorsheed

doi:10.4236/jsea.2012.512B024

Journal of Software Engineering and Applications > Vol.5 No.12B, December 2012

National Center for Robotics & Intelligent Systems, King Abdulaziz City for Science & Technology (KACST), POB 6086, Riyadh, Saudi Arabia.

Abstract

Arabic belongs to the Semitic language family, whose sentence structure poses distinct challenges for Natural Language Processing. In such languages, two different words may share the same spelling yet differ entirely in pronunciation and meaning. To resolve this ambiguity, special marks are placed above or below the letters to indicate the correct pronunciation. These marks are called diacritics, and a language that uses them is called a diacritized language. This paper presents a system for Arabic diacritization using Hidden Markov Models (HMMs), built with the widely used Hidden Markov Model Toolkit (HTK). Each diacritic is represented by a separate model. The concatenated sequence of output diacritic models is then merged with the input character sequence to form the fully diacritized text. The performance of the proposed system is assessed on a corpus of more than 24,000 sentences.
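As an illustration of the ambiguity, the undiacritized spelling كتب can be read as كَتَبَ (kataba, "he wrote") or كُتُب (kutub, "books"). The abstract frames diacritization as decoding a diacritic sequence from a letter sequence and merging the two. The paper does this with HTK, modeling each diacritic as its own HMM; the sketch below is a deliberately simplified stand-in, not the author's system. It assumes a single discrete HMM whose hidden states are diacritic labels and whose observations are base letters, decodes it with the Viterbi algorithm, and then interleaves each letter with its decoded mark. The four-mark inventory, the toy probabilities, and the names `viterbi`, `diacritize`, `START`, `TRANS` and `EMIT` are hypothetical.

```python
import math
from typing import Dict, List

# Standard Unicode code points: three short-vowel marks plus sukun (no vowel).
MARKS: Dict[str, str] = {
    "fatha": "\u064e",
    "damma": "\u064f",
    "kasra": "\u0650",
    "sukun": "\u0652",
}
STATES: List[str] = list(MARKS)  # hidden states = diacritic labels

# Base letters used in the toy example: kaf, teh, beh.
KAF, TEH, BEH = "\u0643", "\u062a", "\u0628"

# Hypothetical toy parameters (assumed for illustration, not estimated from any corpus).
START: Dict[str, float] = {s: 0.25 for s in STATES}
TRANS: Dict[str, Dict[str, float]] = {
    p: {s: (0.7 if s == p else 0.1) for s in STATES} for p in STATES
}
EMIT: Dict[str, Dict[str, float]] = {
    "fatha": {KAF: 0.5, TEH: 0.3, BEH: 0.2},
    "damma": {KAF: 0.2, TEH: 0.2, BEH: 0.6},
    "kasra": {KAF: 0.2, TEH: 0.55, BEH: 0.25},
    "sukun": {KAF: 0.1, TEH: 0.1, BEH: 0.8},
}


def viterbi(letters: str, start, trans, emit, floor: float = 1e-9) -> List[str]:
    """Return the most likely diacritic label for each letter (log-space Viterbi)."""
    if not letters:
        return []

    def log(p: float) -> float:
        # Clamp to a small floor so unseen letters do not produce log(0).
        return math.log(max(p, floor))

    # best[s]: log-probability of the best label path ending in state s.
    best = {s: log(start[s]) + log(emit[s].get(letters[0], 0.0)) for s in STATES}
    backptrs: List[Dict[str, str]] = []
    for ch in letters[1:]:
        new_best, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + log(trans[p][s]))
            new_best[s] = best[prev] + log(trans[prev][s]) + log(emit[s].get(ch, 0.0))
            ptr[s] = prev
        best = new_best
        backptrs.append(ptr)

    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def diacritize(text: str, start, trans, emit) -> str:
    """Interleave each input letter with its decoded diacritic mark."""
    labels = viterbi(text, start, trans, emit)
    return "".join(ch + MARKS[lab] for ch, lab in zip(text, labels))


if __name__ == "__main__":
    word = KAF + TEH + BEH
    print(viterbi(word, START, TRANS, EMIT))  # → ['fatha', 'fatha', 'fatha']
    print(diacritize(word, START, TRANS, EMIT))
```

With these toy numbers the decoder returns the reading kataba. The final interleaving step mirrors, in miniature, the merging of the output diacritic sequence with the input characters described above. A realistic system would also need to handle spaces, punctuation and shadda combinations, which this sketch ignores.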

Keywords

Arabic; Hidden Markov Models; Text-to-speech; Diacritization

Cite as:

M. Khorsheed, "A HMM-Based System To Diacritize Arabic Text," Journal of Software Engineering and Applications, Vol. 5 No. 12B, 2012, pp. 124-127. doi: 10.4236/jsea.2012.512B024.

Conflicts of Interest

The author declares no conflicts of interest.

