JSIP> Vol.5 No.3, August 2014

An Intonation Speech Synthesis Model for Indonesian Using Pitch Pattern and Phrase Identification

PP. 80-88
DOI: 10.4236/jsip.2014.53010
Author(s)
Yohanes Suyanto, Subanar, Agus Harjoko, Sri Hartati

Affiliation(s)

Department of Computer Science and Electronics, Gadjah Mada University, Yogyakarta, Indonesia.

ABSTRACT

In text-to-speech synthesis systems, prosody determines the tone, duration, and loudness of the speech sound. Intonation is the part of prosody that determines the tone. In Indonesian, intonation depends on the structure of the sentence, the type of sentence, and the position of a word within the sentence. This study proposes a speech synthesis model that focuses on intonation. The intonation is determined by the sentence structure, the intonation patterns of example sentences, and the general rules of Indonesian pronunciation. The model takes text and intonation patterns as inputs. An initial prosody file is first built from the general rules of Indonesian pronunciation. The sentence structure is then determined from the input text, which gives the intervals between the parts (phrases) of the sentence. These intervals are used to correct the durations in the initial prosody file, and the frequencies in the prosody file are then corrected using the intonation patterns. The final result is a prosody file that can be pronounced by a speech engine application. Experiments comparing the original voice of a radio news announcer with the synthesized speech show that the F0 peaks are determined by whichever is dominant, the general rules or the intonation patterns. A similarity test using the PESQ method gives the synthesized speech a score of 1.18 on the MOS-LQO scale.
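
The two correction steps described in the abstract (adjusting durations at phrase boundaries, then adjusting frequencies with an intonation pattern) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Segment` record, the single F0 target per segment, the fixed pause length, and the representation of an intonation pattern as a list of multiplicative factors are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    """One entry of a hypothetical prosody file."""
    phoneme: str      # phoneme label; "_" marks silence (assumed convention)
    duration_ms: int  # segment duration in milliseconds
    f0_hz: float      # single pitch target (a simplification)
    phrase: int       # index of the phrase the segment belongs to


def insert_phrase_pauses(segments: List[Segment], pause_ms: int) -> List[Segment]:
    """Duration correction: add a silence wherever the phrase index changes."""
    out: List[Segment] = []
    prev = None
    for cur in segments:
        if prev is not None and cur.phrase != prev.phrase:
            out.append(Segment("_", pause_ms, 0.0, prev.phrase))
        out.append(cur)
        prev = cur
    return out


def apply_pitch_pattern(segments: List[Segment], pattern: List[float]) -> List[Segment]:
    """Frequency correction: scale F0 by a pattern stretched over the utterance.

    `pattern` is a non-empty list of multiplicative factors; factors between
    the given points are linearly interpolated. Silences are left unchanged.
    """
    voiced = [i for i, s in enumerate(segments) if s.phoneme != "_"]
    out = list(segments)
    for rank, i in enumerate(voiced):
        pos = rank / (len(voiced) - 1) if len(voiced) > 1 else 0.0
        x = pos * (len(pattern) - 1)
        lo = int(x)
        hi = min(lo + 1, len(pattern) - 1)
        factor = pattern[lo] + (pattern[hi] - pattern[lo]) * (x - lo)
        out[i] = replace(segments[i], f0_hz=segments[i].f0_hz * factor)
    return out


# Initial prosody from "general rules": flat 120 Hz, two phrases.
initial = [
    Segment("s", 80, 120.0, 0),
    Segment("a", 100, 120.0, 0),
    Segment("j", 70, 120.0, 1),
    Segment("a", 100, 120.0, 1),
]
paused = insert_phrase_pauses(initial, pause_ms=150)
shaped = apply_pitch_pattern(paused, pattern=[1.0, 1.5])  # rising contour
```

In this sketch the rising pattern leaves the first segment at its rule-based F0 and raises the last one by a factor of 1.5, with the inserted pause untouched. A real system would also need phrase identification from the text and a way to write the result in the format expected by the chosen speech engine, neither of which is shown here.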

KEYWORDS

Speech Synthesis, PESQ, Intonation, Indonesian

Cite this paper

Suyanto, Y., Subanar, Harjoko, A. and Hartati, S. (2014) An Intonation Speech Synthesis Model for Indonesian Using Pitch Pattern and Phrase Identification. Journal of Signal and Information Processing, 5, 80-88. doi: 10.4236/jsip.2014.53010.

Conflicts of Interest

The authors declare no conflicts of interest.


  

Copyright © 2020 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.