TITLE:
Analysis of Leveraging Fastspeech 2 and Hifi-Gan Models for Speech Synthesis Adapted for Nigerian Languages
AUTHORS:
Emmanuel Nwabueze Ekwonwune, Leticia E. Elebiri, Abraham Oghenemega Ovwonuri, Donpatrick Onwusiribe Uzondu, Chinonso Daniel Okoronkwo, Igbokwe Benson Ikechukwu, Dennis Mary Chinonye
KEYWORDS:
Languages, Natural Language Processing, HiFi-GAN, Models, FastSpeech2, Speech Synthesis, Meta-TTS, Deep Learning
JOURNAL NAME:
Intelligent Information Management, Vol.17 No.6, November 28, 2025
ABSTRACT: The aim of this research is to develop a speech synthesis model tailored to Nigerian languages by leveraging natural language processing tools such as FastSpeech 2 and Meta-TTS for high-quality, non-autoregressive text-to-speech (TTS) generation, and HiFi-GAN for neural vocoding. The work is motivated by the lack of high-quality synthetic speech models for low-resource languages, especially Nigerian languages and specifically Hausa, Igbo and Yoruba. The methodology is a structured, iterative approach that integrates the Structured Systems Analysis and Design Method (SSADM) with the Machine Learning Development Lifecycle (MLDLC), and incorporates a feasibility study, corpus collection, phonetic analysis, and model training on an annotated Nigerian-language speech dataset. Speech data will be collected from the selected Nigerian languages (Igbo, Hausa and Yoruba) and preprocessed. Phoneme sequences will be derived using both a rule-based approach and grapheme-to-phoneme tools such as Epitran and Phonemizer, while FastSpeech 2, Meta-TTS and HiFi-GAN will be fine-tuned to accommodate the tonal variations and prosodic patterns inherent in these languages. The model training pipeline will integrate Tacotron-based aligners for efficient text-to-mel-spectrogram conversion, while HiFi-GAN, as the vocoder, will enhance naturalness and intelligibility. Python will be the primary programming language for the implementation, while the interface will combine HyperText Markup Language (HTML), Cascading Style Sheets (CSS) and JavaScript. The expected outcome is a state-of-the-art speech synthesis system capable of generating natural and intelligible speech across multiple Nigerian languages.
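The rule-based part of the phoneme step can be illustrated with a minimal sketch for Yoruba, whose standard orthography marks high tone with an acute accent and low tone with a grave accent, and usually leaves mid tone unmarked. This is not the authors' implementation: the function name `split_tones` and the H/M/L labels are hypothetical, and the sketch only separates tone from the base letters. It does not handle digraphs such as "gb" or nasal vowels, which a full grapheme-to-phoneme front end would need.

```python
import unicodedata

# Combining marks that carry tone in Yoruba orthography (assumed label set H/M/L).
TONE_MARKS = {
    "\u0301": "H",  # combining acute accent: high tone
    "\u0300": "L",  # combining grave accent: low tone
    "\u0304": "M",  # combining macron: mid tone, when written explicitly
}
VOWELS = set("aeiou")


def split_tones(word):
    """Split a word into its toneless base spelling and a list of tones.

    Hypothetical helper for illustration only. Vowels without a tone mark
    are treated as mid tone; a consonant is given a tone only when it
    carries a tone mark (a syllabic nasal such as "ń").
    """
    units = []  # each unit is [base characters, tone or None]
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and units:
            if ch in TONE_MARKS:
                units[-1][1] = TONE_MARKS[ch]
            else:
                # Keep non-tonal marks such as the dot below in "ẹ" and "ọ".
                units[-1][0] += ch
        else:
            units.append([ch, None])

    base = unicodedata.normalize("NFC", "".join(u[0] for u in units))
    tones = []
    for chars, tone in units:
        if tone is not None:
            tones.append(tone)
        elif chars[0] in VOWELS:
            tones.append("M")
    return base, tones
```

For example, `split_tones("bàbá")` returns `("baba", ["L", "H"])`. Keeping tone as a separate sequence is one possible design: it lets the tone labels be fed to the acoustic model alongside the phonemes rather than being lost when diacritics are stripped.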