Journal of Computer and Communications

Volume 13, Issue 9 (September 2025)

ISSN Print: 2327-5219   ISSN Online: 2327-5227

Google-based Impact Factor: 1.98

A Study and Practice of Singing Voice Conversion Based on E-SVS and R-SVC

PP. 42-54
DOI: 10.4236/jcc.2025.139003

ABSTRACT

To address the poor sound quality and pronounced artifacts that are common in current AI singing voice conversion techniques, this paper proposes a new AI-driven singing voice conversion method together with a deployable system. First, an Expert-based Singing Voice Separation (E-SVS) model built on UVR5 is established; it cascades MDX-Net and VR Architecture models to achieve high-fidelity vocal extraction, dereverberation, and denoising. Then, a Retrieval-based Singing Voice Conversion (R-SVC) model is constructed as the core conversion engine. Using HuBERT to extract content features and Faiss to perform efficient timbre feature retrieval, the R-SVC model generates cover audio with highly similar timbre and accurate melody. Finally, a task queue mechanism is designed, and a WeChat Mini Program front end and an asynchronous processing back end are developed on top of it, providing a smooth user experience and resolving the lag associated with computationally intensive AI tasks. In practice, the system was found to train high-quality models of customized voices at relatively low data and time cost (10 to 30 minutes of audio).
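The abstract describes R-SVC as pairing HuBERT content features with Faiss-based timbre retrieval but gives no further detail. The sketch below shows one common way retrieval-based conversion can work, offered as an assumption rather than the paper's implementation: each source frame's feature vector is matched against a bank of features extracted from the target singer's training audio, the k nearest neighbours are averaged with inverse squared-distance weights, and the result is blended with the original frame by a hypothetical `index_rate` parameter. To keep the example self-contained, Faiss's exact `IndexFlatL2` search is replaced by a brute-force NumPy computation of the same squared L2 distances; the function name, `k`, `index_rate`, and the weighting scheme are illustrative choices.

```python
import numpy as np


def retrieve_timbre_features(query, bank, k=8, index_rate=0.75, eps=1e-6):
    """Blend source content features with nearest neighbours from a target-speaker bank.

    query: (T, D) frame features from the source vocal (e.g. HuBERT outputs).
    bank:  (N, D) features extracted from the target singer's training audio.
    Hypothetical helper; a brute-force stand-in for a Faiss IndexFlatL2 search.
    """
    query = np.asarray(query, dtype=np.float32)
    bank = np.asarray(bank, dtype=np.float32)
    k = min(k, bank.shape[0])

    # Squared L2 distance between every query frame and every bank frame: (T, N).
    d2 = (
        np.sum(query ** 2, axis=1, keepdims=True)
        - 2.0 * query @ bank.T
        + np.sum(bank ** 2, axis=1)[None, :]
    )
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding

    # Indices and distances of the k nearest bank frames for each query frame.
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.take_along_axis(d2, idx, axis=1)

    # Inverse squared-distance weights, normalised per frame (assumed scheme).
    w = 1.0 / (dist + eps)
    w /= w.sum(axis=1, keepdims=True)
    retrieved = np.sum(bank[idx] * w[:, :, None], axis=1)

    # index_rate = 1 uses only retrieved features; 0 leaves the input unchanged.
    return index_rate * retrieved + (1.0 - index_rate) * query
```

A higher `index_rate` pulls the output further toward the stored target-singer features. With real data the bank would be held in a Faiss index so that the search stays fast as the bank grows to many thousands of frames.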
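The task queue mechanism mentioned in the abstract is what keeps the WeChat Mini Program responsive: a request is acknowledged immediately and the slow separation and conversion work runs in the background. The abstract does not name the back-end stack, so the following is a minimal sketch assuming a Python `asyncio` service; `ConversionQueue` and `fake_convert` are hypothetical names, and `fake_convert` merely stands in for the E-SVS and R-SVC processing.

```python
import asyncio
import uuid


class ConversionQueue:
    """Hypothetical FIFO job queue: submit() returns at once, workers run jobs later."""

    def __init__(self, handler, workers=1):
        self._handler = handler      # coroutine doing the heavy AI work (assumed)
        self._queue = asyncio.Queue()
        self._workers = workers
        self._tasks = []
        self.status = {}             # job_id -> "queued" | "running" | "done" | "failed"
        self.results = {}

    def start(self):
        # Must be called from inside a running event loop.
        for _ in range(self._workers):
            self._tasks.append(asyncio.create_task(self._worker()))

    async def submit(self, payload):
        job_id = uuid.uuid4().hex
        self.status[job_id] = "queued"
        await self._queue.put((job_id, payload))
        return job_id  # the client polls status[job_id] instead of blocking

    async def _worker(self):
        while True:
            job_id, payload = await self._queue.get()
            self.status[job_id] = "running"
            try:
                self.results[job_id] = await self._handler(payload)
                self.status[job_id] = "done"
            except Exception as exc:
                self.results[job_id] = repr(exc)
                self.status[job_id] = "failed"
            finally:
                self._queue.task_done()

    async def drain(self):
        """Wait until every submitted job has finished, then stop the workers."""
        await self._queue.join()
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks.clear()


async def fake_convert(song):
    await asyncio.sleep(0.01)  # placeholder for separation + conversion
    return f"cover:{song}"


async def main():
    q = ConversionQueue(fake_convert)
    q.start()
    ids = [await q.submit(s) for s in ("a.wav", "b.wav")]
    await q.drain()
    return [q.status[i] for i in ids], [q.results[i] for i in ids]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With `workers=1`, jobs run strictly in submission order, which is one simple way to cap how many GPU-heavy conversions run at once; a front end would poll the job's status rather than hold a connection open while the model runs.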

Dong, F. (2025) A Study and Practice of Singing Voice Conversion Based on E-SVS and R-SVC. Journal of Computer and Communications, 13, 42-54. doi: 10.4236/jcc.2025.139003.

Copyright © 2026 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.