Adversarially Trained End-to-end Korean Singing Voice Synthesis System
2019 · Juheon Lee, Hyeong-Seok Choi, Chang-Bin Jeon, et al.
Abstract
In this paper, we propose an end-to-end Korean singing voice synthesis system that takes lyrics and a symbolic melody as input and uses three novel approaches: 1) phonetic enhancement masking, 2) local conditioning of text and pitch to the super-resolution network, and 3) conditional adversarial training. The proposed system consists of two main modules: a mel-synthesis network that generates a mel-spectrogram from the given input information, and a super-resolution network that upsamples the generated mel-spectrogram into a linear-spectrogram. In the mel-synthesis network, phonetic enhancement masking is applied to generate implicit formant masks solely from the input text, which enables more accurate phonetic control of the singing voice. In addition, we show that the two other proposed methods (local conditioning of text and pitch, and conditional adversarial training) are crucial for realistic generation of the human singing voice in the super-resolution process. Finally, both quantita
Related papers
- N-Singer: A Non-autoregressive Korean Singing Voice Synthesis System For Pronunciation Enhancement (2021) · 8.60
- A Melody-unsupervision Model For Singing Voice Synthesis (2021) · 5.84
- Adversarially Trained Multi-singer Sequence-to-sequence Singing Synthesizer (2020) · 7.81
- Adversarial Multi-task Learning For Disentangling Timbre And Pitch In Singing Voice Synthesis (2022) · 4.52
- SingGAN: Generative Adversarial Network For High-fidelity Singing Voice Generation (2021) · 10.61
- VISinger: Variational Inference With Adversarial Learning For End-to-end Singing Voice Synthesis (2021) · 12.99
- Phonetic Posteriorgrams Based Many-to-many Singing Voice Conversion Via Adversarial Training (2020) · 0.00
- XiaoiceSing 2: A High-fidelity Singing Voice Synthesizer Based On Generative Adversarial Network (2022) · 0.00