XiaoiceSing 2: A High-fidelity Singing Voice Synthesizer Based on Generative Adversarial Network
2022 · Chunhui Wang, Chang Zeng, Xing He
Abstract
XiaoiceSing is a singing voice synthesis (SVS) system that aims to generate 48 kHz singing voices. However, the mel-spectrogram it generates is over-smoothed in the middle- and high-frequency regions because the model has no component designed to capture the details of those regions. In this paper, we propose XiaoiceSing2, which generates the details of the middle- and high-frequency parts to better reconstruct the full-band mel-spectrogram. Specifically, XiaoiceSing2 adopts a generative adversarial network (GAN) that consists of a FastSpeech-based generator and a multi-band discriminator. We improve the feed-forward Transformer (FFT) block by adding multiple residual convolutional blocks in parallel with the self-attention block to balance local and global features. The multi-band discriminator contains three sub-discriminators responsible for the low-, middle-, and high-frequency parts of the mel-spectrogram, respectively. Each sub-discriminator is composed of several
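The abstract describes three sub-discriminators, each looking at one frequency region of the mel-spectrogram. As a rough illustration of that idea only, the sketch below splits a mel-spectrogram into low, middle, and high bands along the mel-bin axis. The function name `split_mel_bands`, the `edges` parameter, and the toy band boundaries are assumptions made for this example; the text above does not state the actual band boundaries or how the bands are fed to the sub-discriminators.

```python
from typing import List, Tuple

Mel = List[List[float]]  # layout assumed here: [frames][mel_bins]


def split_mel_bands(mel: Mel, edges: Tuple[int, int]) -> Tuple[Mel, Mel, Mel]:
    """Split a mel-spectrogram into low, middle, and high bands along the bin axis.

    `edges` = (low_end, mid_end) are hypothetical bin indices chosen for
    illustration; they are not values taken from the paper.
    """
    if not mel:
        raise ValueError("mel must contain at least one frame")
    low_end, mid_end = edges
    n_bins = len(mel[0])
    if not 0 < low_end < mid_end < n_bins:
        raise ValueError("edges must satisfy 0 < low_end < mid_end < n_bins")
    low = [frame[:low_end] for frame in mel]          # bins [0, low_end)
    mid = [frame[low_end:mid_end] for frame in mel]   # bins [low_end, mid_end)
    high = [frame[mid_end:] for frame in mel]         # bins [mid_end, n_bins)
    return low, mid, high


# Toy input: 4 frames, 12 mel bins, each value equal to its bin index.
mel = [[float(b) for b in range(12)] for _ in range(4)]
low, mid, high = split_mel_bands(mel, edges=(4, 8))
print(len(low[0]), len(mid[0]), len(high[0]))  # prints: 4 4 4
```

In a setup like the one described, each band would go to its own sub-discriminator, so the adversarial signal for the middle and high bands is not dominated by the low-frequency region.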
Related papers
- XiaoiceSing: A High-quality and Integrated Singing Voice Synthesis System (2020)
- SingGAN: Generative Adversarial Network for High-fidelity Singing Voice Generation (2021)
- HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-phase Loss for High-fidelity Singing Voice Generation (2022)
- Adversarially Trained Multi-singer Sequence-to-sequence Singing Synthesizer (2020)
- DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and a Comprehensive Evaluation (2022)
- InstructSing: High-fidelity Singing Voice Generation via Instructing Yourself (2024)
- WGANSing: A Multi-voice Singing Voice Synthesizer Based on the Wasserstein-GAN (2019)
- Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN (2022)