Refinegan: Universally Generating Waveform Better Than Ground Truth With Highly Accurate Pitch And Intensity Responses
2021 Β· Shengyuan Xu, Wenxiao Zhao, Jing Guo
Abstract
Most GAN(Generative Adversarial Network)-based approaches towards high-fidelity waveform generation heavily rely on discriminators to improve their performance. However, GAN methods introduce much uncertainty into the generation process and often result in mismatches of pitch and intensity, which is fatal when it comes to sensitive use cases such as singing voice synthesis(SVS). To address this problem, we propose RefineGAN, a high-fidelity neural vocoder focused on the robustness, pitch and intensity accuracy, and high-speed full-band audio generation. We applyed a pitch-guided refine architecture with a multi-scale spectrogram-based loss function to help stabilize the training process and maintain the robustness of the neural vocoder while using the GAN-based training method. Audio generated using this method shows a better performance in subjective tests when compared with the ground-truth audio. This result shows that the fidelity is even improved during the waveform reconstruction
Authors
(none)
Tags
Stats
Related papers
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018)8.35
- Hifi-wavegan: Generative Adversarial Network With Auxiliary Spectrogram-phase Loss For High-fidelity Singing Voice Generation (2022)0.00
- TFGAN: Time And Frequency Domain Based Generative Adversarial Network For High-fidelity Speech Synthesis (2020)0.00
- Improving Adversarial Waveform Generation Based Singing Voice Conversion With Harmonic Signals (2022)7.50
- Vocgan: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020)12.54
- Gansynth: Adversarial Neural Audio Synthesis (2019)0.00
- Source-filter-based Generative Adversarial Neural Vocoder For High Fidelity Speech Synthesis (2023)3.58
- Hifi-gan: Generative Adversarial Networks For Efficient And High Fidelity Speech Synthesis (2020)0.00