Source-filter-based Generative Adversarial Neural Vocoder For High Fidelity Speech Synthesis
2023 · Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
Abstract
This paper proposes a source-filter-based generative adversarial neural vocoder named SF-GAN, which achieves high-fidelity waveform generation from input acoustic features by introducing F0-based source excitation signals to a neural filter framework. The SF-GAN vocoder is composed of a source module and a resolution-wise conditional filter module and is trained based on generative adversarial strategies. The source module produces an excitation signal from the F0 information, then the resolution-wise conditional filter module combines the excitation signal with processed acoustic features at various temporal resolutions and finally reconstructs the raw waveform. The experimental results show that our proposed SF-GAN vocoder outperforms the state-of-the-art HiFi-GAN and Fre-GAN in both analysis-synthesis (AS) and text-to-speech (TTS) tasks, and the synthesized speech quality of SF-GAN is comparable to the ground-truth audio.
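To make the idea of an F0-based source excitation concrete, here is a minimal sketch of one common way to build such a signal: a sine wave that follows the F0 contour in voiced frames and plain noise in unvoiced frames. The abstract does not say how SF-GAN's source module computes its excitation, so this is an assumption rather than the authors' method. The function name `sine_excitation` and the values for hop size, sample rate, amplitude, and noise level are also hypothetical and chosen only for illustration.

```python
import math
import random


def sine_excitation(f0_frames, hop_size=240, sample_rate=24000,
                    amplitude=0.1, noise_std=0.003, seed=0):
    """Turn frame-level F0 values (Hz, 0 = unvoiced) into a sample-level signal.

    Assumed, illustrative scheme (not taken from the paper):
    voiced frames -> sine at the frame's F0 plus small Gaussian noise,
    unvoiced frames -> Gaussian noise only.
    """
    rng = random.Random(seed)
    excitation = []
    phase = 0.0
    for f0 in f0_frames:
        # Each frame-level F0 value is held constant for hop_size samples.
        for _ in range(hop_size):
            if f0 > 0:
                # Accumulate phase so the sine stays continuous across frames.
                phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
                sample = amplitude * math.sin(phase) + rng.gauss(0.0, noise_std)
            else:
                sample = rng.gauss(0.0, amplitude / 3.0)
            excitation.append(sample)
    return excitation


# Six frames: unvoiced, unvoiced, three voiced frames with rising pitch, unvoiced.
f0 = [0.0, 0.0, 200.0, 210.0, 220.0, 0.0]
signal = sine_excitation(f0)
print(len(signal))  # 6 frames * 240 samples per frame = 1440
```

In a source-filter vocoder, a signal like this would be passed to the neural filter together with the acoustic features. The filter itself is not sketched here.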
Related papers
- Source-filter HiFi-GAN: Fast And Pitch Controllable High-fidelity Neural Vocoder (2022)
- Unified Source-filter GAN: Unified Source-filter Network Based On Factorization Of Quasi-periodic Parallel WaveGAN (2021)
- Unified Source-filter GAN With Harmonic-plus-noise Source Excitation Generation (2022)
- HiFi-WaveGAN: Generative Adversarial Network With Auxiliary Spectrogram-phase Loss For High-fidelity Singing Voice Generation (2022)
- HiFi-GAN: Generative Adversarial Networks For Efficient And High Fidelity Speech Synthesis (2020)
- TFGAN: Time And Frequency Domain Based Generative Adversarial Network For High-fidelity Speech Synthesis (2020)
- VocGAN: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020)
- RefineGAN: Universally Generating Waveform Better Than Ground Truth With Highly Accurate Pitch And Intensity Responses (2021)