A Neural Denoising Vocoder For Clean Waveform Generation From Noisy Mel-spectrogram Based On Amplitude And Phase Predictions
2024 Β· Hui-Peng Du, Ye-Xin Lu, Yang Ai, et al.
Abstract
This paper proposes a novel neural denoising vocoder that can generate clean speech waveforms from noisy mel-spectrograms. The proposed neural denoising vocoder consists of two components, i.e., a spectrum predictor and a enhancement module. The spectrum predictor first predicts the noisy amplitude and phase spectra from the input noisy mel-spectrogram, and subsequently the enhancement module recovers the clean amplitude and phase spectrum from noisy ones. Finally, clean speech waveforms are reconstructed through inverse short-time Fourier transform (iSTFT). All operations are performed at the frame-level spectral domain, with the APNet vocoder and MP-SENet speech enhancement model used as the backbones for the two components, respectively. Experimental results demonstrate that our proposed neural denoising vocoder achieves state-of-the-art performance compared to existing neural vocoders on the VoiceBank+DEMAND dataset. Additionally, despite the lack of phase information and partial a
Authors
(none)
Tags
Stats
Related papers
- Apnet: An All-frame-level Neural Vocoder Incorporating Direct Prediction Of Amplitude And Phase Spectra (2023)9.59
- Apnet2: High-quality And High-efficiency Neural Vocoder With Direct Prediction Of Amplitude And Phase Spectra (2023)6.34
- A Neural Vocoder With Hierarchical Generation Of Amplitude And Phase Spectra For Statistical Parametric Speech Synthesis (2019)10.74
- Univnet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021)14.80
- Speech Denoising By Parametric Resynthesis (2019)7.16
- Estvocoder: An Excitation-spectral-transformed Neural Vocoder Conditioned On Mel Spectrogram (2024)0.00
- Parametric Resynthesis With Neural Vocoders (2019)8.60
- Enhancing Low-quality Voice Recordings Using Disentangled Channel Factor And Neural Waveform Model (2020)0.00