A Neural Vocoder With Hierarchical Generation Of Amplitude And Phase Spectra For Statistical Parametric Speech Synthesis
2019 Β· Yang Ai, Zhen-Hua Ling
Abstract
This paper presents a neural vocoder named HiNet which reconstructs speech waveforms from acoustic features by predicting amplitude and phase spectra hierarchically. Different from existing neural vocoders such as WaveNet, SampleRNN and WaveRNN which directly generate waveform samples using single neural networks, the HiNet vocoder is composed of an amplitude spectrum predictor (ASP) and a phase spectrum predictor (PSP). The ASP is a simple DNN model which predicts log amplitude spectra (LAS) from acoustic features. The predicted LAS are sent into the PSP for phase recovery. Considering the issue of phase warping and the difficulty of phase modeling, the PSP is constructed by concatenating a neural source-filter (NSF) waveform generator with a phase extractor. We also introduce generative adversarial networks (GANs) into both ASP and PSP. Finally, the outputs of ASP and PSP are combined to reconstruct speech waveforms by short-time Fourier synthesis. Since there are no autoregressive s
Authors
(none)
Tags
Stats
Related papers
- Knowledge-and-data-driven Amplitude Spectrum Prediction For Hierarchical Neural Vocoders (2020)5.24
- Apnet: An All-frame-level Neural Vocoder Incorporating Direct Prediction Of Amplitude And Phase Spectra (2023)9.59
- A Neural Denoising Vocoder For Clean Waveform Generation From Noisy Mel-spectrogram Based On Amplitude And Phase Predictions (2024)0.00
- Apnet2: High-quality And High-efficiency Neural Vocoder With Direct Prediction Of Amplitude And Phase Spectra (2023)6.34
- Hifi-wavegan: Generative Adversarial Network With Auxiliary Spectrogram-phase Loss For High-fidelity Singing Voice Generation (2022)0.00
- Univnet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021)14.80
- Vnet: A Gan-based Multi-tier Discriminator Network For Speech Synthesis Vocoders (2024)2.26
- A Comparison Of Recent Waveform Generation And Acoustic Modeling Methods For Neural-network-based Speech Synthesis (2018)11.76