Knowledge-and-data-driven Amplitude Spectrum Prediction For Hierarchical Neural Vocoders
2020 · Yang Ai, Zhen-Hua Ling
Abstract
In our previous work, we proposed a neural vocoder called HiNet, which recovers speech waveforms by predicting amplitude and phase spectra hierarchically from input acoustic features. In HiNet, the amplitude spectrum predictor (ASP) predicts log amplitude spectra (LAS) from input acoustic features. This paper proposes a novel knowledge-and-data-driven ASP (KDD-ASP) to improve on the conventional one. First, acoustic features (i.e., F0 and mel-cepstra) pass through a knowledge-driven LAS recovery module to obtain approximate LAS (ALAS). This module combines the STFT with source-filter theory: the source part is derived from the input F0 and the filter part from the input mel-cepstra. Then, the recovered ALAS are processed by a data-driven LAS refinement module, which consists of multiple trainable convolutional layers, to obtain the final LAS. Experimental results show that the HiNet vocoder using KDD-ASP can achieve higher quality of synthetic speech
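The knowledge-driven recovery step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single frame, a pulse-train (voiced) or white-noise (unvoiced) excitation, and plain cepstral coefficients under the convention log|H(ω)| = Σ c_m cos(mω), so the mel-frequency warping of real mel-cepstra is ignored. The function name `approx_log_amplitude` and all parameter defaults are hypothetical. The trainable refinement module is not shown.

```python
import numpy as np

def approx_log_amplitude(f0, cepstrum, sr=16000, n_fft=512, win_len=320):
    """Approximate one frame's log amplitude spectrum as source + filter.

    Assumptions (not from the paper): pulse-train or noise excitation,
    Hann analysis window, and unwarped cepstral coefficients.
    """
    # Source part: excitation signal built from F0.
    if f0 > 0:
        # Voiced frame: unit pulses spaced one pitch period apart.
        period = int(round(sr / f0))
        excitation = np.zeros(win_len)
        excitation[::period] = 1.0
    else:
        # Unvoiced frame: white noise (fixed seed keeps the sketch repeatable).
        excitation = np.random.default_rng(0).standard_normal(win_len)

    # STFT of a single frame: window, zero-pad to n_fft, take log magnitude.
    windowed = excitation * np.hanning(win_len)
    source_las = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-8)

    # Filter part: log magnitude envelope from the cepstrum,
    # log|H(w_k)| = sum_m c_m * cos(2*pi*k*m / n_fft).
    filter_las = np.fft.rfft(np.asarray(cepstrum, dtype=float), n_fft).real

    # In the log-amplitude domain, source and filter combine by addition.
    return source_las + filter_las

# Example: a voiced frame at 250 Hz with a flat envelope offset of 0.5.
alas = approx_log_amplitude(250.0, [0.5, 0.0, 0.0])
print(alas.shape)
```

Because the source-filter product becomes a sum in the log domain, the envelope term can be computed independently of the excitation, which is what makes a cheap closed-form approximation possible before any learned refinement.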