APNet2: High-quality And High-efficiency Neural Vocoder With Direct Prediction Of Amplitude And Phase Spectra
2023 · Hui-Peng Du, Ye-Xin Lu, Yang Ai, et al.
Abstract
In our previous work, we proposed a neural vocoder called APNet, which directly predicts speech amplitude and phase spectra with a 5 ms frame shift in parallel from the input acoustic features, and then reconstructs the 16 kHz speech waveform using inverse short-time Fourier transform (ISTFT). APNet demonstrates the capability to generate synthesized speech of comparable quality to the HiFi-GAN vocoder but with a considerably improved inference speed. However, the performance of the APNet vocoder is constrained by the waveform sampling rate and spectral frame shift, limiting its practicality for high-quality speech synthesis. Therefore, this paper proposes an improved iteration of APNet, named APNet2. The proposed APNet2 vocoder adopts ConvNeXt v2 as the backbone network for amplitude and phase predictions, expecting to enhance the modeling capability. Additionally, we introduce a multi-resolution discriminator (MRD) into the GAN-based losses and optimize the form of certain losses. At
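The reconstruction step the abstract describes (amplitude and phase spectra in, waveform out via ISTFT) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the "predicted" spectra here are stand-ins obtained by analyzing a test sine wave rather than outputs of a network, the 20 ms Hann window is an assumption (the abstract states only the 16 kHz sampling rate and the 5 ms frame shift), and the helper names `stft` and `istft_from_amp_phase` are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16000               # 16 kHz, as stated in the abstract
HOP = SAMPLE_RATE * 5 // 1000     # 5 ms frame shift -> 80 samples
WIN = 320                         # ASSUMPTION: 20 ms window, not given in the abstract


def periodic_hann(n):
    """Periodic Hann window of length n."""
    return np.hanning(n + 1)[:-1]


def stft(x, win=WIN, hop=HOP):
    """Analysis STFT: returns a (frames, bins) complex spectrogram."""
    window = periodic_hann(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft_from_amp_phase(amp, phase, win=WIN, hop=HOP):
    """Rebuild a waveform from amplitude and phase spectra by overlap-add."""
    spec = amp * np.exp(1j * phase)              # combine into a complex spectrum
    frames = np.fft.irfft(spec, n=win, axis=1)   # back to time-domain frames
    window = periodic_hann(win)
    length = win + hop * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        out[start:start + win] += frame * window   # synthesis window + overlap-add
        norm[start:start + win] += window ** 2     # squared-window normalizer
    return out / np.maximum(norm, 1e-8)


# Stand-in for vocoder outputs: analyze a 0.5 s, 220 Hz sine wave.
t = np.arange(SAMPLE_RATE // 2) / SAMPLE_RATE
x = np.sin(2 * np.pi * 220.0 * t)
spec = stft(x)
amp, phase = np.abs(spec), np.angle(spec)

y = istft_from_amp_phase(amp, phase)
```

Away from the first and last window, `y` matches `x` to numerical precision, which is why a model that predicts both spectra accurately needs no further waveform-level upsampling stage.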
Related papers
- APNet: An All-frame-level Neural Vocoder Incorporating Direct Prediction Of Amplitude And Phase Spectra (2023) · 9.59
- A Neural Denoising Vocoder For Clean Waveform Generation From Noisy Mel-spectrogram Based On Amplitude And Phase Predictions (2024) · 0.00
- A Neural Vocoder With Hierarchical Generation Of Amplitude And Phase Spectra For Statistical Parametric Speech Synthesis (2019) · 10.74
- VNet: A GAN-based Multi-tier Discriminator Network For Speech Synthesis Vocoders (2024) · 2.26
- UnivNet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021) · 14.80
- Towards High-quality And Efficient Speech Bandwidth Extension With Parallel Amplitude And Phase Prediction (2024) · 0.00
- Knowledge-and-data-driven Amplitude Spectrum Prediction For Hierarchical Neural Vocoders (2020) · 5.24
- A Comparison Of Recent Waveform Generation And Acoustic Modeling Methods For Neural-network-based Speech Synthesis (2018) · 11.76