LVCNet: Efficient Condition-dependent Modeling Network For Waveform Generation
2021 · Zhen Zeng, Jianzong Wang, Ning Cheng, et al.
Abstract
In this paper, we propose a novel conditional convolution network, named location-variable convolution, to model the dependencies of the waveform sequence. Unlike WaveNet, which uses a single set of convolution kernels to capture the dependencies of arbitrary waveforms, location-variable convolution applies kernels with different coefficients to different waveform intervals, where the kernel coefficients are predicted from conditioning acoustic features such as Mel-spectrograms. Based on location-variable convolutions, we design LVCNet for waveform generation and apply it in Parallel WaveGAN to build a more efficient vocoder. Experiments on the LJSpeech dataset show that our proposed model achieves a four-fold increase in synthesis speed compared to the original Parallel WaveGAN without any degradation in sound quality, which verifies the effectiveness of location-variable convolutions.
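The core idea of the abstract, one kernel per waveform interval with coefficients predicted from the conditioning features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single-channel signal, one conditioning frame per interval, and a purely linear kernel predictor. The names `location_variable_conv`, `predict_kernels`, and `weight` are hypothetical and chosen for illustration only.

```python
import numpy as np


def predict_kernels(cond, weight):
    """Hypothetical kernel predictor: a linear map from each conditioning
    frame (e.g. one Mel-spectrogram frame) to one kernel.
    cond: (num_intervals, n_mels), weight: (n_mels, K) -> (num_intervals, K)
    """
    return cond @ weight


def location_variable_conv(x, kernels):
    """Filter each interval of x with its own kernel.
    x: (T,) sequence; kernels: (num_intervals, K) with odd K.
    T must be divisible by num_intervals (assumption of this sketch).
    """
    num_intervals, k = kernels.shape
    assert k % 2 == 1 and x.shape[0] % num_intervals == 0
    hop = x.shape[0] // num_intervals   # samples per interval
    pad = k // 2
    x_padded = np.pad(x, pad)           # zero-pad so the output keeps length T
    y = np.empty(x.shape[0], dtype=float)
    for i in range(num_intervals):
        # Interval i plus `pad` samples of context on each side.
        segment = x_padded[i * hop : i * hop + hop + 2 * pad]
        # 'valid' cross-correlation, the usual deep-learning "convolution".
        y[i * hop : (i + 1) * hop] = np.correlate(segment, kernels[i], mode="valid")
    return y


# Usage: 4 conditioning frames, 256 samples per frame, kernel size 3.
rng = np.random.default_rng(0)
mel = rng.standard_normal((4, 80))
weight = rng.standard_normal((80, 3)) * 0.1
wave_in = rng.standard_normal(4 * 256)
wave_out = location_variable_conv(wave_in, predict_kernels(mel, weight))
print(wave_out.shape)  # (1024,)
```

If every interval receives the same kernel, the function reduces to an ordinary convolution with shared weights over the whole sequence, which is the contrast with unified kernels that the abstract draws.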
Related papers
- Parallel WaveNet Conditioned On VAE Latent Vectors (2020) 0.00
- LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis (2018) 0.00
- Learning Waveform-based Acoustic Models Using Deep Variational Convolutional Neural Networks (2019) 6.77
- Parallel Waveform Synthesis Based On Generative Adversarial Networks With Voicing-aware Conditional Discriminators (2020) 7.16
- GELP: GAN-excited Linear Prediction For Speech Synthesis From Mel-spectrogram (2019) 10.74
- FeatherWave: An Efficient High-fidelity Neural Vocoder With Multi-band Linear Prediction (2020) 8.35
- High-fidelity And Low-latency Universal Neural Vocoder Based On Multiband WaveRNN With Data-driven Linear Prediction For Discrete Waveform Modeling (2021) 6.77
- A Comparison Of Recent Waveform Generation And Acoustic Modeling Methods For Neural-network-based Speech Synthesis (2018) 11.76