MSR-NV: Neural Vocoder Using Multiple Sampling Rates
2021 Β· Kentaro Mitsui, Kei Sawada
Abstract
The development of neural vocoders (NVs) has resulted in the high-quality and fast generation of waveforms. However, conventional NVs target a single sampling rate and require re-training when applied to different sampling rates. A suitable sampling rate varies from application to application due to the trade-off between speech quality and generation speed. In this study, we propose a method to handle multiple sampling rates in a single NV, called the MSR-NV. By generating waveforms step-by-step starting from a low sampling rate, MSR-NV can efficiently learn the characteristics of each frequency band and synthesize high-quality speech at multiple sampling rates. It can be regarded as an extension of the previously proposed NVs, and in this study, we extend the structure of Parallel WaveGAN (PWG). Experimental evaluation results demonstrate that the proposed method achieves remarkably higher subjective quality than the original PWG trained separately at 16, 24, and 48 kHz, without incre
Authors
(none)
Tags
Stats
Related papers
- Neural Vocoder Is All You Need For Speech Super-resolution (2022)12.25
- Univnet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021)14.80
- Vocgan: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020)12.54
- A Real-time Wideband Neural Vocoder At 1.6 Kb/s Using Lpcnet (2019)12.61
- CQNV: A Combination Of Coarsely Quantized Bitstream And Neural Vocoder For Low Rate Speech Coding (2023)6.34
- Featherwave: An Efficient High-fidelity Neural Vocoder With Multi-band Linear Prediction (2020)8.35
- Neuraldps: Neural Deterministic Plus Stochastic Model With Multiband Excitation For Noise-controllable Waveform Generation (2022)0.00
- Universal Melgan: A Robust Neural Vocoder For High-fidelity Waveform Generation In Multiple Domains (2020)0.00