Fast, High-quality And Parameter-efficient Articulatory Synthesis Using Differentiable DSP
2024 · Yisi Liu, Bohan Yu, Drake Lin, et al.
Abstract
Articulatory trajectories like electromagnetic articulography (EMA) provide a low-dimensional representation of the vocal tract filter and have been used as natural, grounded features for speech synthesis. Differentiable digital signal processing (DDSP) is a parameter-efficient framework for audio synthesis. Therefore, integrating low-dimensional EMA features with DDSP can significantly enhance the computational efficiency of speech synthesis. In this paper, we propose a fast, high-quality, and parameter-efficient DDSP articulatory vocoder that can synthesize speech from EMA, F0, and loudness. We incorporate several techniques to solve the harmonics / noise imbalance problem, and add a multi-resolution adversarial loss for better synthesis quality. Our model achieves a transcription word error rate (WER) of 6.67% and a mean opinion score (MOS) of 3.74, with an improvement of 1.63% and 0.16 compared to the state-of-the-art (SOTA) baseline. Our DDSP vocoder is 4.9x faster than the baseli
Related papers
- Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis (2024)
- Speech Synthesis And Control Using Differentiable DSP (2020)
- FastDiff: A Fast Conditional Diffusion Model For High-quality Speech Synthesis (2022)
- Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-to-end Audio Style Transfer (2022)
- DENT-DDSP: Data-efficient Noisy Speech Generator Using Differentiable Digital Signal Processors For Explicit Distortion Modelling And Noise-robust Speech Recognition (2022)
- Towards Parametric Speech Synthesis Using Gaussian-Markov Model Of Spectral Envelope And Wavelet-based Decomposition Of F0 (2022)
- BDDM: Bilateral Denoising Diffusion Models For Fast And High-quality Speech Synthesis (2022)
- DSPGAN: A GAN-based Universal Vocoder For High-fidelity TTS By Time-frequency Domain Supervision From DSP (2022)