Expediting TTS Synthesis With Adversarial Vocoding
2019 Β· Paarth Neekhara, Chris Donahue, Miller Puckette, et al.
Abstract
Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed spectrograms to simple magnitude spectrograms which can be heuristically vocoded. Through a user study, we show that our approach significantly outperforms na\"ive vocoding strategies while being hundreds of times faster than neural network vocoders used in state-of-the-art TTS systems. We also show that our method can be used to achieve state-of-the-art results in unsupervised synthesis of individual words of speech.
Authors
(none)
Tags
Stats
Related papers
- Analysis By Adversarial Synthesis -- A Novel Approach For Speech Vocoding (2019)3.58
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018)8.35
- High Fidelity Speech Synthesis With Adversarial Networks (2019)0.00
- End-to-end Video-to-speech Synthesis Using Generative Adversarial Networks (2021)11.58
- Vocgan: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020)12.54
- Multi-spectrogan: High-diversity And High-fidelity Spectrogram Generation With Adversarial Style Combination For Speech Synthesis (2020)0.00
- Generative Adversarial Training For Text-to-speech Synthesis Based On Raw Phonetic Input And Explicit Prosody Modelling (2023)3.58
- DSPGAN: A Gan-based Universal Vocoder For High-fidelity TTS By Time-frequency Domain Supervision From DSP (2022)9.03