Generative Adversarial Training For Text-to-speech Synthesis Based On Raw Phonetic Input And Explicit Prosody Modelling
2023 · Tiberiu Boros, Stefan Daniel Dumitrescu, Ionut Mironica, et al.
Abstract
We describe an end-to-end speech synthesis system that uses generative adversarial training. We train our vocoder for raw phoneme-to-audio conversion, using explicit phonetic, pitch, and duration modeling. We experiment with several pre-trained models for contextualized and decontextualized word embeddings, and we introduce a new method for highly expressive character voice matching, based on discrete style tokens.
Related papers
- End-to-end Adversarial Text-to-speech (2020)
- End-to-end Video-to-speech Synthesis Using Generative Adversarial Networks (2021)
- Expediting TTS Synthesis With Adversarial Vocoding (2019)
- Conditional Variational Autoencoder With Adversarial Learning For End-to-end Text-to-speech (2021)
- Generative Adversarial Network-based Glottal Waveform Model For Statistical Parametric Speech Synthesis (2019)
- Adversarial Learning Of Intermediate Acoustic Feature For End-to-end Lightweight Text-to-speech (2022)
- DelightfulTTS 2: End-to-end Speech Synthesis With Adversarial Vector-quantized Auto-encoders (2022)
- High Fidelity Speech Synthesis With Adversarial Networks (2019)