The Zero Resource Speech Challenge 2019: TTS Without T
2019 · Ewan Dunbar, Robin Algayres, Julien Karadayi, et al.
Abstract
We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery dataset) and align them to the voice recordings in a way that works best for the purpose of synthesizing novel utterances from novel speakers, similar to the target speaker's voice. We describe the metrics used for evaluation, a baseline system consisting of unsupervised subword unit discovery plus a standard TTS system, and a topline TTS using gold phoneme transcriptions. We present an overview of the 19 submitted systems from 10 teams and discuss the main results.
Related papers
- The Zero Resource Speech Challenge 2020: Discovering Discrete Subword And Word Units (2020)
- Exploring TTS Without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020) (2020)
- Exploration Of End-to-end Synthesisers For Zero Resource Speech Challenge 2020 (2020)
- Transformer VQ-VAE For Unsupervised Unit Discovery And Speech Synthesis: ZeroSpeech 2020 Challenge (2020)
- Self-supervised Language Learning From Raw Audio: Lessons From The Zero Resource Speech Challenge (2022)
- YourTTS: Towards Zero-shot Multi-speaker TTS And Zero-shot Voice Conversion For Everyone (2021)
- Learning To Speak From Text: Zero-shot Multilingual Text-to-speech With Unsupervised Text Pretraining (2023)
- ZMM-TTS: Zero-shot Multilingual And Multispeaker Speech Synthesis Conditioned On Self-supervised Discrete Speech Representations (2023)