Machine Speech Chain With One-shot Speaker Adaptation
2018 · Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Abstract
In previous work, we developed a closed-loop speech chain model based on deep learning, in which the architecture enabled the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components to mutually improve their performance. This was accomplished by the two parts teaching each other using both labeled and unlabeled data. This approach could significantly improve model performance within a single-speaker speech dataset, but only a slight increase could be gained in multi-speaker tasks. Furthermore, the model is still unable to handle unseen speakers. In this paper, we present a new speech chain mechanism by integrating a speaker recognition model inside the loop. We also propose extending the capability of TTS to handle unseen speakers by implementing one-shot speaker adaptation. This enables TTS to mimic voice characteristics from one speaker to another with only a one-shot speaker sample, even from a text without any speaker information. In the speech chain loop m
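The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three components are toy, hand-written stand-ins (`toy_asr`, `toy_tts`, `toy_spkrec`), the "speech" is a list of floats rather than audio, and the names `SpeechChain`, `speech_only_step`, and `text_only_step` are hypothetical. It only shows how a speaker embedding taken from a single reference utterance could be passed to TTS, and how each unlabeled input yields a loss by going around the loop.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Speech = List[float]      # toy "waveform": one float per character
Embedding = List[float]   # toy speaker embedding with a single dimension


def toy_tts(text: str, z: Embedding) -> Speech:
    # Assumption: lowercase a-z only. The integer part of each frame encodes
    # the letter (a=1 ... z=26); the fractional part carries the voice (0 <= z[0] < 1).
    return [(ord(c) - 96) + z[0] for c in text]


def toy_asr(speech: Sequence[float]) -> str:
    # Recover the letters from the integer part of each frame.
    return "".join(chr(96 + math.floor(f)) for f in speech)


def toy_spkrec(speech: Sequence[float]) -> Embedding:
    # Recover the voice as the mean fractional part of the frames.
    fracs = [f - math.floor(f) for f in speech]
    return [sum(fracs) / len(fracs)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    # Mean squared error over aligned frames (equal lengths assumed).
    return sum((x - y) ** 2 for x, y in zip(a, b)) / max(len(a), 1)


def token_error(ref: str, hyp: str) -> float:
    # Fraction of mismatched positions, counting any length difference as errors.
    wrong = sum(r != h for r, h in zip(ref, hyp)) + abs(len(ref) - len(hyp))
    return wrong / max(len(ref), 1)


@dataclass
class SpeechChain:
    asr: Callable[[Sequence[float]], str]           # speech -> text
    tts: Callable[[str, Embedding], Speech]         # (text, speaker) -> speech
    spkrec: Callable[[Sequence[float]], Embedding]  # speech -> speaker embedding

    def speech_only_step(self, speech: Speech) -> float:
        """Unlabeled speech: ASR transcribes, TTS reconstructs in the same voice."""
        text_hat = self.asr(speech)
        z = self.spkrec(speech)
        speech_hat = self.tts(text_hat, z)
        return mse(speech, speech_hat)  # in a trained system this would update TTS

    def text_only_step(self, text: str, reference: Speech) -> float:
        """Unlabeled text: borrow a voice from one reference utterance, then let ASR read it back."""
        z = self.spkrec(reference)      # one-shot: a single utterance gives the voice
        speech_hat = self.tts(text, z)
        text_hat = self.asr(speech_hat)
        return token_error(text, text_hat)  # in a trained system this would update ASR


# One reference utterance from a speaker whose "voice" is 0.25.
reference = toy_tts("hello", [0.25])
chain = SpeechChain(asr=toy_asr, tts=toy_tts, spkrec=toy_spkrec)
speech_loss = chain.speech_only_step(reference)
text_loss = chain.text_only_step("world", reference)
```

Because the toy components invert each other exactly, both losses are zero here. In a learned system they would be nonzero and would serve as the training signal for the component on the receiving end of each half of the loop.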
Related papers
- Exploring Machine Speech Chain For Domain Adaptation And Few-shot Speaker Adaptation (2021) · 0.00
- Listening While Speaking And Visualizing: Improving ASR Through Multimodal Chain (2019) · 4.52
- Tokenchain: A Discrete Speech Chain Via Semantic Token Modeling (2025) · 0.00
- Multimodal Speech Synthesis Architecture For Unsupervised Speaker Adaptation (2018) · 6.34
- Meta-TTS: Meta-learning For Few-shot Speaker Adaptive Text-to-speech (2021) · 12.74
- A Unified Speaker Adaptation Method For Speech Synthesis Using Transcribed And Untranscribed Speech With Backpropagation (2019) · 0.00
- Augmenting Images For ASR And TTS Through Single-loop And Dual-loop Multimodal Chain Framework (2020) · 3.58
- ASRRL-TTS: Agile Speaker Representation Reinforcement Learning For Text-to-speech Speaker Adaptation (2024) · 0.00