Speaker Consistency Loss And Step-wise Optimization For Semi-supervised Joint Training Of TTS And ASR Using Unpaired Text Data
2022 · Naoki Makishima, Satoshi Suzuki, Atsushi Ando, et al.
Abstract
In this paper, we investigate the semi-supervised joint training of text-to-speech (TTS) and automatic speech recognition (ASR), where a small amount of paired data and a large amount of unpaired text data are available. Conventional studies form a cycle called the TTS-ASR pipeline, where the multispeaker TTS model synthesizes speech from text with a reference speech and the ASR model reconstructs the text from the synthesized speech, after which both models are trained with a cycle-consistency loss. However, the synthesized speech does not reflect the speaker characteristics of the reference speech, and the synthesized speech becomes overly easy for the ASR model to recognize after training. This not only decreases the TTS model quality but also limits the ASR model improvement. To solve this problem, we propose improving the cycle-consistency-based training with a speaker consistency loss and step-wise optimization. The speaker consistency loss brings the speaker characteristics of the
Related papers
- Semi-supervised Sequence-to-sequence ASR Using Unpaired Speech And Text (2019)
- Cycle-consistency Training For End-to-end Speech Recognition (2018)
- Improved Consistency Training For Semi-supervised Sequence-to-sequence ASR Via Speech Chain Reconstruction And Self-transcribing (2022)
- Adversarial Speaker-consistency Learning Using Untranscribed Speech Data For Zero-shot Multi-speaker Text-to-speech (2022)
- Continual Speaker Adaptation For Text-to-speech Synthesis (2021)
- Semi-supervised Learning For Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation (2020)
- Speaker Verification-derived Loss And Data Augmentation For DNN-based Multispeaker Speech Synthesis (2021)
- Almost Unsupervised Text To Speech And Automatic Speech Recognition (2019)