Semi-supervised Sequence-to-sequence ASR Using Unpaired Speech And Text
2019 · Murali Karthick Baskar, Shinji Watanabe, Ramon Astudillo, et al.
Abstract
Sequence-to-sequence automatic speech recognition (ASR) models require large quantities of data to attain high performance. For this reason, there has been a recent surge of interest in unsupervised and semi-supervised training of such models. This work builds upon recent results showing notable improvements in semi-supervised training using cycle-consistency and related techniques. Such techniques derive training procedures and losses able to leverage unpaired speech and/or text data by combining ASR with Text-to-Speech (TTS) models. In particular, this work proposes a new semi-supervised loss combining an end-to-end differentiable ASR\(\rightarrow\)TTS loss with a TTS\(\rightarrow\)ASR loss. The method is able to leverage both unpaired speech and text data to outperform recently proposed related techniques in terms of %WER. We provide extensive results analyzing the impact of data quantity and of the speech and text modalities, and show consistent gains across the WSJ and Librispeech corpora. Our
Related papers
- Speaker Consistency Loss And Step-wise Optimization For Semi-supervised Joint Training Of TTS And ASR Using Unpaired Text Data (2022)
- Almost Unsupervised Text To Speech And Automatic Speech Recognition (2019)
- End-to-end ASR: From Supervised To Semi-supervised Learning With Modern Architectures (2019)
- A Comparison Of Semi-supervised Learning Techniques For Streaming ASR At Scale (2023)
- Cycle-consistency Training For End-to-end Speech Recognition (2018)
- Semi-supervised Training For Improving Data Efficiency In End-to-end Speech Synthesis (2018)
- Improving Semi-supervised End-to-end Automatic Speech Recognition Using Cyclegan And Inter-domain Losses (2022)
- A General Multi-task Learning Framework To Leverage Text Data For Speech To Text Tasks (2020)