Joint Speech Transcription And Translation: Pseudo-labeling With Out-of-distribution Data
2022 · Mozhdeh Gheini, Tatiana Likhomanenko, Matthias Sperber, et al.
Abstract
Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unlabeled data and adds it to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed setup: joint transcription and translation of speech, which suffers from a lack of sufficient data resources. We show that under such data-deficient circumstances, the unlabeled data can vary significantly in domain from the supervised data, which degrades pseudo-label quality. We investigate two categories of remedies that require no additional supervision and target the domain mismatch: pseudo-label filtering and data augmentation. We show that analyzing and processing pseudo-labels in this way yields additional gains on top of vanilla pseudo-labeling, for total improvements of up to 0.6% absolute WER and 2.2 BLEU points.
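The pseudo-labeling and filtering steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say which filtering criterion is used, so the confidence-score threshold here is an assumption, and all names (`PseudoLabel`, `label_fn`, `build_training_pool`, the toy utterance IDs and scores) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A training example: (audio id, transcript, translation).
Example = Tuple[str, str, str]


@dataclass
class PseudoLabel:
    audio_id: str
    transcript: str
    translation: str
    score: float  # assumed model confidence, e.g. a length-normalized log-probability


def pseudo_label(
    unlabeled_ids: List[str],
    label_fn: Callable[[str], Tuple[str, str, float]],
) -> List[PseudoLabel]:
    """Label each unlabeled utterance with the current model (label_fn is a stand-in)."""
    return [PseudoLabel(audio_id, *label_fn(audio_id)) for audio_id in unlabeled_ids]


def filter_by_score(labels: List[PseudoLabel], threshold: float) -> List[PseudoLabel]:
    """Keep only pseudo-labels whose score reaches the threshold (assumed criterion)."""
    return [p for p in labels if p.score >= threshold]


def build_training_pool(
    supervised: List[Example],
    unlabeled_ids: List[str],
    label_fn: Callable[[str], Tuple[str, str, float]],
    threshold: float,
) -> List[Example]:
    """Add the filtered pseudo-labeled examples to the supervised training pool."""
    kept = filter_by_score(pseudo_label(unlabeled_ids, label_fn), threshold)
    return supervised + [(p.audio_id, p.transcript, p.translation) for p in kept]


def toy_label_fn(audio_id: str) -> Tuple[str, str, float]:
    """Toy stand-in for a joint transcription and translation model."""
    table = {
        "utt1": ("hello world", "hallo welt", -0.2),
        "utt2": ("uh the", "äh die", -1.7),
    }
    return table[audio_id]


supervised = [("sup1", "good morning", "guten morgen")]
pool = build_training_pool(supervised, ["utt1", "utt2"], toy_label_fn, threshold=-1.0)
```

With the toy scores above, only `utt1` passes the threshold, so the pool holds the one supervised example plus one pseudo-labeled example. In a real setup the model would then be retrained on this pool, and the threshold would need tuning on held-out data.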
Related papers
- Self-training For End-to-end Speech Recognition (2019)
- Leveraging Pseudo-labeled Data To Improve Direct Speech-to-speech Translation (2022)
- Alternative Pseudo-labeling For Semi-supervised Automatic Speech Recognition (2023)
- Empowering Low-resource Language ASR Via Large-scale Pseudo Labeling (2024)
- Self-training And Pre-training Are Complementary For Speech Recognition (2020)
- Boosting Active Learning For Speech Recognition With Noisy Pseudo-labeled Samples (2020)
- Cross-lingual Knowledge Transfer And Iterative Pseudo-labeling For Low-resource Speech Recognition With Transducers (2023)
- Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask (2021)