Robust Neural Machine Translation For Clean And Noisy Speech Transcripts
2019 · Mattia Antonino di Gangi, Robert Enyedi, Alessandra Brusadin, et al.
Abstract
Neural machine translation (NMT) models have been shown to achieve high quality when trained on and fed with well-structured, punctuated input text. Unfortunately, this condition is not met in spoken language translation, where the input is generated by an automatic speech recognition (ASR) system. In this paper, we study how to adapt a strong NMT system to make it robust to typical ASR errors. Because transcripts in our application scenarios might be post-edited by human experts, we propose adaptation strategies to train a single system that can translate either clean or noisy input with no supervision on the input type. Our experimental results on a public speech translation data set show that adapting a model on a significant amount of parallel data that includes ASR transcripts is beneficial for test data of the same type, but produces a small degradation when translating clean text. Adapting on both clean and noisy variants of the same data leads to the best results on both input types.
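The last finding, adapting on both clean and noisy variants of the same data, can be illustrated with a minimal sketch of how such an adaptation set might be assembled. This is not the authors' pipeline: the function name `build_mixed_adaptation_set`, the 1:1 mix of clean and noisy pairs, and the seeded shuffle are all assumptions made for illustration, since the abstract does not specify them. The one property taken from the abstract is that no input-type label is attached to the examples.

```python
import random
from typing import List, Tuple


def build_mixed_adaptation_set(
    clean_sources: List[str],
    noisy_sources: List[str],
    targets: List[str],
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Pair each target with both its clean and its noisy (ASR) source.

    Hypothetical helper: the 1:1 mixing ratio and the shuffling are
    assumptions, not details given in the abstract.
    """
    if not (len(clean_sources) == len(noisy_sources) == len(targets)):
        raise ValueError("clean, noisy and target lists must be aligned")

    pairs: List[Tuple[str, str]] = []
    for clean, noisy, target in zip(clean_sources, noisy_sources, targets):
        # Both variants share the same reference translation, and neither
        # carries a tag saying whether the source is clean or noisy.
        pairs.append((clean, target))
        pairs.append((noisy, target))

    # Shuffle with a fixed seed so the mix is reproducible.
    random.Random(seed).shuffle(pairs)
    return pairs


if __name__ == "__main__":
    mixed = build_mixed_adaptation_set(
        clean_sources=["Hello, world."],
        noisy_sources=["hello world"],
        targets=["Hallo, Welt."],
    )
    for source, target in mixed:
        print(source, "->", target)
```

The resulting list of (source, target) pairs could then be fed to whatever fine-tuning routine the NMT toolkit in use provides; that step is deliberately left out because it depends on the toolkit.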
Related papers
- Low-latency Neural Speech Translation (2018)
- Assessing The Tolerance Of Neural Machine Translation Systems Against Speech Recognition Errors (2019)
- Sentence Boundary Augmentation For Neural Machine Translation Robustness (2020)
- Performance Improvements Of Probabilistic Transcript-adapted ASR With Recurrent Neural Network And Language-specific Constraints (2016)
- Improving Robustness Of Neural Inverse Text Normalization Via Data-augmentation, Semi-supervised Learning, And Post-aligning Method (2023)
- Generating Human Readable Transcript For Automatic Speech Recognition With Pre-trained Language Model (2021)
- Noise Robust TTS For Low Resource Speakers Using Pre-trained Model And Speech Enhancement (2020)
- Language Model Bootstrapping Using Neural Machine Translation For Conversational Speech Recognition (2019)