Telephonetic: Making Neural Language Models Robust To ASR And Semantic Noise
2019 · Chris Larson, Tarek Lahlou, Diana Mingels, et al.
Abstract
Speech processing systems rely on robust feature extraction to handle phonetic and semantic variations found in natural language. While techniques exist for desensitizing features to common noise patterns produced by Speech-to-Text (STT) and Text-to-Speech (TTS) systems, the question remains how best to leverage state-of-the-art language models (which capture rich semantic features, but are trained only on written text) on inputs with ASR errors. In this paper, we present Telephonetic, a data augmentation framework that helps make language model features robust to ASR-corrupted inputs. To capture phonetic alterations, we employ a character-level language model trained using probabilistic masking. Phonetic augmentations are generated in two stages: a TTS encoder (Tacotron 2, WaveGlow) and an STT decoder (DeepSpeech). Similarly, semantic perturbations are produced by sampling from nearby words in an embedding space, which is computed using the BERT language model. Words are selected for aug
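The two text-side ideas named in the abstract, sampling replacement words from neighbors in an embedding space and masking characters at random, can be illustrated with a minimal sketch. It is not the authors' implementation. The embedding table, the function names, the neighborhood size `k`, and the uniform per-token probability `p` are all assumptions made for illustration; the paper computes its embeddings with BERT, and the abstract is cut off before it states how words are selected. The masking function only shows what probabilistic masking does to an input string, not how the character-level model is trained.

```python
import math
import random

# Toy embedding table with hypothetical values (assumption for illustration);
# the paper computes its embedding space with BERT.
EMBEDDINGS = {
    "call": (0.9, 0.1, 0.0),
    "phone": (0.8, 0.2, 0.1),
    "dial": (0.85, 0.15, 0.05),
    "pizza": (0.0, 0.9, 0.3),
    "pasta": (0.1, 0.85, 0.35),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_words(word, k=2):
    """Return the k vocabulary words closest to `word` by cosine similarity."""
    target = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    others.sort(key=lambda w: cosine(target, EMBEDDINGS[w]), reverse=True)
    return others[:k]


def semantic_perturb(tokens, p, rng, k=2):
    """Swap each in-vocabulary token for a sampled neighbor with probability p.

    The uniform probability p is an assumption; the selection rule used in
    the paper is not given in the truncated abstract.
    """
    out = []
    for tok in tokens:
        if tok in EMBEDDINGS and rng.random() < p:
            out.append(rng.choice(nearest_words(tok, k)))
        else:
            out.append(tok)
    return out


def mask_characters(text, p, rng, mask="_"):
    """Independently replace each non-space character with `mask` with probability p."""
    return "".join(
        mask if ch != " " and rng.random() < p else ch for ch in text
    )


if __name__ == "__main__":
    rng = random.Random(0)
    print(semantic_perturb(["call", "for", "pizza"], p=0.5, rng=rng))
    print(mask_characters("call for pizza", p=0.3, rng=rng))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing models trained on the same perturbed data.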
Related papers
- Improving Robustness Of Neural Inverse Text Normalization Via Data-augmentation, Semi-supervised Learning, And Post-aligning Method (2023)
- On The Effectiveness Of Neural Text Generation Based Data Augmentation For Recognition Of Morphologically Rich Speech (2020)
- Boosting Noise Robustness Of Acoustic Model Via Deep Adversarial Training (2018)
- Noise Robust TTS For Low Resource Speakers Using Pre-trained Model And Speech Enhancement (2020)
- Effective Decoder Masking For Transformer Based End-to-end Speech Recognition (2020)
- Semantic Mask For Transformer Based End-to-end Speech Recognition (2019)
- Phaseperturbation: Speech Data Augmentation Via Phase Perturbation For Automatic Speech Recognition (2023)
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024)