Improving Robustness Of Neural Inverse Text Normalization Via Data-augmentation, Semi-supervised Learning, And Post-aligning Method
2023 · Juntae Kim, Minkyu Lim, Seokjin Hong
Abstract
Inverse text normalization (ITN) is crucial for converting spoken-form into written-form, especially in the context of automatic speech recognition (ASR). While most downstream tasks of ASR rely on written-form, ASR systems often output spoken-form, highlighting the necessity for robust ITN in product-level ASR-based applications. Although neural ITN methods have shown promise, they still encounter performance challenges, particularly when dealing with ASR-generated spoken text. These challenges arise from the out-of-domain problem between training data and ASR-generated text. To address this, we propose a direct training approach that utilizes ASR-generated written or spoken text, with pairs augmented through ASR linguistic context emulation and a semi-supervised learning method enhanced by a large language model, respectively. Additionally, we introduce a post-aligning method to manage unpredictable errors, thereby enhancing the reliability of ITN. Our experiments show that our propo
Related papers
- Almost Unsupervised Text To Speech And Automatic Speech Recognition (2019) 0.00
- Telephonetic: Making Neural Language Models Robust To ASR And Semantic Noise (2019) 0.00
- Robust Neural Machine Translation For Clean And Noisy Speech Transcripts (2019) 0.00
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020) 11.49
- On The Effectiveness Of Neural Text Generation Based Data Augmentation For Recognition Of Morphologically Rich Speech (2020) 0.00
- Diffnorm: Self-supervised Normalization For Non-autoregressive Speech-to-speech Translation (2024) 0.00
- Improving Code-switching And Named Entity Recognition In ASR With Speech Editing Based Data Augmentation (2023) 6.34
- Back-translation-style Data Augmentation For End-to-end ASR (2018) 13.11