INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition
2023 · Eunseop Yoon, Hee Suk Yoon, John Harvill, et al.
Abstract
Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained using self-supervised speech representation learning. However, these pre-trained speech models suffer from representational bias: they tend to represent the prominent accents in the pre-training speech corpus (i.e., native (L1) English accents) better than less-represented accents, which degrades performance on non-native (L2) English accents. Although some approaches have been proposed to mitigate this issue, all of them require updating the pre-trained model weights. In this paper, we propose Information Theoretic Adversarial Prompt Tuning (INTapt), which introduces prompts concatenated to the original input that re-modulate the attention of the pre-trained model so that the corresponding input resembles native (L1) English speech, without updating the backbone weights. INTapt is trained simultaneously in the following two manners: (
Related papers
- Best Of Both Worlds: Robust Accented Speech Recognition With Adversarial Transfer Learning (2021) · 9.23
- Residual Adapters For Parameter-efficient ASR Adaptation To Atypical And Accented Speech (2021) · 10.74
- Prompt Tuning Of Deep Neural Networks For Speaker-adaptive Visual Speech Recognition (2023) · 0.00
- Effective Text Adaptation For LLM-based ASR Through Soft Prompt Fine-tuning (2024) · 5.84
- Enhancing Multilingual Speech Recognition Through Language Prompt Tuning And Frame-level Language Adapter (2023) · 0.00
- DITTO: Data-efficient And Fair Targeted Subset Selection For ASR Accent Adaptation (2021) · 5.24
- Extending Whisper With Prompt Tuning To Target-speaker ASR (2023) · 9.59
- Accent-robust Automatic Speech Recognition Using Supervised And Unsupervised wav2vec Embeddings (2021) · 0.00