Personalization For BERT-based Discriminative Speech Recognition Rescoring
2023 · Jari Kolehmainen, Yile Gu, Aditya Gourav, et al.
Abstract
Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention based encoder-decoder model. We use internal de-identified en-US data from interactions with a virtual voice assistant supplemented with personalized named entities to compare these approaches. On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10%, against a neural rescoring baseline. We also show that on this test set, natural language prompts can improve word error rate by 7% without any training and with a marginal loss in generalization. Overall, gazetteers were found to perform the best with a 10% improvement in word error rate (WER), while also improving WER on a general test set by 1%.
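The improvements above are reported in terms of word error rate. As a minimal sketch of how such figures are commonly computed, the Python below derives WER from word-level edit distance and then a relative improvement over a baseline. The abstract does not say whether its percentages are relative or absolute, so the relative form here is an assumption; the function names and the example utterance are hypothetical and not taken from the paper.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    pairs = list(pairs)
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction (assumed convention, not confirmed by the abstract)."""
    return (baseline_wer - new_wer) / baseline_wer
```

For instance, a misrecognized contact name such as `"call anna kowalsky"` against the reference `"call anna kowalski"` counts as one word error out of three reference words.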
Related papers
- The Universal Personalizer: Few-shot Dysarthric Speech Recognition Via Meta-learning (2025)
- Fast Contextual Adaptation With Neural Associative Memory For On-device Personalized Speech Recognition (2021)
- Personalization Of CTC-based End-to-end Speech Recognition Using Pronunciation-driven Subword Tokenization (2023)
- PROCTER: Pronunciation-aware Contextual Adapter For Personalized Speech Recognition In Neural Transducers (2023)
- Federated Marginal Personalization For ASR Rescoring (2020)
- Text Is All You Need: Personalizing ASR Models Using Controllable Speech Synthesis (2023)
- Deep Shallow Fusion For RNN-T Personalization (2020)
- Contextual Adapters For Personalized Speech Recognition In Neural Transducers (2022)