Personalization Of CTC-based End-to-end Speech Recognition Using Pronunciation-driven Subword Tokenization
2023 · Zhihong Lei, Ernest Pusateri, Shiyi Han, et al.
Abstract
Recent advances in deep learning and automatic speech recognition have improved the accuracy of end-to-end speech recognition systems, but recognition of personal content such as contact names remains a challenge. In this work, we describe our personalization solution for an end-to-end speech recognition system based on connectionist temporal classification. Building on previous work, we present a novel method for generating additional subword tokenizations for personal entities from their pronunciations. We show that using this technique in combination with two established techniques, contextual biasing and wordpiece prior normalization, we are able to achieve personal named entity accuracy on par with a competitive hybrid system.
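As an illustration of one of the established techniques named above, here is a minimal sketch of wordpiece prior normalization. The abstract does not give a formula, so this assumes the commonly used form in which a scaled log prior is subtracted from each wordpiece's log posterior; the function name, the `alpha` scale, and the toy probabilities are all hypothetical.

```python
import math

def normalize_by_prior(log_posteriors, log_priors, alpha=0.5):
    """Subtract a scaled log prior from each wordpiece's log posterior (one frame).

    Assumed form: score(w) = log p(w | x) - alpha * log p(w).
    alpha is a hypothetical tuning weight, not a value from the paper.
    """
    return [lp - alpha * pr for lp, pr in zip(log_posteriors, log_priors)]

# Toy frame over three hypothetical wordpieces: index 0 is frequent in
# training data, index 1 is a rarer piece such as part of a contact name.
posteriors = [0.6, 0.3, 0.1]
priors = [0.7, 0.2, 0.1]

log_post = [math.log(p) for p in posteriors]
log_prior = [math.log(p) for p in priors]

scores = normalize_by_prior(log_post, log_prior, alpha=1.0)

# Index of the best-scoring wordpiece before and after normalization.
best_raw = max(range(len(log_post)), key=log_post.__getitem__)
best_normalized = max(range(len(scores)), key=scores.__getitem__)
```

In this toy frame the frequent wordpiece wins on raw posteriors, while the rarer one wins once the prior is divided out, which is the effect that makes the technique useful for rare personal entities.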
Related papers
- Towards Personalization Of CTC Speech Recognition Models With Contextual Adapters And Adaptive Boosting (2022)
- Deep Shallow Fusion For RNN-T Personalization (2020)
- PROCTER: Pronunciation-aware Contextual Adapter For Personalized Speech Recognition In Neural Transducers (2023)
- Personalization For BERT-based Discriminative Speech Recognition Rescoring (2023)
- BERT Meets CTC: New Formulation Of End-to-end Speech Recognition With Pre-trained Masked Language Model (2022)
- Contextual Adapters For Personalized Speech Recognition In Neural Transducers (2022)
- Advances In All-neural Speech Recognition (2016)
- The Universal Personalizer: Few-shot Dysarthric Speech Recognition Via Meta-learning (2025)