Personalized Speech Recognition On Mobile Devices
2016 · Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, et al.
Abstract
We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
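As a rough illustration of what quantizing model weights involves, the sketch below applies a uniform linear quantizer to a handful of floating-point values and then reconstructs them. It is a minimal example under stated assumptions, not the authors' implementation: the function names `quantize` and `dequantize`, the 8-bit default, and the sample weights are all hypothetical, and the abstract does not specify the quantization scheme at this level of detail.

```python
def quantize(values, num_bits=8):
    """Map floats onto integer codes in [0, 2**num_bits - 1] over the observed range."""
    v_min, v_max = min(values), max(values)
    levels = (1 << num_bits) - 1
    # Guard against a constant vector, where the range would be zero.
    scale = (v_max - v_min) / levels if v_max > v_min else 1.0
    codes = [round((v - v_min) / scale) for v in values]
    return codes, scale, v_min


def dequantize(codes, scale, v_min):
    """Recover approximate floats from the integer codes."""
    return [c * scale + v_min for c in codes]


# Made-up weights, for illustration only.
weights = [-0.82, -0.1, 0.0, 0.27, 0.5, 1.13]
codes, scale, v_min = quantize(weights)
restored = dequantize(codes, scale, v_min)

# Rounding to the nearest level bounds the per-weight error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as one small integer plus a shared `scale` and `v_min`, which is where the memory saving comes from. The cost is a reconstruction error of at most half a quantization step per weight.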
Related papers
- Optimizing Speech Recognition For The Edge (2019)
- Neural Speech Recognizer: Acoustic-to-word LSTM Model For Large Vocabulary Speech Recognition (2016)
- MobileSpeech: A Fast And High-fidelity Framework For Mobile Zero-shot Text-to-speech (2024)
- Fast Contextual Adaptation With Neural Associative Memory For On-device Personalized Speech Recognition (2021)
- MobiVSR: A Visual Speech Recognition Solution For Mobile Devices (2019)
- MobileASR: A Resource-aware On-device Learning Framework For User Voice Personalization Applications On Mobile Phones (2023)
- High Quality Streaming Speech Synthesis With Low, Sentence-length-independent Latency (2021)
- Small-footprint Open-vocabulary Keyword Spotting With Quantized LSTM Networks (2020)