MobileASR: A Resource-aware On-device Learning Framework For User Voice Personalization Applications On Mobile Phones
2023 · Zitha Sasindran, Harsha Yelchuri, Pooja Rao, et al.
Abstract
We describe a comprehensive methodology for developing user-voice personalized automatic speech recognition (ASR) models by effectively training models on mobile phones, allowing user data and models to be stored and used locally. To achieve this, we propose a resource-aware sub-model-based training approach that considers the RAM and battery capabilities of mobile phones. By considering the evaluation metric and resource constraints of the mobile phones, we are able to perform efficient training and halt the process accordingly. To simulate real users, we use speakers with various accents. The entire on-device training and evaluation framework was then tested on various mobile phones across brands. We show that fine-tuning the models and selecting the right hyperparameter values is a trade-off between the lowest achievable performance metric, on-device training time, and memory consumption. Overall, our methodology offers a comprehensive solution for developing personalized ASR model
Related papers
- Fast Contextual Adaptation With Neural Associative Memory For On-device Personalized Speech Recognition (2021) 9.76
- Gated Low-rank Adaptation For Personalized Code-switching Automatic Speech Recognition On The Low-spec Devices (2024) 0.00
- A Model For Every User And Budget: Label-free And Personalized Mixed-precision Quantization (2023) 0.00
- Dyn-ASR: Compact, Multilingual Speech Recognition Via Spoken Language And Accent Identification (2021) 5.24
- Tiny-Align: Bridging Automatic Speech Recognition And Large Language Model On The Edge (2024) 0.00
- Personalized Speech Recognition On Mobile Devices (2016) 15.37
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022) 6.77
- Text Is All You Need: Personalizing ASR Models Using Controllable Speech Synthesis (2023) 7.16