Speech Recognition With LLMs Adapted To Disordered Speech Using Reinforcement Learning
2024 · Chirag Nagpal, Subhashini Venugopalan, Jimmy Tobin, et al.
Abstract
We introduce a large language model (LLM) capable of processing speech inputs and show that tuning it further with reinforcement learning on human preference (RLHF) enables it to adapt better to disordered speech than traditional fine-tuning. Our method replaces low-frequency text tokens in an LLM's vocabulary with audio tokens and enables the model to recognize speech by fine-tuning it on speech with transcripts. We then use RL with rewards based on syntactic and semantic accuracy measures, generalizing the LLM further to recognize disordered speech. While the resulting LLM does not outperform existing systems for speech recognition, we find that tuning with reinforcement learning using custom rewards leads to substantially better performance than supervised fine-tuning of the language model, specifically when adapting to speech in a different setting. This presents a compelling alternative tuning strategy for speech recognition using large language models.
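The vocabulary step described in the abstract (reusing low-frequency text tokens as audio tokens) might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the function names `build_audio_token_map` and `encode_example`, the use of raw token counts to define "low-frequency", and the layout of audio tokens followed by transcript tokens are all assumptions made for the example.

```python
from collections import Counter


def build_audio_token_map(token_counts, vocab_size, num_audio_tokens):
    """Assign each audio token index to one of the rarest text token ids.

    Hypothetical helper: "rarest" is decided by corpus counts, with ties
    broken by token id. Ids missing from token_counts are treated as count 0.
    """
    if num_audio_tokens > vocab_size:
        raise ValueError("cannot reuse more ids than the vocabulary holds")
    # Order every vocabulary id from least to most frequent.
    ids_by_rarity = sorted(
        range(vocab_size), key=lambda i: (token_counts.get(i, 0), i)
    )
    reused_ids = ids_by_rarity[:num_audio_tokens]
    return {audio_idx: text_id for audio_idx, text_id in enumerate(reused_ids)}


def encode_example(audio_codes, transcript_ids, audio_map):
    """Build one training sequence: remapped audio tokens, then the transcript.

    Assumed layout for illustration; the abstract does not specify it.
    """
    return [audio_map[code] for code in audio_codes] + list(transcript_ids)


# Toy vocabulary of 6 ids; id 5 never appears in the text corpus.
counts = Counter({0: 50, 1: 3, 2: 0, 3: 7, 4: 1})
audio_map = build_audio_token_map(counts, vocab_size=6, num_audio_tokens=3)
sequence = encode_example([2, 0, 1], [0, 3], audio_map)
```

Reusing existing ids keeps the embedding table the same size, so the model can be fine-tuned on speech paired with transcripts without adding new parameters for the audio tokens.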
Related papers
- Multi-stage Large Language Model Correction For Speech Recognition (2023)
- Adapting Large Language Model With Speech For Fully Formatted End-to-end Speech Recognition (2023)
- Discrete Multimodal Transformers With A Pretrained Large Language Model For Mixed-supervision Speech Processing (2024)
- Exploring Fine-tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data (2025)
- End-to-end Speech Recognition Contextualization With Large Language Models (2023)
- Large Language Model Can Transcribe Speech In Multi-talker Scenarios With Versatile Instructions (2024)
- Prompting Large Language Models For Zero-shot Domain Adaptation In Speech Recognition (2023)
- Improving Robustness Of LLM-based Speech Synthesis By Learning Monotonic Alignment (2024)