Exploring The Integration Of Large Language Models Into Automatic Speech Recognition Systems: An Empirical Study
2023 · Zeping Min, Jinbo Wang
Abstract
This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems to improve transcription accuracy. The increasing sophistication of LLMs, with their in-context learning capabilities and instruction-following behavior, has drawn significant attention in Natural Language Processing (NLP). Our primary focus is to investigate whether an LLM's in-context learning capabilities can enhance the performance of ASR systems, which currently face challenges such as ambient noise, speaker accents, and complex linguistic contexts. We designed a study using the Aishell-1 and LibriSpeech datasets, with ChatGPT and GPT-4 serving as benchmarks for LLM capabilities. Unfortunately, our initial experiments did not yield promising results, indicating the complexity of leveraging LLMs' in-context learning for ASR applications. Despite further exploration with varied settings and models, the corrected sentences from the LLMs freque…
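The abstract describes the setup only at a high level. As a rough illustration, the sketch below builds a few-shot (in-context) correction prompt for an ASR hypothesis and scores transcripts against a reference with an edit-distance error rate. It rests on two assumptions. First, the prompt wording and format are invented here and are not the paper's prompt. Second, accuracy is taken to be the usual word error rate (WER). The names `build_correction_prompt`, `error_rate`, and `edit_distance` are hypothetical, and the actual call to ChatGPT or GPT-4 is omitted.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over tokens (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference token
                curr[j - 1] + 1,         # insert a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def error_rate(ref: str, hyp: str, by_char: bool = False) -> float:
    """WER over whitespace tokens, or CER over characters when by_char=True."""
    ref_toks = list(ref.replace(" ", "")) if by_char else ref.split()
    hyp_toks = list(hyp.replace(" ", "")) if by_char else hyp.split()
    if not ref_toks:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref_toks, hyp_toks) / len(ref_toks)


def build_correction_prompt(examples: List[Tuple[str, str]], hypothesis: str) -> str:
    """Hypothetical few-shot prompt: (ASR output, corrected text) demos, then the query."""
    parts = ["Correct the speech recognition errors in the final transcript."]
    for asr, fixed in examples:
        parts.append(f"ASR: {asr}\nCorrected: {fixed}")
    parts.append(f"ASR: {hypothesis}\nCorrected:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    asr_hyp = "the cat sat on a mat"  # one substitution
    llm_fix = "a cat sat on a mat"    # toy "correction" that introduces a new error
    print(build_correction_prompt([("i scream", "ice cream")], asr_hyp))
    print(round(error_rate(ref, asr_hyp), 3))  # 0.167
    print(round(error_rate(ref, llm_fix), 3))  # 0.333
```

The toy strings are made up and illustrate only the bookkeeping. An LLM rewrite helps only if its error rate against the reference is lower than that of the original ASR hypothesis. The `by_char` option exists because Mandarin corpora such as Aishell-1 are usually scored by character error rate (CER), while LibriSpeech is scored by WER.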
Related papers
- A Comprehensive Solution To Connect Speech Encoder And Large Language Model For ASR (2024)
- Recent Advances In Speech Language Models: A Survey (2024)
- Towards Interfacing Large Language Models With ASR Systems Using Confidence Measures And Prompting (2024)
- Multi-stage Large Language Model Correction For Speech Recognition (2023)
- Large Language Model Can Transcribe Speech In Multi-talker Scenarios With Versatile Instructions (2024)
- Boosting Large Language Model For Speech Synthesis: An Empirical Study (2023)
- Tiny-align: Bridging Automatic Speech Recognition And Large Language Model On The Edge (2024)
- Adapting Large Language Model With Speech For Fully Formatted End-to-end Speech Recognition (2023)