Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
2024 · Ye Bai, Jingping Chen, Jitong Chen, et al.
Abstract
Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data-matching scenarios, and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio-conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in the LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further d
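As a rough illustration of the audio-conditioned LLM idea described in the abstract (continuous speech representations fed to the LLM together with contextual information), the sketch below builds one input sequence from context embeddings and projected audio frames. It is a minimal toy under stated assumptions, not the Seed-ASR implementation: the names `project` and `build_llm_input`, the linear projection, the ordering (context first, then audio), and all dimensions are hypothetical.

```python
from typing import List

Vector = List[float]


def project(frame: Vector, weights: List[Vector]) -> Vector:
    """Map one audio-encoder frame into the LLM embedding space.

    Assumption: a single linear layer (no bias); `weights` has one row
    per LLM embedding dimension and one column per encoder dimension.
    """
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def build_llm_input(
    context_embeddings: List[Vector],
    audio_frames: List[Vector],
    weights: List[Vector],
) -> List[Vector]:
    """Concatenate context embeddings with projected audio frames.

    Assumption: context comes first, followed by the continuous speech
    representations; the real ordering is not stated in the abstract.
    """
    projected = [project(frame, weights) for frame in audio_frames]
    return context_embeddings + projected


# Toy example: encoder dimension 2, LLM embedding dimension 3.
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # two context token embeddings
frames = [[0.5, 2.0], [1.0, 1.0], [0.0, 3.0]]  # three audio frames

sequence = build_llm_input(context, frames, weights)
```

In this toy setup the LLM would receive a sequence of five vectors, each of dimension 3, and would be trained to emit the transcript conditioned on that sequence.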
Related papers
- Tiny-Align: Bridging Automatic Speech Recognition And Large Language Model On The Edge (2024) · 0.00
- Context And System Fusion In Post-ASR Emotion Recognition With Large Language Models (2024) · 0.00
- Integrating Pre-trained Speech And Language Models For End-to-end Speech Recognition (2023) · 0.00
- Exploring The Integration Of Large Language Models Into Automatic Speech Recognition Systems: An Empirical Study (2023) · 8.09
- End-to-end Speech Recognition Contextualization With Large Language Models (2023) · 0.00
- Multi-stage Large Language Model Correction For Speech Recognition (2023) · 0.00
- Harnessing The Zero-shot Power Of Instruction-tuned Large Language Model In End-to-end Speech Recognition (2023) · 0.00
- A Comprehensive Solution To Connect Speech Encoder And Large Language Model For ASR (2024) · 0.00