Whispering Llama: A Cross-modal Generative Error Correction Framework For Speech Recognition
2023 · Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, et al.
Abstract
We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR). Our methodology leverages both acoustic information and external linguistic representations to generate accurate speech transcription contexts. This marks a step towards a fresh paradigm in generative error correction within the realm of n-best hypotheses. Unlike existing ranking-based rescoring methods, our approach uses distinct initialization techniques and parameter-efficient algorithms to boost the ASR performance derived from pre-trained speech and text models. Evaluating across diverse ASR datasets, we assess the stability and reproducibility of our fusion technique and demonstrate a relative word error rate improvement (WERR) of 37.66% over the n-best hypotheses. To encourage future research, we have made our code and pre-trained models open source at https://github.com/Srijith-rkr/Whispering-LLaMA.
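The abstract reports its headline result as a relative word error rate improvement (WERR) but does not spell out the formula. The sketch below shows one common way such a figure is computed, assuming WERR means the percentage reduction in WER relative to a baseline; that definition, the function names, and the toy sentences are illustrative assumptions and are not taken from the paper or its repository.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by the reference length."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / n_ref


def werr(baseline_wer: float, system_wer: float) -> float:
    """Relative WER improvement in percent (assumed definition, see lead-in)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    reference = "the cat sat on the mat"
    baseline = "the cat sat on mat"        # e.g. a top-ranked n-best hypothesis
    corrected = "the cat sat on the mat"   # e.g. output after error correction
    print(wer(reference, baseline), wer(reference, corrected))
    print(werr(wer(reference, baseline), wer(reference, corrected)))
```

Under this definition, a positive WERR means the corrected output has fewer word errors than the baseline it is compared against.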
Related papers
- Multi-stage Large Language Model Correction For Speech Recognition (2023)
- LipGER: Visually-conditioned Generative Error Correction For Robust Automatic Speech Recognition (2024)
- It's Never Too Late: Fusing Acoustic Information Into Large Language Models For Automatic Speech Recognition (2024)
- Large Language Model Based Generative Error Correction: A Challenge And Baselines For Speech Recognition, Speaker Tagging, And Emotion Recognition (2024)
- Whisper-LM: Improving ASR Models With Language Models For Low-resource Languages (2025)
- Listening And Seeing Again: Generative Error Correction For Audio-visual Speech Recognition (2025)
- Let's Fuse Step By Step: A Generative Fusion Decoding Algorithm With LLMs For Robust And Instruction-aware ASR And OCR (2024)
- Cross-modal ASR Post-processing System For Error Correction And Utterance Rejection (2022)