Speculative Speech Recognition By Audio-prefixed Low-rank Adaptation Of Language Models
2024 · Bolaji Yusuf, Murali Karthick Baskar, Andrew Rosenberg, et al.
Abstract
This paper explores speculative speech recognition (SSR), in which conventional automatic speech recognition (ASR) is given speculation capabilities that allow the recognizer to run ahead of the audio. We introduce a metric for measuring SSR performance and propose a model that performs SSR by combining an RNN-Transducer-based ASR system with an audio-prefixed language model (LM). The ASR system transcribes the ongoing audio and feeds the resulting transcripts, along with an audio-dependent prefix, to the LM, which speculates likely completions of the transcriptions. Experiments on a variety of ASR datasets show the efficacy of our method and the feasibility of SSR as a way of reducing ASR latency.
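To make the speculate-then-check loop concrete, here is a minimal toy sketch. It is not the authors' model: the RNN-T recognizer is assumed to have already produced a partial transcript, and the audio-prefixed, low-rank-adapted LM is swapped for a word bigram counter trained on a tiny made-up corpus. The function names `train_bigram`, `speculate` and `accepted_words` and the `audio_prefix` argument are hypothetical. `accepted_words` is only an illustrative match count against the transcript the recognizer later produces; it is not the SSR metric the paper introduces.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count word bigrams (toy stand-in for the audio-prefixed LM; an assumption)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def speculate(counts, partial_transcript, max_words=5, audio_prefix=None):
    """Greedily extend the partial ASR transcript by up to max_words words.

    audio_prefix is a hypothetical hook: the paper conditions its LM on an
    audio-dependent prefix, but this bigram model ignores it.
    """
    words = ["<s>"] + partial_transcript.split()
    completion = []
    for _ in range(max_words):
        followers = counts.get(words[-1])  # .get avoids creating empty entries
        if not followers:
            break  # unseen context: stop speculating
        nxt = followers.most_common(1)[0][0]
        if nxt == "</s>":
            break  # the LM predicts the utterance ends here
        completion.append(nxt)
        words.append(nxt)
    return completion


def accepted_words(speculated, actual_continuation):
    """Count speculated words that match, in order, what ASR later transcribes."""
    n = 0
    for guess, heard in zip(speculated, actual_continuation.split()):
        if guess != heard:
            break
        n += 1
    return n


corpus = ["turn on the kitchen lights", "turn on the radio", "turn off the kitchen lights"]
lm = train_bigram(corpus)
guess = speculate(lm, "turn on")  # recognizer has heard only "turn on" so far
print(guess)  # ['the', 'kitchen', 'lights']
print(accepted_words(guess, "the radio"))  # 1: speculation diverged after one word
```

The greedy extension is the simplest choice for a sketch. A real SSR system would more likely score several candidate completions and roll back any speculated words that the recognizer later contradicts.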
Related papers
- Audio-attention Discriminative Language Model For ASR Rescoring (2019)
- Data Augmentation With Locally-time Reversed Speech For Automatic Speech Recognition (2021)
- Principled Coarse-grained Acceptance For Speculative Decoding In Speech (2025)
- Seed-ASR: Understanding Diverse Speech And Contexts With LLM-based Speech Recognition (2024)
- Multi-stage Large Language Model Correction For Speech Recognition (2023)
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020)
- SSR-Speech: Towards Stable, Safe And Robust Zero-shot Text-based Speech Editing And Synthesis (2024)
- Performance Improvements Of Probabilistic Transcript-adapted ASR With Recurrent Neural Network And Language-specific Constraints (2016)