Samba-ASR: State-of-the-art Speech Recognition Leveraging Structured State-space Models
2025 · Syed Abdul Gaffar Shakhadri, Kruthika Kr, Kartik Basavaraj Angadi
Abstract
We propose Samba ASR, the first state-of-the-art Automatic Speech Recognition (ASR) model leveraging the novel Mamba architecture as both encoder and decoder, built on the foundation of state-space models (SSMs). Unlike transformer-based ASR models, which rely on self-attention mechanisms to capture dependencies, Samba ASR effectively models both local and global temporal dependencies using efficient state-space dynamics, achieving remarkable performance gains. By addressing the limitations of transformers, such as quadratic scaling with input length and difficulty in handling long-range dependencies, Samba ASR achieves superior accuracy and efficiency. Experimental results demonstrate that Samba ASR surpasses existing open-source transformer-based ASR models across various standard benchmarks, establishing it as the new state of the art in ASR. Extensive evaluations on the benchmark dataset show significant improvements in Word Error Rate (WER), with competitive performance even in low-resource scenarios. Fur...
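To make the contrast with self-attention concrete, here is a minimal sketch of a discrete linear state-space recurrence. It is not the Samba ASR or Mamba implementation: it assumes a plain time-invariant, diagonal SSM with a scalar input, and the function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical, chosen only for illustration. The point it shows is that each step updates a fixed-size state, so the cost grows linearly with sequence length rather than quadratically.

```python
def ssm_scan(inputs, a, b, c):
    """Run a diagonal linear state-space model over a 1-D input sequence.

    Recurrence (illustrative assumption, not Mamba's selective variant):
        h_t = a * h_{t-1} + b * x_t      (elementwise over state dimensions)
        y_t = sum(c * h_t)
    """
    state = [0.0] * len(a)  # fixed-size hidden state, independent of sequence length
    outputs = []
    for x in inputs:  # one pass over the sequence: O(length * state_size)
        state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs


# An impulse input decays geometrically through a one-dimensional state.
print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))  # [1.0, 0.5, 0.25]
```

In this sketch the state carries information from every earlier step forward, which is how a recurrence of this kind can represent long-range context without comparing every pair of positions.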
Related papers
- SSAMBA: Self-supervised Audio Representation Learning With Mamba State Space Model (2024) · 0.00
- Mamba-based Decoder-only Approach With Bidirectional Speech Modeling For Speech Recognition (2024) · 0.00
- Audio Mamba: Bidirectional State Space Model For Audio Representation Learning (2024) · 11.58
- Dual-path Mamba: Short And Long-term Bidirectional Selective Structured State Space Models For Speech Separation (2024) · 4.12
- SAM: A Mamba-2 State-space Audio-language Model (2025) · 0.00
- Audio Mamba: Selective State Spaces For Self-supervised Audio Representations (2024) · 9.23
- An Investigation Of Incorporating Mamba For Speech Enhancement (2024) · 13.70
- Mamba-SEUNet: Mamba UNet For Monaural Speech Enhancement (2024) · 7.16