Towards Fair ASR For Second Language Speakers Using Fairness Prompted Finetuning
2025 · Monorama Swain, Bubai Maji, Jagabandhu Mishra, et al.
Abstract
In this work, we address the challenge of building fair English ASR systems for second-language speakers. Our analysis of two widely used ASR models, Whisper and SeamlessM4T, reveals large fluctuations in word error rate (WER) across 26 accent groups, indicating significant fairness gaps. To mitigate this, we propose fairness-prompted finetuning with lightweight adapters, incorporating Spectral Decoupling (SD), Group Distributionally Robust Optimization (Group-DRO), and Invariant Risk Minimization (IRM). Our proposed fusion of traditional empirical risk minimization (ERM) with cross-entropy and fairness-driven objectives (SD, Group-DRO, and IRM) enhances fairness across accent groups while maintaining overall recognition accuracy. In terms of macro-averaged word error rate, our approach achieves relative improvements of 58.7% and 58.5% over the large pretrained Whisper and SeamlessM4T models, respectively, and of 9.7% and 7.8% over the same models finetuned with standard empirical risk minimization with cross-entropy lo
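The abstract reports results as relative improvements in macro-averaged WER. The sketch below illustrates one plausible reading of that metric: WER is computed separately for each accent group and the group values are averaged with equal weight, so small groups count as much as large ones. The abstract does not give the exact definition, so the equal-weight averaging, the function names (`word_errors`, `macro_wer`, `relative_improvement`), and the toy transcripts are all assumptions made for illustration, not the authors' evaluation code.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def macro_wer(samples):
    """Return (macro-averaged WER, per-group WER).

    `samples` is an iterable of (group, reference, hypothesis) triples.
    Assumption: macro-averaging means an unweighted mean over groups.
    """
    errors, words = defaultdict(int), defaultdict(int)
    for group, ref, hyp in samples:
        errors[group] += word_errors(ref, hyp)
        words[group] += len(ref.split())
    per_group = {g: errors[g] / words[g] for g in errors}
    return sum(per_group.values()) / len(per_group), per_group


def relative_improvement(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical toy data: two accent groups, one utterance each.
samples = [
    ("group_a", "the cat sat", "the cat sat"),
    ("group_b", "a b c d", "a x c"),
]
macro, per_group = macro_wer(samples)
```

Under this reading, the spread of the `per_group` values corresponds to the fairness gap the abstract describes, while `relative_improvement` applied to two macro-averaged WERs gives figures of the same kind as the reported percentages.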
Related papers
- Reducing Geographic Disparities In Automatic Speech Recognition Via Elastic Weight Consolidation (2022)
- DITTO: Data-efficient And Fair Targeted Subset Selection For ASR Accent Adaptation (2021)
- Whisper-LM: Improving ASR Models With Language Models For Low-resource Languages (2025)
- Effective Text Adaptation For LLM-based ASR Through Soft Prompt Fine-tuning (2024)
- Toward Fairness In Speech Recognition: Discovery And Mitigation Of Performance Disparities (2022)
- Discriminative Speech Recognition Rescoring With Pre-trained Language Models (2023)
- Adapting Whisper For Code-switching Through Encoding Refining And Language-aware Decoding (2024)
- Improving Fairness And Robustness In End-to-end Speech Recognition Through Unsupervised Clustering (2023)