Improving Fairness And Robustness In End-to-end Speech Recognition Through Unsupervised Clustering
2023 · Irina-Elena Veliche, Pascale Fung
Abstract
The challenge of fairness arises when Automatic Speech Recognition (ASR) systems do not perform equally well for all sub-groups of the population. In the past few years there have been many improvements in overall speech recognition quality, but without any particular focus on advancing Equality and Equity for all user groups for whom systems do not perform well. ASR fairness is therefore also a robustness issue. Meanwhile, data privacy also takes priority in production systems. In this paper, we present a privacy-preserving approach to improve fairness and robustness of end-to-end ASR without using metadata, zip codes, or even speaker or utterance embeddings directly in training. We extract utterance-level embeddings using a speaker ID model trained on a public dataset, which we then use in an unsupervised fashion to create acoustic clusters. We use cluster IDs instead of speaker utterance embeddings as extra features during model training, which shows improvements for all demographic
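The pipeline outlined in the abstract (utterance-level embeddings, unsupervised clustering, cluster IDs as extra training features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 2-D vectors stand in for embeddings from a speaker ID model, plain k-means stands in for whatever clustering method the authors use, and appending a one-hot cluster ID to each acoustic frame is only one plausible way to supply the ID as an extra feature. The names `kmeans`, `one_hot`, and `append_cluster_feature` are hypothetical helpers.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


def one_hot(cluster_id, k):
    """Encode a cluster ID as a one-hot vector of length k."""
    return [1.0 if i == cluster_id else 0.0 for i in range(k)]


def append_cluster_feature(frames, cluster_id, k):
    """Append the utterance's one-hot cluster ID to every acoustic frame.

    Only the cluster ID reaches the ASR features; the embedding itself does not.
    """
    tag = one_hot(cluster_id, k)
    return [list(frame) + tag for frame in frames]


# Toy stand-ins for utterance-level embeddings (two well-separated groups).
rng = random.Random(1)
embeddings = (
    [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(10)]
    + [[rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)] for _ in range(10)]
)

K = 2  # number of acoustic clusters (hypothetical choice)
centroids, cluster_ids = kmeans(embeddings, K)

# Toy acoustic features for the first utterance: 2 frames, 3 dimensions each.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
augmented = append_cluster_feature(frames, cluster_ids[0], K)
```

In this sketch the ASR model would be trained on `augmented` rather than `frames`, so the only speaker-related signal it receives is a coarse cluster index. That matches the privacy motivation stated in the abstract, though the actual feature integration used by the authors may differ.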
Related papers
- Toward Fairness In Speech Recognition: Discovery And Mitigation Of Performance Disparities (2022)
- Robust Speaker Recognition Using Unsupervised Adversarial Invariance (2019)
- Enrolment-based Personalisation For Improving Individual-level Fairness In Speech Emotion Recognition (2024)
- Assessing The Robustness Of Spectral Clustering For Deep Speaker Diarization (2024)
- To Train Or Not To Train Adversarially: A Study Of Bias Mitigation Strategies For Speaker Recognition (2022)
- Towards Fair ASR For Second Language Speakers Using Fairness Prompted Finetuning (2025)
- Some Voices Are Too Common: Building Fair Speech Recognition Systems Using The Common Voice Dataset (2023)
- On-device Speaker Anonymization Of Acoustic Embeddings For ASR Based On Flexible Location Gradient Reversal Layer (2023)