A Study Of Gender Impact In Self-supervised Models For Speech-to-text Systems
2022 · Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, et al.
Abstract
Self-supervised models for speech processing have recently emerged as popular foundation blocks in speech processing pipelines. These models are pre-trained on unlabeled audio data and then used in downstream speech processing tasks such as automatic speech recognition (ASR) or speech translation (ST). Since these models are now used in research and industrial systems alike, it becomes necessary to understand the impact of features such as the gender distribution within pre-training data. Using French as our investigation language, we train gender-specific wav2vec 2.0 models and compare them against models with different degrees of gender balance in their pre-training data. The comparison is performed by applying these models to two speech-to-text downstream tasks: ASR and ST. Results show that the type of downstream integration matters. We observe lower overall performance when using gender-specific pre-training before fine-tuning an end-to-end ASR system. However, when self-supervised mode
Related papers
- Don't Speak Too Fast: The Impact Of Data Bias On Self-supervised Speech Models (2021) 8.35
- Twists, Humps, And Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps (2024) 7.53
- Voice, Bias, And Coreference: An Interpretability Study Of Gender In Speech Translation (2026) 0.00
- Gender Domain Adaptation For Automatic Speech Recognition Task (2020) 2.26
- Some Voices Are Too Common: Building Fair Speech Recognition Systems Using The Common Voice Dataset (2023) 5.24
- Unsupervised Fine-tuning Data Selection For ASR Using Self-supervised Speech Models (2022) 5.84
- No Pitch Left Behind: Addressing Gender Unbalance In Automatic Speech Recognition Through Pitch Manipulation (2023) 6.34
- Breeding Gender-aware Direct Speech Translation Systems (2020) 5.84