Large-scale Learning Of Generalised Representations For Speaker Recognition
2022 · Jee-Weon Jung, Hee-Soo Heo, Bong-Jin Lee, et al.
Abstract
The objective of this work is to develop a speaker recognition model that can be used in diverse scenarios. We hypothesise that two components should be adequately configured to build such a model. First, an adequate architecture is required. We explore several recent state-of-the-art models, including ECAPA-TDNN and MFA-Conformer, as well as other baselines. Second, a massive amount of data is required. We investigate several new training data configurations that combine a few existing datasets. The most extensive configuration includes 10.22k hours of speech from over 87k speakers. Four evaluation protocols are adopted to measure how the trained model performs in diverse scenarios. Through experiments, we find that MFA-Conformer, which has the least inductive bias, generalises best. We also show that training with the proposed large data configurations gives better performance. A boost in generalisation is observed, where the average performance on four evaluation protocols improves by more tha
Related papers
- Generalized Domain Adaptation Framework For Parametric Back-end In Speaker Recognition (2023) 0.00
- Speech2phone: A Novel And Efficient Method For Training Speaker Recognition Models (2020) 2.26
- BigSSL: Exploring The Frontier Of Large-scale Semi-supervised Learning For Automatic Speech Recognition (2021) 15.73
- Toward Domain-invariant Speech Recognition Via Large Scale Training (2018) 13.39
- Large-scale Self-supervised Speech Representation Learning For Automatic Speaker Verification (2021) 15.25
- MFA-Conformer: Multi-scale Feature Aggregation Conformer For Automatic Speaker Verification (2022) 15.46
- Training-free Deepfake Voice Recognition By Leveraging Large-scale Pre-trained Models (2024) 9.23
- Improving Speaker Representations Using Contrastive Losses On Multi-scale Features (2024) 0.00