Large-scale Self-supervised Speech Representation Learning For Automatic Speaker Verification
2021 · Zhengyang Chen, Sanyuan Chen, Yu Wu, et al.
Abstract
Speech representations learned from large-scale unlabeled data have shown better generalizability than those from supervised learning and have therefore attracted considerable interest for use in various downstream tasks. In this paper, we explore the limits of speech representations learned by different self-supervised objectives and datasets for automatic speaker verification (ASV), especially with a well-recognized SOTA ASV model, ECAPA-TDNN [1], as the downstream model. The representations from all hidden layers of the pre-trained model are first averaged with learnable weights and then fed into the ECAPA-TDNN as input features. The experimental results on the VoxCeleb dataset show that the weighted average representation is significantly superior to FBank, a conventional handcrafted feature for ASV. Our best single system achieves 0.537%, 0.569%, and 1.180% equal error rate (EER) on the three official trials of VoxCeleb1, respectively. Accordingly, the ensemble system with three pre-trained
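The layer-weighting step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the softmax normalization of the weights, the function names, and the tensor shapes (13 layers, 50 frames, 768 dimensions) are all assumptions chosen for the example. In a real system the layer logits would be trained jointly with the downstream ECAPA-TDNN; here they are fixed.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def weighted_layer_average(hidden_states, layer_logits):
    """Combine per-layer representations into one feature sequence.

    hidden_states: array of shape (num_layers, num_frames, feat_dim)
    layer_logits:  array of shape (num_layers,), the learnable parameters
                   (assumption: normalized with softmax so weights sum to 1)
    returns:       array of shape (num_frames, feat_dim)
    """
    weights = softmax(layer_logits)
    # Contract the layer axis: sum_l weights[l] * hidden_states[l]
    return np.tensordot(weights, hidden_states, axes=(0, 0))


# Hypothetical shapes for illustration only.
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((13, 50, 768))
layer_logits = np.zeros(13)  # equal logits reduce to a plain mean over layers

features = weighted_layer_average(hidden_states, layer_logits)
print(features.shape)
```

The resulting `features` array would take the place of FBank frames as input to the downstream speaker model.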
Related papers
- Towards Supervised Performance On Speaker Verification With Self-supervised Learning By Leveraging Large-scale ASR Models (2024)
- Self-supervised Learning Based Domain Adaptation For Robust Speaker Verification (2021)
- BigSSL: Exploring The Frontier Of Large-scale Semi-supervised Learning For Automatic Speech Recognition (2021)
- An Exploration Of Self-supervised Pretrained Representations For End-to-end Speech Recognition (2021)
- Curriculum Learning For Self-supervised Speaker Verification (2022)
- Asymmetric Clean Segments-guided Self-supervised Learning For Robust Speaker Verification (2023)
- Label-efficient Self-supervised Speaker Verification With Information Maximization And Contrastive Learning (2022)
- Self-distillation Prototypes Network: Learning Robust Speaker Representations Without Supervision (2023)