Attention Back-end For Automatic Speaker Verification With Multiple Enrollment Utterances
2021 · Chang Zeng, Xin Wang, Erica Cooper, et al.
Abstract
Probabilistic linear discriminant analysis (PLDA) and cosine similarity have been widely used in traditional speaker verification systems as back-end techniques to measure pairwise similarities. To make better use of multiple enrollment utterances, we propose a novel attention back-end model, which can be used for both text-independent (TI) and text-dependent (TD) speaker verification, and employ scaled-dot self-attention and feed-forward self-attention networks as architectures that learn the intra-relationships of the enrollment utterances. To verify the proposed attention back-end, we conduct a series of experiments on the CNCeleb and VoxCeleb datasets by combining it with several state-of-the-art speaker encoders, including TDNN and ResNet. Experimental results using multiple enrollment utterances on CNCeleb show that the proposed attention back-end model leads to lower EER and minDCF scores than the PLDA and cosine similarity counterparts for each speaker encoder and an experimen
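The idea of relating several enrollment utterances to one another before scoring can be illustrated with a minimal NumPy sketch. This is not the authors' model: it assumes fixed-size speaker embeddings are already available, omits the learned projections and the feed-forward self-attention stage that the abstract mentions, and uses mean pooling followed by cosine scoring as a stand-in for the final decision. The names `self_attention` and `speaker_score` and the synthetic embeddings are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(enroll):
    """Scaled dot-product self-attention over K enrollment embeddings.

    enroll: array of shape (K, D). No learned projections are used
    (an assumption made to keep the sketch small).
    """
    d = enroll.shape[1]
    scores = enroll @ enroll.T / np.sqrt(d)   # (K, K) pairwise affinities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ enroll                   # (K, D) attended embeddings

def speaker_score(enroll, test):
    """Pool the attended enrollment embeddings and score against a test embedding."""
    speaker = self_attention(enroll).mean(axis=0)  # mean pooling (assumption)
    return float(speaker @ test / (np.linalg.norm(speaker) * np.linalg.norm(test)))

# Synthetic data: five noisy enrollment embeddings of one hypothetical speaker.
rng = np.random.default_rng(0)
speaker_dir = rng.normal(size=16)
enroll = speaker_dir + 0.1 * rng.normal(size=(5, 16))
target = speaker_dir + 0.1 * rng.normal(size=16)   # same speaker
nontarget = rng.normal(size=16)                    # unrelated speaker

target_score = speaker_score(enroll, target)
nontarget_score = speaker_score(enroll, nontarget)
```

In this toy setting the same-speaker trial should score higher than the unrelated one; a trained back-end would instead learn the attention parameters from data.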
Related papers
- Joint Speaker Encoder And Neural Back-end Model For Fully End-to-end Automatic Speaker Verification With Multiple Enrollment Utterances (2022) 0.00
- Multiobjective Optimization Training Of PLDA For Speaker Verification (2018) 2.26
- End-to-end Attention Based Text-dependent Speaker Verification (2017) 14.87
- Double Multi-head Attention For Speaker Verification (2020) 8.09
- Self Multi-head Attention For Speaker Recognition (2019) 13.84
- ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification (2020) 23.07
- MFA: TDNN With Multi-scale Frequency-channel Attention For Text-independent Speaker Verification With Short Utterances (2022) 13.79
- Scoring Of Large-margin Embeddings For Speaker Verification: Cosine Or PLDA? (2022) 9.76