Joint Speaker Encoder And Neural Back-end Model For Fully End-to-end Automatic Speaker Verification With Multiple Enrollment Utterances
2022 · Chang Zeng, Xiaoxiao Miao, Xin Wang, et al.
Abstract
Conventional automatic speaker verification systems can usually be decomposed into a front-end model, such as a time delay neural network (TDNN), for extracting speaker embeddings and a back-end model, such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA), for similarity scoring. However, the sequential optimization of the front-end and back-end models may lead to a local minimum, which in theory prevents the whole system from reaching its optimum. Although some methods have been proposed for jointly optimizing the two models, such as the generalized end-to-end (GE2E) model and the NPLDA E2E model, all of these methods are designed for use with a single enrollment utterance. In this paper, we propose a new E2E joint method for speaker verification especially designed for the practical case of multiple enrollment utterances. To leverage the intra-relationship among multiple enrollment utterances, our model co
Related papers
- Attention Back-end For Automatic Speaker Verification With Multiple Enrollment Utterances (2021)
- Neural Scoring: A Refreshed End-to-end Approach For Speaker Recognition In Complex Conditions (2024)
- Multiobjective Optimization Training Of PLDA For Speaker Verification (2018)
- Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System (2020)
- End-to-end DNN Based Speaker Recognition Inspired By I-vector And PLDA (2017)
- Deep Speaker Verification: Do We Need End To End? (2017)
- Neural Network Based Speaker Classification And Verification Systems With Enhanced Features (2017)
- Universal Speaker Recognition Encoders For Different Speech Segments Duration (2022)