Neural Scoring: A Refreshed End-to-end Approach For Speaker Recognition In Complex Conditions
2024 · Wan Lin, Junhui Chen, Tianhao Wang, et al.
Abstract
Modern speaker verification systems primarily rely on speaker embeddings, followed by verification based on cosine similarity between the embedding vectors of the enrollment and test utterances. While effective, these methods struggle with multi-talker speech due to the unidentifiability of embedding vectors. In this paper, we propose Neural Scoring (NS), a refreshed end-to-end framework that directly estimates verification posterior probabilities without relying on test-side embeddings, making it more robust to complex conditions, e.g., with multiple talkers. To make the training of such an end-to-end model more efficient, we introduce a large-scale trial e2e training (LtE2E) strategy, where each test utterance pairs with a set of enrolled speakers, thus enabling the processing of large-scale verification trials per batch. Experiments on the VoxCeleb dataset demonstrate that NS consistently outperforms both the baseline and competitive methods across various conditions, achieving an o
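As a rough illustration of two ideas mentioned in the abstract, the sketch below shows (1) the conventional cosine-similarity score between an enrollment embedding and a test embedding, and (2) how a single test utterance could be paired with a set of enrolled speakers to yield many trials at once. It does not implement Neural Scoring itself: the abstract does not describe the model architecture or loss, so the function names `cosine_score` and `build_trials`, the toy two-dimensional embeddings, and the 0/1 target convention are all assumptions made for this example.

```python
import math

def cosine_score(enroll, test):
    """Cosine similarity between two embedding vectors (the baseline scoring rule)."""
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    return dot / norm

def build_trials(enrolled, speakers_in_test):
    """Pair one test utterance with every enrolled speaker (hypothetical helper).

    Returns one (speaker_id, target) trial per enrolled speaker, where target is 1
    if that speaker is present in the test utterance and 0 otherwise. This labeling
    convention is an assumption, not taken from the paper.
    """
    return [(spk, 1 if spk in speakers_in_test else 0) for spk in enrolled]

# Toy enrollment embeddings (illustrative values only).
enrolled = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0], "spk_c": [1.0, 1.0]}

# Baseline: score a single test-side embedding against each enrolled speaker.
test_embedding = [2.0, 0.0]
scores = {spk: cosine_score(emb, test_embedding) for spk, emb in enrolled.items()}

# Trial construction: a two-talker test utterance produces one trial per enrolled speaker.
trials = build_trials(enrolled, {"spk_a", "spk_b"})
```

The baseline needs one embedding for the whole test utterance, which is where multi-talker speech becomes a problem: a single vector has to represent several speakers. In the trial list, by contrast, each enrolled speaker gets its own target, so a mixed utterance can be a positive trial for more than one speaker.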