Exploring The Encoding Layer And Loss Function In End-to-end Speaker And Language Recognition System
2018 · Weicheng Cai, Jinkun Chen, Ming Li
Abstract
In this paper, we explore the encoding/pooling layer and loss function in the end-to-end speaker and language recognition system. First, a unified and interpretable end-to-end system for both speaker and language recognition is developed. It accepts variable-length input and produces an utterance-level result. In the end-to-end system, the encoding layer aggregates the variable-length input sequence into an utterance-level representation. Besides the basic temporal average pooling, we introduce a self-attentive pooling layer and a learnable dictionary encoding layer to obtain the utterance-level representation. For the loss function in open-set speaker verification, center loss and angular softmax loss are introduced into the end-to-end system to obtain more discriminative speaker embeddings. Experimental results on the VoxCeleb and NIST LRE 07 datasets show that the performance of the end-to-end learning system could be significantly improved by the proposed encoding layer and lo
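To make the role of the encoding layer concrete, here is a minimal pure-Python sketch of two of the pooling strategies named in the abstract: temporal average pooling and self-attentive pooling. It is an illustration under assumptions, not the authors' implementation. The attention form (a tanh projection scored against a context vector, then a softmax over frames) is one common formulation, and the parameter names weight, bias and context are hypothetical. The learnable dictionary encoding layer and the loss functions are not covered.

```python
import math


def temporal_average_pooling(frames):
    """Average T frame-level vectors (each of length D) into one D-dim vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_pooling(frames, weight, bias, context):
    """Attention-weighted average of frame-level vectors.

    Assumed formulation (not confirmed by the abstract):
      hidden_t = tanh(weight @ frame_t + bias)
      score_t  = hidden_t . context
      alpha    = softmax(score_1 .. score_T)
      pooled   = sum_t alpha_t * frame_t
    weight is H x D, bias and context have length H (hypothetical names).
    """
    scores = []
    for frame in frames:
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, frame)) + b)
            for row, b in zip(weight, bias)
        ]
        scores.append(sum(h * c for h, c in zip(hidden, context)))
    alphas = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(a * frame[d] for a, frame in zip(alphas, frames)) for d in range(dim)]
    return pooled, alphas


# Two utterances of different length (3 and 5 frames), same feature dimension D = 2.
short_utt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
long_utt = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0]]

# Toy attention parameters with H = 2 (illustrative values only).
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
context = [1.0, 1.0]

for utt in (short_utt, long_utt):
    avg = temporal_average_pooling(utt)
    att, alphas = self_attentive_pooling(utt, weight, bias, context)
    # Both outputs have length D regardless of the number of frames.
    print(len(utt), len(avg), len(att))
```

Both functions map any number of frames to a vector of fixed length D, which is what lets the rest of the network produce a single utterance-level result. In this sketch, a zero context vector gives every frame the same score, so self-attentive pooling reduces to temporal average pooling.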