Deep Segment Attentive Embedding For Duration Robust Speaker Verification
2018 · Bin Liu, Shuai Nie, Yaping Zhang, et al.
Abstract
LSTM-based speaker verification usually learns the utterance-level speaker embedding from a fixed-length local segment randomly truncated from each training utterance, but verifies the speaker at test time using the average embedding of all segments of the test utterance. This creates a critical mismatch between training and testing, which degrades verification performance, especially when the durations of training and testing utterances differ greatly. To alleviate this issue, we propose the deep segment attentive embedding method to learn unified speaker embeddings for utterances of variable duration. Each utterance is segmented by a sliding window, and an LSTM extracts the embedding of each segment. Instead of using only one local segment, we use the whole utterance to learn the utterance-level embedding by applying attentive pooling to the embeddings of all segments. Moreover, the similarity loss of segment-level embeddings is introduced to guide the segment
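The segment-then-pool pipeline in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the window and hop sizes, the feature and embedding dimensions, and the helper names (`sliding_window_segments`, `segment_embeddings`, `attentive_pooling`) are all hypothetical. The per-segment LSTM is replaced by a placeholder (frame mean, random projection, L2 normalisation), and the attention uses one common scalar form, `score_i = v · tanh(W e_i + b)`, which may differ from the paper's. The segment-level similarity loss is not sketched because the abstract is cut off before defining it.

```python
import numpy as np

def sliding_window_segments(frames, win, hop):
    """Split a (T, D) feature matrix into overlapping (win, D) segments.
    Assumption: a trailing partial window is simply dropped."""
    T = frames.shape[0]
    if T < win:
        raise ValueError("utterance shorter than one window")
    starts = range(0, T - win + 1, hop)
    return np.stack([frames[s:s + win] for s in starts])  # (N, win, D)

def segment_embeddings(segments, proj):
    """Hypothetical stand-in for the per-segment LSTM: average the frames,
    project linearly, then L2-normalise each segment embedding. Returns (N, E)."""
    pooled = segments.mean(axis=1) @ proj
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

def attentive_pooling(emb, W, b, v):
    """Weight segment embeddings by softmax-normalised scalar attention scores
    (assumed form) and return the weighted sum plus the weights."""
    scores = np.tanh(emb @ W.T + b) @ v           # (N,) one score per segment
    scores = scores - scores.max()                # numerical stability for exp
    alpha = np.exp(scores) / np.exp(scores).sum() # weights sum to 1
    return alpha @ emb, alpha                     # (E,), (N,)

# Usage with random (hypothetical) data: 300 frames of 40-dim features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((300, 40))
segs = sliding_window_segments(frames, win=100, hop=50)  # 5 overlapping segments
proj = rng.standard_normal((40, 64))
emb = segment_embeddings(segs, proj)
W, b, v = rng.standard_normal((32, 64)), np.zeros(32), rng.standard_normal(32)
utt_emb, alpha = attentive_pooling(emb, W, b, v)
print(segs.shape, emb.shape, utt_emb.shape)  # (5, 100, 40) (5, 64) (64,)
```

With uniform attention (for example `v` set to zeros), the weights become equal and the pooled vector reduces to the plain average of segment embeddings, which is the test-time averaging the abstract describes. Learned attention generalises that average, and because the same pooling runs over the whole utterance during training, training and testing see the same aggregation.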
Related papers
- Attentive Statistics Pooling For Deep Speaker Embedding (2018) · 18.88
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) · 0.00
- Segment Aggregation For Short Utterances Speaker Verification Using Raw Waveforms (2020) · 0.00
- Universal Speaker Recognition Encoders For Different Speech Segments Duration (2022) · 4.52
- Self Multi-head Attention For Speaker Recognition (2019) · 13.84
- A Unified Deep Learning Framework For Short-duration Speaker Verification In Adverse Environments (2020) · 9.41
- Deep Representation Decomposition For Rate-invariant Speaker Verification (2022) · 2.26
- Short-segment Speaker Verification With Pre-trained Models And Multi-resolution Encoder (2025) · 0.00