Spatial Pyramid Encoding With Convex Length Normalization For Text-independent Speaker Verification
2019 · Youngmoon Jung, Younggwan Kim, Hyungjun Lim, et al.
Abstract
In this paper, we propose a new pooling method called spatial pyramid encoding (SPE) to generate speaker embeddings for text-independent speaker verification. We first partition the output feature maps from a deep residual network (ResNet) into increasingly fine sub-regions and extract speaker embeddings from each sub-region through a learnable dictionary encoding layer. These embeddings are concatenated to obtain the final speaker representation. The SPE layer not only generates a fixed-dimensional speaker embedding for a variable-length speech segment, but also aggregates the information of feature distribution from multi-level temporal bins. Furthermore, we apply deep length normalization by augmenting the loss function with ring loss. By applying ring loss, the network gradually learns to normalize the speaker embeddings using model weights themselves while preserving convexity, leading to more robust speaker embeddings. Experiments on the VoxCeleb1 dataset show that the proposed s
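The pooling and normalization ideas in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The pyramid levels (1, 2, 4), the function names, and the use of simple mean pooling in place of the paper's learnable dictionary encoding layer are all assumptions made for illustration. The ring loss term follows its standard form, (weight / 2m) * sum_i (||f_i||_2 - R)^2, where the target norm R would be a learned parameter in a real network.

```python
import math


def temporal_bins(num_frames, num_bins):
    """Split frame indices into num_bins contiguous, near-equal (start, end) ranges."""
    if num_frames < num_bins:
        raise ValueError("need at least one frame per bin")
    edges = [i * num_frames // num_bins for i in range(num_bins + 1)]
    return [(edges[i], edges[i + 1]) for i in range(num_bins)]


def mean_pool(frames):
    """Stand-in encoder (assumption): average the frame vectors of one bin.

    The paper encodes each bin with a learnable dictionary encoding layer instead.
    """
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def pyramid_embedding(frames, levels=(1, 2, 4)):
    """Concatenate one pooled vector per temporal bin across all pyramid levels.

    The output length is sum(levels) * feature_dim, independent of len(frames).
    """
    embedding = []
    for num_bins in levels:  # hypothetical level choice, coarse to fine
        for start, end in temporal_bins(len(frames), num_bins):
            embedding.extend(mean_pool(frames[start:end]))
    return embedding


def ring_loss(embeddings, target_norm, weight=1.0):
    """Penalize the squared gap between each embedding's L2 norm and target_norm."""
    m = len(embeddings)
    total = 0.0
    for f in embeddings:
        norm = math.sqrt(sum(x * x for x in f))
        total += (norm - target_norm) ** 2
    return weight * total / (2 * m)
```

With 2-dimensional frame features and levels (1, 2, 4), any utterance of at least four frames maps to a 14-dimensional vector, which shows how the pooling produces a fixed-size embedding from variable-length input. In training, the ring loss value would be added to the main classification loss so that embedding norms are pulled toward the target norm gradually rather than being hard-normalized.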
Related papers
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) 0.00
- Analysis Of Length Normalization In End-to-end Speaker Verification System (2018) 9.41
- Speakernet: 1D Depth-wise Separable Convolutional Network For Text-independent Speaker Recognition And Verification (2020) 0.00
- Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System (2020) 0.00
- Attentive Statistics Pooling For Deep Speaker Embedding (2018) 18.88
- Unified Hypersphere Embedding For Speaker Recognition (2018) 0.00
- Improving Multi-scale Aggregation Using Feature Pyramid Module For Robust Speaker Verification Of Variable-duration Utterances (2020) 10.48
- An Improved Deep Neural Network For Modeling Speaker Characteristics At Different Temporal Scales (2020) 6.34