SoCov: Semi-orthogonal Parametric Pooling Of Covariance Matrix For Speaker Recognition
2025 · Rongjin Li, Weibin Zhang, Dongpeng Chen, et al.
Abstract
In conventional deep speaker embedding frameworks, the pooling layer aggregates all frame-level features over time and computes their mean and standard deviation as inputs to the subsequent segment-level layers. Such a statistics pooling strategy produces fixed-length representations from variable-length speech segments. However, it weights all frame-level features equally and discards covariance information. In this paper, we propose the Semi-orthogonal parameter pooling of Covariance matrix (SoCov) method. SoCov pooling computes the covariance matrix from the self-attentive frame-level features and compresses it into a vector using semi-orthogonal parametric vectorization; this vector is then concatenated with the weighted standard deviation vector to form the input to the segment-level layers. The deep embedding based on SoCov is called the "sc-vector". The proposed sc-vector is compared to several different baselines on the SRE21 development and evaluation sets. The sc-
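The pooling steps named in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation: the abstract does not define the semi-orthogonal parametric vectorization, so the sketch assumes one plausible form, namely projecting the covariance matrix with a matrix P whose rows are orthonormal (P Pᵀ = I) and keeping the upper triangle of the result. The function names, the random attention scores, and the fixed random P are all hypothetical; in a trained network the attention scores and P would presumably be learned.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def semi_orthogonal(k, d, rng):
    """Return a (k, d) matrix with orthonormal rows, so P @ P.T == I_k."""
    # Reduced QR of a (d, k) Gaussian matrix gives q with orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q.T


def socov_like_pooling(frames, scores, proj):
    """Pool (T, d) frame features into one fixed-length vector.

    frames: (T, d) frame-level features
    scores: (T,) unnormalised attention scores (hypothetical input)
    proj:   (k, d) semi-orthogonal matrix (ASSUMED form of the compression)
    """
    w = softmax(scores)                          # attention weights, sum to 1
    mean = w @ frames                            # weighted mean, shape (d,)
    centered = frames - mean
    cov = (centered * w[:, None]).T @ centered   # weighted covariance, (d, d)
    std = np.sqrt(np.clip(np.diag(cov), 1e-12, None))  # weighted std, (d,)
    small = proj @ cov @ proj.T                  # compressed covariance, (k, k)
    iu = np.triu_indices(small.shape[0])         # symmetric: keep upper triangle
    return np.concatenate([small[iu], std])


rng = np.random.default_rng(0)
T, d, k = 200, 16, 4
frames = rng.standard_normal((T, d))
scores = rng.standard_normal(T)
P = semi_orthogonal(k, d, rng)
pooled = socov_like_pooling(frames, scores, P)
print(pooled.shape)  # k*(k+1)/2 + d = 10 + 16 entries -> (26,)
```

The output length depends only on d and k, not on the number of frames T, which is what lets a variable-length segment map to a fixed-length input for the segment-level layers.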
Related papers
- Attentive Statistics Pooling For Deep Speaker Embedding (2018) 18.88
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) 0.00
- Exploring A Unified Attention-based Pooling Framework For Speaker Verification (2018) 6.77
- Study On The Temporal Pooling Used In Deep Neural Networks For Speaker Verification (2021) 5.84
- ACA-Net: Towards Lightweight Speaker Verification Using Asymmetric Cross Attention (2023) 0.00
- Spatial Pyramid Encoding With Convex Length Normalization For Text-independent Speaker Verification (2019) 8.82
- Speaker Sincerity Detection Based On Covariance Feature Vectors And Ensemble Methods (2019) 0.00
- Recursive Attentive Pooling For Extracting Speaker Embeddings From Multi-speaker Recordings (2024) 2.26