Graph Attentive Feature Aggregation For Text-independent Speaker Verification
2021 · Hye-Jin Shim, Jungwoo Heo, Jae-Han Park, et al.
Abstract
The objective of this paper is to combine multiple frame-level features into a single utterance-level representation while taking the pairwise relationships between them into account. For this purpose, we propose a novel graph attentive feature aggregation module that interprets each frame-level feature as a node of a graph. The inter-relationships between all possible pairs of features, typically exploited only indirectly, can be modeled directly using a graph. The module comprises a graph attention layer and a graph pooling layer, followed by a readout operation. The graph attention layer first models the non-Euclidean data manifold between different nodes. Then, the graph pooling layer discards less informative nodes according to their significance. Finally, the readout operation combines the remaining nodes into a single representation. We employ two recent systems, SE-ResNet and RawNet2, with different input features and architectures and demonstrate that the proposed feature aggregation module consistently s
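The three stages named in the abstract (graph attention, graph pooling, readout) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The attention form (element-wise product of node pairs scored by a learned vector), the top-k pooling rule, the mean-plus-max readout, and all names (`graph_attention`, `topk_pool`, `readout`, `aggregate`, `w_att`, `w_proj`, `p`, `ratio`) are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def graph_attention(nodes, w_att, w_proj):
    """Attention over a fully connected graph of frame-level features.

    nodes: (N, D) array, one row per frame-level feature (graph node).
    w_att: (D,) scoring vector (hypothetical parameter).
    w_proj: (D, D_out) projection matrix (hypothetical parameter).
    """
    # Pairwise interaction for every ordered node pair: (N, N, D).
    pair = nodes[:, None, :] * nodes[None, :, :]
    # One scalar score per pair: (N, N).
    scores = np.tanh(pair @ w_att)
    # Normalise over neighbours so each row sums to 1.
    alpha = softmax(scores, axis=1)
    # Weighted sum of neighbours plus a residual connection, then ReLU.
    aggregated = alpha @ nodes
    return np.maximum((aggregated + nodes) @ w_proj, 0.0)


def topk_pool(nodes, p, ratio=0.5):
    """Keep the highest-scoring fraction of nodes (assumed pooling rule)."""
    # Sigmoid of the scalar projection onto the unit-norm direction of p.
    score = 1.0 / (1.0 + np.exp(-(nodes @ p) / np.linalg.norm(p)))
    k = max(1, int(np.ceil(ratio * nodes.shape[0])))
    keep = np.argsort(score)[::-1][:k]
    # Gate the surviving nodes by their scores.
    return nodes[keep] * score[keep, None]


def readout(nodes):
    # Combine the remaining nodes into one vector: mean and max, concatenated.
    return np.concatenate([nodes.mean(axis=0), nodes.max(axis=0)])


def aggregate(frames, params, ratio=0.5):
    """Frame-level features (N, D) -> utterance-level vector (2 * D_out,)."""
    h = graph_attention(frames, params["w_att"], params["w_proj"])
    h = topk_pool(h, params["p"], ratio=ratio)
    return readout(h)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((20, 8))  # 20 frames, 8-dim features
    params = {
        "w_att": rng.standard_normal(8),
        "w_proj": rng.standard_normal((8, 16)),
        "p": rng.standard_normal(16),
    }
    print(aggregate(frames, params).shape)
```

With 20 input frames and `ratio=0.5`, pooling keeps 10 nodes, and the readout returns a single vector whose length is twice the projected node dimension regardless of how many frames the utterance contains.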
Related papers
- Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System (2020)
- Graph Attention Networks For Speaker Verification (2020)
- Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling On Self-supervised Representation (2023)
- Attentive Statistics Pooling For Deep Speaker Embedding (2018)
- The Graph Feature Fusion Technique For Speaker Recognition Based On Wav2vec2.0 Framework (2023)
- Segment Aggregation For Short Utterances Speaker Verification Using Raw Waveforms (2020)
- Exploring A Unified Attention-based Pooling Framework For Speaker Verification (2018)
- Improving Multi-scale Aggregation Using Feature Pyramid Module For Robust Speaker Verification Of Variable-duration Utterances (2020)