Explainable AI In Speaker Recognition -- Making Latent Representations Understandable
2026 · Yanze Xu, Wenwu Wang, Mark D. Plumbley
Abstract
arXiv:2604.23354v1 (cross-listed)
Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the Explainable AI (XAI) domain. This paper proposes to study an XAI topic: uncovering unknown organisational patterns in network representations, particularly those representations learned by the speaker recognition network that recognises the speaker identity of utterances. Past studies employed algorithms (e.g. t-distributed Stochastic Neighbour Embedding and K-means) to analyse and visualise how network representations form independent clusters, indicating the presence of flat clustering phenomena within the space defined by these representations. In contrast, this work applies two algorithms -- Single-Linkage Clustering (SLINK) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) -- to analyse how representations form clusters with hierarchical relationships
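The contrast the abstract draws between flat and hierarchical clustering can be illustrated with a small sketch. The code below is not the paper's method: it builds a single-linkage hierarchy with a naive Kruskal-style pass over all pairwise distances (which yields the same merge heights that SLINK computes more efficiently), it does not cover HDBSCAN, and the points in `TOY_EMBEDDINGS` are hypothetical 2-D values invented for illustration rather than real speaker embeddings. The function names `single_linkage` and `count_clusters` are likewise assumptions made for this example.

```python
import math
from itertools import combinations

# Hypothetical toy "embeddings": four tight pairs, grouped into two wider groups.
TOY_EMBEDDINGS = [
    (0.0, 0.0), (0.0, 1.0),    # pair 0
    (4.0, 0.0), (4.0, 1.0),    # pair 1 (near pair 0)
    (20.0, 0.0), (20.0, 1.0),  # pair 2
    (24.0, 0.0), (24.0, 1.0),  # pair 3 (near pair 2)
]


def single_linkage(points):
    """Return merges as (height, members_a, members_b), by increasing height.

    Naive approach: sort all pairwise distances and join the two clusters
    whenever an edge connects different clusters (union-find).
    """
    n = len(points)
    parent = list(range(n))
    members = {i: [i] for i in range(n)}

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    merges = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # both points already belong to the same cluster
        merges.append((dist, sorted(members[ri]), sorted(members[rj])))
        parent[rj] = ri
        members[ri] = members[ri] + members.pop(rj)
    return merges


def count_clusters(n, merges, threshold):
    """Number of flat clusters left after applying merges up to threshold."""
    label = list(range(n))
    for height, a, b in merges:
        if height > threshold:
            break
        target = label[a[0]]
        for i in b:
            label[i] = target
    return len(set(label))


merges = single_linkage(TOY_EMBEDDINGS)
heights = [height for height, _, _ in merges]
print(heights)
for threshold in (2.0, 5.0, 20.0):
    print(threshold, count_clusters(len(TOY_EMBEDDINGS), merges, threshold))
```

A flat method such as K-means returns one partition for a chosen number of clusters. The merge list here instead records every level at once, so cutting it at different thresholds recovers the tight pairs, the two wider groups, or a single cluster from the same structure.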
Related papers
- AudioMNIST: Exploring Explainable Artificial Intelligence For Audio Analysis On A Simple Benchmark (2018) 13.50
- Visualizing Automatic Speech Recognition -- Means For A Better Understanding? (2022) 4.52
- A Large-scale Probing Analysis Of Speaker-specific Attributes In Self-supervised Speech Representations (2025) 0.00
- Interpreting End-to-end Deep Learning Models For Speech Source Localization Using Layer-wise Relevance Propagation (2024) 2.26
- Removing Speaker Information From Speech Representation Using Variable-length Soft Pooling (2024) 0.00
- Overview Of Speaker Modeling And Its Applications: From The Lens Of Deep Speaker Representation Learning (2024) 10.74
- Intra-class Variation Reduction Of Speaker Representation In Disentanglement Framework (2020) 8.35
- X-DC: Explainable Deep Clustering Based On Learnable Spectrogram Templates (2020) 0.00