Orthogonality And Isotropy Of Speaker And Phonetic Information In Self-supervised Speech Representations
2024 · Mukhtar Mohamed, Oli Danyi Liu, Hao Tang, et al.
Abstract
Self-supervised speech representations can hugely benefit downstream speech technologies, yet the properties that make them useful are still poorly understood. Two candidate properties related to the geometry of the representation space have been hypothesized to correlate well with downstream tasks: (1) the degree of orthogonality between the subspaces spanned by the speaker centroids and phone centroids, and (2) the isotropy of the space, i.e., the degree to which all dimensions are effectively utilized. To study them, we introduce a new measure, Cumulative Residual Variance (CRV), which can be used to assess both properties. Using linear classifiers for speaker and phone ID to probe the representations of six different self-supervised models and two untrained baselines, we ask whether either orthogonality or isotropy correlates with linear probing accuracy. We find that both measures correlate with phonetic probing accuracy, though our results on isotropy are more nuanced.
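The abstract names CRV but does not define it, so the following is only an illustrative sketch of one plausible reading of the name, not the paper's definition. It assumes that the score is the fraction of variance in one set of vectors (say, speaker centroids) that remains after projecting out the top-k principal directions of another set (say, phone centroids), averaged over k. The function names `principal_directions` and `residual_variance_score`, and the tolerance `tol`, are hypothetical.

```python
import numpy as np

def principal_directions(X, tol=1e-10):
    """Unit-norm principal directions of X (as rows), ordered by variance.

    Directions whose singular value is numerically zero are dropped,
    so only the subspace actually spanned by the centered data is kept.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    if s.size == 0 or s[0] == 0:
        return vt[:0]  # degenerate input: no directions carry variance
    return vt[s > tol * s[0]]

def residual_variance_score(A, B):
    """Assumed CRV-style score (not taken from the paper).

    For k = 1..r, compute the fraction of A's variance that is NOT
    captured by the top-k principal directions of B, then average over k.
    Values near 1 suggest A varies in directions orthogonal to B.
    """
    Ac = A - A.mean(axis=0, keepdims=True)
    total = np.sum(Ac ** 2)
    dirs = principal_directions(B)
    if dirs.shape[0] == 0 or total == 0:
        return 1.0  # nothing to project onto, or A has no variance
    # Variance of A along each direction of B, accumulated over the top-k.
    captured = np.cumsum(np.sum((Ac @ dirs.T) ** 2, axis=0))
    residual = 1.0 - captured / total
    return float(residual.mean())
```

Under this assumed formulation, calling `residual_variance_score(speaker_centroids, phone_centroids)` probes orthogonality, while calling it with the same set twice probes isotropy: if variance were spread perfectly evenly over D dimensions, the score would be (D - 1) / (2D), and it shrinks as variance concentrates in a few directions.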
Related papers
- Phone And Speaker Spatial Organization In Self-supervised Speech Representations (2023) 2.26
- Self-supervised Predictive Coding Models Encode Speaker And Phonetic Information In Orthogonal Subspaces (2023) 7.16
- Revisiting Self-supervised Learning Of Speech Representation From A Mutual Information Perspective (2024) 4.52
- Similarity Analysis Of Self-supervised Speech Representations (2020) 10.07
- Speech Representation Analysis Based On Inter- And Intra-model Similarities (2024) 2.26
- A Large-scale Probing Analysis Of Speaker-specific Attributes In Self-supervised Speech Representations (2025) 0.00
- Layer-wise Analysis Of A Self-supervised Speech Representation Model (2021) 17.07
- Analyzing Speaker Information In Self-supervised Models To Improve Zero-resource Speech Processing (2021) 9.23