Speech Representation Analysis Based On Inter- And Intra-model Similarities
2024 · Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury
Abstract
Self-supervised models have revolutionized speech processing, achieving new levels of performance in a wide variety of tasks with limited resources. However, the inner workings of these models are still opaque. In this paper, we aim to analyze the encoded contextual representation of these foundation models based on their inter- and intra-model similarity, independent of any external annotation and task-specific constraint. We examine different SSL models varying their training paradigm -- Contrastive (Wav2Vec2.0) and Predictive models (HuBERT); and model sizes (base and large). We explore these models on different levels of localization/distributivity of information including (i) individual neurons; (ii) layer representation; (iii) attention weights and (iv) compare the representations with their finetuned counterparts. Our results highlight that these models converge to similar representation subspaces but not to similar neuron-localized concepts\footnote\{A concept represents a coher
Related papers
- Similarity Analysis Of Self-supervised Speech Representations (2020)
- An Empirical Analysis Of Speech Self-supervised Learning At Multiple Resolutions (2024)
- A Large-scale Probing Analysis Of Speaker-specific Attributes In Self-supervised Speech Representations (2025)
- Self-supervised Models Of Speech Infer Universal Articulatory Kinematics (2023)
- Understanding Self-supervised Learning Of Speech Representation Via Invariance And Redundancy Reduction (2023)
- What Do Self-supervised Speech And Speaker Models Learn? New Findings From A Cross Model Layer-wise Analysis (2024)
- Layer-wise Analysis Of A Self-supervised Speech Representation Model (2021)
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024)