Layer-wise Analysis Of A Self-supervised Speech Representation Model
2021 · Ankita Pasad, Ju-Chieh Chou, Karen Livescu
Abstract
Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand
Related papers
- Comparative Layer-wise Analysis Of Self-supervised Speech Models (2022)
- Speech Representation Analysis Based On Inter- And Intra-model Similarities (2024)
- A Noise-robust Self-supervised Pre-training Model Based Speech Representation Learning For Automatic Speech Recognition (2022)
- Revisiting Self-supervised Learning Of Speech Representation From A Mutual Information Perspective (2024)
- Similarity Analysis Of Self-supervised Speech Representations (2020)
- What Do Self-supervised Speech Models Know About Words? (2023)
- Wav2vec 2.0: A Framework For Self-supervised Learning Of Speech Representations (2020)
- Multichannel AV-wav2vec2: A Framework For Learning Multichannel Multi-modal Speech Representation (2024)