Comparative Layer-wise Analysis Of Self-supervised Speech Models
2022 · Ankita Pasad, Bowen Shi, Karen Livescu
Abstract
Many self-supervised speech models, varying in their pre-training objective, input modality, and pre-training data, have been proposed in the last few years. Despite impressive successes on downstream tasks, we still have a limited understanding of the properties encoded by the models and the differences across models. In this work, we examine the intermediate representations for a variety of recent models. Specifically, we measure acoustic, phonetic, and word-level properties encoded in individual layers, using a lightweight analysis tool based on canonical correlation analysis (CCA). We find that these properties evolve across layers differently depending on the model, and the variations relate to the choice of pre-training objective. We further investigate the utility of our analyses for downstream tasks by comparing the property trends with performance on speech recognition and spoken language understanding tasks. We discover that CCA trends provide reliable guidance to choose laye
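As a rough illustration of the kind of layer-wise CCA comparison the abstract describes, the sketch below scores each layer's representation against a target property by the mean canonical correlation. It is a minimal sketch under stated assumptions, not the authors' tool: the function names `mean_cca_similarity` and `layerwise_cca` are hypothetical, the inputs are assumed to be NumPy arrays with one row per frame or segment, and plain unweighted CCA is used where the paper's exact CCA variant is not stated in this excerpt.

```python
import numpy as np


def mean_cca_similarity(x, y):
    """Mean canonical correlation between two views with matched rows.

    x: (n_samples, d_x) array, e.g. one layer's representations.
    y: (n_samples, d_y) array, e.g. features for some property.
    Assumes n_samples exceeds d_x and d_y and both views have full column rank.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    # Center each view so correlations are computed around the mean.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two views.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of qx^T qy are the canonical correlations.
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())


def layerwise_cca(layers, target):
    """Score every layer (a list of arrays) against the same target view."""
    return [mean_cca_similarity(layer, target) for layer in layers]
```

Plotting the returned scores against the layer index gives a per-property curve; on real data the number of samples should be much larger than the feature dimensions, otherwise the correlations are inflated.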
Related papers
- Layer-wise Analysis Of A Self-supervised Speech Representation Model (2021)
- An Empirical Analysis Of Speech Self-supervised Learning At Multiple Resolutions (2024)
- Similarity Analysis Of Self-supervised Speech Representations (2020)
- What Do Self-supervised Speech And Speaker Models Learn? New Findings From A Cross Model Layer-wise Analysis (2024)
- Speech Representation Analysis Based On Inter- And Intra-model Similarities (2024)
- What Do Self-supervised Speech Models Know About Words? (2023)
- A Layer-wise Analysis Of Mandarin And English Suprasegmentals In SSL Speech Models (2024)
- A Large-scale Probing Analysis Of Speaker-specific Attributes In Self-supervised Speech Representations (2025)