Evidence Of Vocal Tract Articulation In Self-supervised Learning Of Speech
2022 · Cheol Jun Cho, Peter Wu, Abdelrahman Mohamed, et al.
Abstract
Recent self-supervised learning (SSL) models have been shown to learn rich representations of speech that can readily be used in diverse downstream tasks. To understand this utility, various analyses of speech SSL models have sought to reveal what information is encoded in the learned representations and how. Although previous analyses cover acoustic, phonetic, and semantic perspectives extensively, the physical grounding in speech production has not yet received full attention. To bridge this gap, we conduct a comprehensive analysis linking speech representations to articulatory trajectories measured by electromagnetic articulography (EMA). Our analysis is based on a linear probing approach in which we measure an articulatory score as the average correlation of a linear mapping to EMA. We analyze a set of SSL models selected from the leaderboard of the SUPERB benchmark and perform further layer-wise analyses on the two most successful models, Wav2Vec 2.0 and HuBERT. Surprisingl
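The linear probing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes frame-aligned feature and EMA matrices, an ordinary least-squares fit with a bias term, and a score defined as the mean Pearson correlation over EMA channels on held-out frames. The function names, the 12-channel EMA layout, and the synthetic data are all hypothetical stand-ins for real SSL features and articulatory recordings.

```python
import numpy as np


def pearson_per_channel(pred, target):
    """Pearson correlation between matching columns of two (frames, channels) arrays."""
    pred_c = pred - pred.mean(axis=0)
    target_c = target - target.mean(axis=0)
    num = (pred_c * target_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0))
    return num / den


def articulatory_score(feats_train, ema_train, feats_test, ema_test):
    """Fit a linear map (with bias) from features to EMA on the training frames
    and return the mean held-out correlation across EMA channels."""
    def add_bias(x):
        return np.hstack([x, np.ones((x.shape[0], 1))])

    weights, *_ = np.linalg.lstsq(add_bias(feats_train), ema_train, rcond=None)
    pred = add_bias(feats_test) @ weights
    return float(pearson_per_channel(pred, ema_test).mean())


# Synthetic stand-in data (assumption): EMA is a noisy linear function of the features.
rng = np.random.default_rng(0)
n_frames, feat_dim, n_channels = 2000, 64, 12
feats = rng.standard_normal((n_frames, feat_dim))
true_map = rng.standard_normal((feat_dim, n_channels))
ema = feats @ true_map + 0.1 * rng.standard_normal((n_frames, n_channels))

split = int(0.8 * n_frames)  # first 80% of frames for fitting, the rest held out
score = articulatory_score(feats[:split], ema[:split], feats[split:], ema[split:])

# Control: features unrelated to the EMA targets should score near zero.
unrelated = rng.standard_normal((n_frames, feat_dim))
control = articulatory_score(unrelated[:split], ema[:split], unrelated[split:], ema[split:])

print(f"articulatory score: {score:.3f}, control: {control:.3f}")
```

In a layer-wise analysis of the kind the abstract mentions, the same function would be called once per layer's features, so that scores can be compared across layers. How features and EMA are aligned in time, and whether the fit is regularized, are choices this sketch does not take from the source.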
Related papers
- Self-supervised Models Of Speech Infer Universal Articulatory Kinematics (2023) 0.00
- A Large-scale Probing Analysis Of Speaker-specific Attributes In Self-supervised Speech Representations (2025) 0.00
- An Empirical Analysis Of Speech Self-supervised Learning At Multiple Resolutions (2024) 0.00
- Automatic Pronunciation Assessment Using Self-supervised Speech Representation Learning (2022) 0.00
- Investigation Of Ensemble Features Of Self-supervised Pretrained Models For Automatic Speech Recognition (2022) 9.41
- Speech Representation Analysis Based On Inter- And Intra-model Similarities (2024) 2.26
- The Efficacy Of Self-supervised Speech Models For Audio Representations (2022) 0.00
- What Do Self-supervised Speech And Speaker Models Learn? New Findings From A Cross Model Layer-wise Analysis (2024) 8.09