Self-supervised Models Of Speech Infer Universal Articulatory Kinematics
2023 · Cheol Jun Cho, Abdelrahman Mohamed, Alan W Black, et al.
Abstract
Self-supervised learning (SSL) based models of speech have shown remarkable performance on a range of downstream tasks. These state-of-the-art models have remained black boxes, but many recent studies have begun "probing" models like HuBERT to correlate their internal representations with different aspects of speech. In this paper, we show "inference of articulatory kinematics" as a fundamental property of SSL models, i.e., the ability of these models to transform acoustics into the causal articulatory dynamics underlying the speech signal. We also show that this abstraction largely overlaps across the languages of the data used to train the model, with a preference for languages with similar phonological systems. Furthermore, we show that with simple affine transformations, Acoustic-to-Articulatory Inversion (AAI) is transferable across speakers, even across genders, languages, and dialects, showing the generalizability of this property. Together, these results shed new light on the
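The abstract's claim that AAI transfers across speakers "with simple affine transformations" can be illustrated with a minimal sketch. This is not the authors' procedure, which is not described in this excerpt: it only shows one plausible way to fit an affine map between two sets of articulatory trajectories by least squares and score it with per-channel Pearson correlation. The helper names (`fit_affine`, `apply_affine`, `mean_pearson`), the 12-channel dimensionality, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_affine(X, Y):
    """Least-squares affine map so that Y is approximately X @ W + b.

    X: (frames, d_in) source trajectories; Y: (frames, d_out) target trajectories.
    """
    # Append a column of ones so the bias is estimated jointly with W.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    return coef[:-1], coef[-1]

def apply_affine(X, W, b):
    """Apply the fitted affine map to new frames."""
    return X @ W + b

def mean_pearson(Y_true, Y_pred):
    """Pearson correlation per channel, averaged over channels."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float((num / den).mean())

# Synthetic stand-ins (assumption): 500 frames of 12 articulatory channels
# for a "source" speaker, and a "target" speaker related by an unknown
# affine map plus a small amount of noise.
rng = np.random.default_rng(0)
source = rng.normal(size=(500, 12))
W_true = rng.normal(size=(12, 12))
b_true = rng.normal(size=12)
target = source @ W_true + b_true + 0.01 * rng.normal(size=(500, 12))

# Fit on the first 400 frames, evaluate on the held-out 100.
W, b = fit_affine(source[:400], target[:400])
score = mean_pearson(target[400:], apply_affine(source[400:], W, b))
print(f"held-out mean Pearson correlation: {score:.4f}")
```

With real data, `source` would be replaced by articulatory trajectories predicted for one speaker and `target` by time-aligned measurements from another. The held-out correlation then indicates how much of the cross-speaker difference an affine map can absorb.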
Related papers
- Evidence Of Vocal Tract Articulation In Self-supervised Learning Of Speech (2022) 9.41
- Speech Representation Analysis Based On Inter- And Intra-model Similarities (2024) 2.26
- A Large-scale Probing Analysis Of Speaker-specific Attributes In Self-supervised Speech Representations (2025) 0.00
- Understanding Self-supervised Learning Of Speech Representation Via Invariance And Redundancy Reduction (2023) 0.00
- Improving Speech Inversion Through Self-supervised Embeddings And Enhanced Tract Variables (2023) 5.24
- Automatic Pronunciation Assessment Using Self-supervised Speech Representation Learning (2022) 0.00
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024) 0.00
- Unispeech-sat: Universal Speech Representation Learning With Speaker Aware Pre-training (2021) 0.00