ARTI-6: Towards Six-dimensional Articulatory Speech Encoding
2025 · Jihwan Lee, Sean Foley, Thanathai Lertpetchpun, et al.
Abstract
We propose ARTI-6, a compact six-dimensional articulatory speech encoding framework derived from real-time MRI data that captures crucial vocal tract regions including the velum, tongue root, and larynx. ARTI-6 consists of three components: (1) a six-dimensional articulatory feature set representing key regions of the vocal tract; (2) an articulatory inversion model, which predicts articulatory features from speech acoustics leveraging speech foundation models, achieving a prediction correlation of 0.87; and (3) an articulatory synthesis model, which reconstructs intelligible speech directly from articulatory features, showing that even a low-dimensional representation can generate natural-sounding speech. Together, ARTI-6 provides an interpretable, computationally efficient, and physiologically grounded framework for advancing articulatory inversion, synthesis, and broader speech technology applications. The source code and speech samples are publicly available.
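The abstract reports a prediction correlation of 0.87 for the inversion model but does not say how that figure is computed. A common convention in articulatory inversion work is to take the Pearson correlation between predicted and reference trajectories for each articulatory dimension and then average across dimensions. The sketch below illustrates that convention only; the function name `mean_pearson_correlation`, the `(frames, 6)` array layout, and the synthetic data are assumptions made for illustration and are not taken from the ARTI-6 code.

```python
import numpy as np


def mean_pearson_correlation(pred, target):
    """Return (mean r, per-dimension r) for two arrays shaped (frames, dims).

    Assumed metric: Pearson correlation per articulatory dimension,
    averaged over dimensions. This is a convention, not the paper's
    confirmed evaluation protocol.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.ndim != 2 or pred.shape != target.shape:
        raise ValueError("expected two arrays of identical shape (frames, dims)")

    # Center each dimension over time.
    pred_c = pred - pred.mean(axis=0)
    target_c = target - target.mean(axis=0)

    # Pearson r = covariance / (std_pred * std_target), computed per dimension.
    # A dimension that is constant over time would give a zero denominator.
    num = (pred_c * target_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0))
    per_dim = num / den
    return per_dim.mean(), per_dim


# Synthetic stand-in for six articulatory trajectories (not real rtMRI data).
rng = np.random.default_rng(0)
target = rng.standard_normal((500, 6))
pred = target + 0.5 * rng.standard_normal((500, 6))  # noisy "prediction"

mean_r, per_dim_r = mean_pearson_correlation(pred, target)
print(f"mean correlation: {mean_r:.3f}")
print("per-dimension:", np.round(per_dim_r, 3))
```

Reporting the per-dimension values alongside the mean is useful with a six-dimensional feature set, since a single average can hide one poorly predicted articulator.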
Related papers
- Speaker Dependent Articulatory-to-acoustic Mapping Using Real-time MRI Of The Vocal Tract (2020)
- Reconstructing Speech From Real-time Articulatory MRI Using Neural Vocoders (2021)
- MRI2Speech: Speech Synthesis From Articulatory Movements Recorded By Real-time MRI (2024)
- Acoustic-to-articulatory Inversion Based On Speech Decomposition And Auxiliary Feature (2022)
- Independent And Automatic Evaluation Of Acoustic-to-articulatory Inversion Models (2019)
- Speech2rtMRI: Speech-guided Diffusion Model For Real-time MRI Video Of The Vocal Tract During Speech (2024)
- Silent Speech And Emotion Recognition From Vocal Tract Shape Dynamics In Real-time MRI (2021)
- Deep Neural Convolutive Matrix Factorization For Articulatory Representation Decomposition (2022)