Speaker-independent Acoustic-to-articulatory Inversion Through Multi-channel Attention Discriminator
2024 · Woo-Jin Chung, Hong-Goo Kang
Abstract
We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model that overcomes the limitations of conventional AAI models, which rely on acoustic features derived from restricted datasets. To address these limitations, we leverage representations from a pre-trained self-supervised learning (SSL) model to estimate the global, local, and kinematic pattern information in electromagnetic articulography (EMA) signals more effectively during the AAI process. We train our model adversarially and introduce an attention-based multi-duration phoneme discriminator (MDPD) designed to capture the intricate relationships among multi-channel articulatory signals. Our method achieves a Pearson correlation coefficient of 0.847, which is state-of-the-art performance among speaker-independent AAI models. The implementation details and code can be found online.
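The abstract reports its result as a Pearson correlation coefficient between predicted and measured articulatory trajectories. The snippet below is a minimal sketch of how such a score is commonly computed per EMA channel and then averaged. It is not the authors' evaluation code: the function name `channelwise_pcc`, the 12-channel layout, and the toy data are assumptions made for illustration, and the paper's actual averaging protocol is not described here.

```python
import numpy as np

def channelwise_pcc(pred, target):
    """Pearson correlation per articulatory channel.

    pred, target: arrays of shape (frames, channels).
    Returns an array of shape (channels,).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Center each channel over time.
    p = pred - pred.mean(axis=0)
    t = target - target.mean(axis=0)
    # Sum of cross-products divided by the product of the channel norms.
    num = (p * t).sum(axis=0)
    den = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    return num / den

# Toy data (hypothetical): 200 frames, 12 EMA channels.
rng = np.random.default_rng(0)
target = rng.standard_normal((200, 12))
pred = target + 0.5 * rng.standard_normal((200, 12))  # noisy "prediction"

pcc = channelwise_pcc(pred, target)
print("mean PCC over channels:", pcc.mean())
```

A score of 1 means a predicted trajectory tracks the measured one up to a positive scale and offset, so the metric rewards getting the shape of the movement right rather than its absolute position.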
Related papers
- Acoustic-to-articulatory Inversion Based On Speech Decomposition And Auxiliary Feature (2022) 0.00
- Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition (2022) 8.09
- Speaker- And Text-independent Estimation Of Articulatory Movements And Phoneme Alignments From Speech (2024) 2.26
- Independent And Automatic Evaluation Of Acoustic-to-articulatory Inversion Models (2019) 0.00
- Audio Data Augmentation For Acoustic-to-articulatory Speech Inversion Using Bidirectional Gated RNNs (2022) 0.00
- Articulatory-WaveNet: Autoregressive Model For Acoustic-to-articulatory Inversion (2020) 0.00
- Self-supervised Models Of Speech Infer Universal Articulatory Kinematics (2023) 0.00
- Improving Speech Inversion Through Self-supervised Embeddings And Enhanced Tract Variables (2023) 5.24