Reconstructing Speech From Real-time Articulatory MRI Using Neural Vocoders
2021 · Yide Yu, Amin Honarmandi Shandiz, László Tóth
Abstract
Several approaches exist for the recording of articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than the above approaches, recent developments in this area now allow the recording of real-time MRI videos of the articulators with an acceptable resolution. Here, we experiment with the reconstruction of the speech signal from a real-time MRI recording using deep neural networks. Instead of estimating speech directly, our networks are trained to output a spectral vector, from which we reconstruct the speech signal using the WaveGlow neural vocoder. We compare the performance of three deep neural architectures for the estimation task, combining convolutional (CNN) and recurrence-based (LSTM) neural layers. Besides the mean absolute error (MAE) of our networks, we also evaluate our models by comparing the speech signals obtained using severa
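The MAE mentioned in the abstract can be illustrated with a minimal sketch. It assumes the network output and the reference are arrays of shape (frames, spectral bins); the 80-bin dimension, the toy values, and the function name `mean_absolute_error` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def mean_absolute_error(predicted, target):
    """Average absolute difference over all frames and spectral bins."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if predicted.shape != target.shape:
        raise ValueError("predicted and target must have the same shape")
    return float(np.mean(np.abs(predicted - target)))


# Hypothetical toy data: 4 frames of 80-dimensional spectral vectors.
# (The 80-bin size is an assumption made for this example.)
target = np.zeros((4, 80))
predicted = np.full((4, 80), 0.5)

mae = mean_absolute_error(predicted, target)
print(f"MAE: {mae:.3f}")
```

A lower MAE means the predicted spectral vectors are numerically closer to the reference, but it does not by itself say how the vocoded speech sounds, which is why the abstract also compares the resulting speech signals.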
Related papers
- Speaker Dependent Articulatory-to-acoustic Mapping Using Real-time MRI Of The Vocal Tract (2020)
- Silent Speech And Emotion Recognition From Vocal Tract Shape Dynamics In Real-time MRI (2021)
- MRI2Speech: Speech Synthesis From Articulatory Movements Recorded By Real-time MRI (2024)
- Towards Automatic Speech Identification From Vocal Tract Shape Dynamics In Real-time MRI (2018)
- Ultrasound-based Articulatory-to-acoustic Mapping With WaveGlow Speech Synthesis (2020)
- Real-time MRI Video Synthesis From Time Aligned Phonemes With Sequence-to-sequence Networks (2022)
- Synthesizing Audio From Tongue Motion During Speech Using Tagged MRI Via Transformer (2023)
- Speech2rtMRI: Speech-guided Diffusion Model For Real-time MRI Video Of The Vocal Tract During Speech (2024)