Real-time MRI Video Synthesis From Time Aligned Phonemes With Sequence-to-sequence Networks
2022 · Sathvik Udupa, Prasanta Kumar Ghosh
Abstract
Real-time magnetic resonance imaging (rtMRI) of the midsagittal plane of the mouth is of interest for speech production research. In this work, we focus on estimating an utterance-level rtMRI video from the spoken phoneme sequence. We use forced alignment to obtain time-aligned phonemes, from which we derive frame-level phoneme sequences aligned with the rtMRI frames. We propose a sequence-to-sequence learning model with a transformer phoneme encoder and a convolutional frame decoder. We then modify the learning by using intermediary features obtained by sampling from a pretrained phoneme-conditioned variational autoencoder (CVAE). We train on 8 subjects in a subject-specific manner and demonstrate the performance with a subjective test. We also use an auxiliary task of air tissue boundary (ATB) segmentation to obtain objective scores for the proposed models. We show that the proposed method is able to generate realistic rtMRI video for unseen utterances, and adding CVAE is beneficial for lea…
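The abstract describes turning forced-alignment output (phonemes with start and end times) into a frame-level phoneme sequence aligned with the rtMRI frames. The sketch below shows one way such an expansion could work. It is not the authors' procedure. It assumes a constant video frame rate `fps`, labels each frame by the phoneme whose interval contains the frame's centre time, and fills unaligned frames with a `sil` label. The function name `frame_level_phonemes` and all of these choices are hypothetical.

```python
from typing import List, Tuple

# Hypothetical alignment format: (phoneme, start_time_s, end_time_s),
# sorted by start time, as a forced aligner might produce.
Alignment = List[Tuple[str, float, float]]


def frame_level_phonemes(
    alignment: Alignment, fps: float, num_frames: int, silence: str = "sil"
) -> List[str]:
    """Assign each video frame the phoneme whose interval contains the
    frame's centre time (an assumed labelling rule).

    Frames that fall in gaps or beyond the last interval get `silence`.
    """
    labels: List[str] = []
    idx = 0  # pointer into the sorted alignment
    for f in range(num_frames):
        t = (f + 0.5) / fps  # centre time of frame f, in seconds
        # Skip intervals that have already ended before this frame's centre.
        while idx < len(alignment) and alignment[idx][2] <= t:
            idx += 1
        if idx < len(alignment) and alignment[idx][1] <= t:
            labels.append(alignment[idx][0])
        else:
            labels.append(silence)
    return labels


if __name__ == "__main__":
    ali = [("sil", 0.0, 0.1), ("p", 0.1, 0.2), ("a", 0.2, 0.5)]
    print(frame_level_phonemes(ali, fps=10.0, num_frames=6))
```

With the toy alignment above and an illustrative 10 frames per second, the output is one label per video frame (`['sil', 'p', 'a', 'a', 'a', 'sil']`). The encoder input therefore has the same length as the frame sequence the decoder must generate. A single forward-moving pointer keeps the expansion linear in the number of frames plus phonemes.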
Related papers
- Mri2speech: Speech Synthesis From Articulatory Movements Recorded By Real-time MRI (2024)
- Silent Speech And Emotion Recognition From Vocal Tract Shape Dynamics In Real-time MRI (2021)
- Speech2rtmri: Speech-guided Diffusion Model For Real-time MRI Video Of The Vocal Tract During Speech (2024)
- Tagged-mri Sequence To Audio Synthesis Via Self Residual Attention Guided Heterogeneous Translator (2022)
- Speaker Dependent Articulatory-to-acoustic Mapping Using Real-time MRI Of The Vocal Tract (2020)
- Reconstructing Speech From Real-time Articulatory MRI Using Neural Vocoders (2021)
- Towards Automatic Speech Identification From Vocal Tract Shape Dynamics In Real-time MRI (2018)
- Synthesizing Audio From Tongue Motion During Speech Using Tagged MRI Via Transformer (2023)