SimulLR: Simultaneous Lip Reading Transducer With Attention-guided Adaptive Memory
2021 · Zhijie Lin, Zhou Zhao, Haoyuan Li, et al.
Abstract
Lip reading, aiming to recognize spoken sentences according to the given video of lip movements without relying on the audio stream, has attracted great interest due to its application in many scenarios. Although prior works that explore lip reading have obtained salient achievements, they are all trained in a non-simultaneous manner, where generating predictions requires access to the full video. To break through this constraint, we study the task of simultaneous lip reading and devise SimulLR, a simultaneous lip reading transducer with attention-guided adaptive memory, from three aspects: (1) To address the challenge of monotonic alignments while considering the syntactic structure of the generated sentences under the simultaneous setting, we build a transducer-based model and design several effective training strategies, including CTC pre-training, model warm-up and curriculum learning, to promote the training of the lip reading transducer. (2) To learn better spatio-temporal represe
Related papers
- Multi-grained Spatio-temporal Modeling For Lip-reading (2019) 0.00
- Lipper: Synthesizing Thy Speech Using Multi-view Lipreading (2019) 10.61
- Spatio-temporal Attention Mechanism And Knowledge Distillation For Lip Reading (2021) 0.00
- LipFormer: Learning To Lipread Unseen Speakers Based On Visual-landmark Transformers (2023) 11.49
- LipSound2: Self-supervised Pre-training For Lip-to-speech Reconstruction And Lip Reading (2021) 11.39
- Target Speaker Lipreading By Audio-visual Self-distillation Pretraining And Speaker Adaptation (2025) 5.24
- LipVoicer: Generating Speech From Silent Videos Guided By Lip Reading (2023) 3.89
- Learning Separable Hidden Unit Contributions For Speaker-adaptive Lip-reading (2023) 0.00