Lipreading Using Temporal Convolutional Networks
2020 · Brais Martinez, Pingchuan Ma, Stavros Petridis, et al.
Abstract
Lip-reading has attracted a lot of research attention lately thanks to advances in deep learning. The current state-of-the-art model for recognition of isolated words in the wild consists of a residual network and Bidirectional Gated Recurrent Unit (BGRU) layers. In this work, we address the limitations of this model and propose changes that further improve its performance. Firstly, the BGRU layers are replaced with Temporal Convolutional Networks (TCN). Secondly, we greatly simplify the training procedure, which allows us to train the model in a single stage. Thirdly, we show that the current state-of-the-art methodology produces models that do not generalize well to variations in sequence length, and we address this issue by proposing a variable-length augmentation. We present results on the largest publicly available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively. Our proposed model results in an absolute improvement of 1.2%
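The abstract names a variable-length augmentation but does not describe how it is applied. One plausible reading, offered only as a minimal sketch and not as the authors' implementation, is to train on a random contiguous sub-clip of each sequence so that the model sees inputs of varying length. The function name `variable_length_crop`, the `min_len` parameter, and the 29-frame dummy clip are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def variable_length_crop(
    frames: Sequence[Frame],
    min_len: int,
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Return a random contiguous sub-clip with at least `min_len` frames.

    Hypothetical helper: the cropping policy here (uniform length, uniform
    start offset) is an assumption, not taken from the paper.
    """
    rng = rng or random.Random()
    total = len(frames)
    if not 1 <= min_len <= total:
        raise ValueError("min_len must be between 1 and len(frames)")
    # Pick the new clip length, then a start offset that keeps it in bounds.
    length = rng.randint(min_len, total)
    start = rng.randint(0, total - length)
    return list(frames[start:start + length])


if __name__ == "__main__":
    clip = list(range(29))  # dummy clip; integers stand in for video frames
    cropped = variable_length_crop(clip, min_len=15, rng=random.Random(0))
    print(len(cropped), cropped[0], cropped[-1])
```

In a training loop, a crop like this would be drawn afresh for every sample in every epoch, which is what exposes the temporal model to a range of sequence lengths rather than one fixed length.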
Related papers
- Lipreading With 3D-2D-CNN BLSTM-HMM And Word-CTC Models (2019) · 0.00
- Can DNNs Learn To Lipread Full Sentences? (2018) · 6.77
- Lipreading With Long Short-Term Memory (2016) · 0.00
- Multi-Grained Spatio-Temporal Modeling For Lip-Reading (2019) · 0.00
- Spatio-Temporal Attention Mechanism And Knowledge Distillation For Lip Reading (2021) · 0.00
- Improving Speaker-Independent Lipreading With Domain-Adversarial Training (2017) · 10.85
- LipFormer: Learning To Lipread Unseen Speakers Based On Visual-Landmark Transformers (2023) · 11.49
- Fully Convolutional Speech Recognition (2018) · 0.00