Can DNNs Learn To Lipread Full Sentences?
2018 · George Sterpu, Christian Saam, Naomi Harte
Abstract
Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence-to-Sequence Recurrent Neural Network. We report results for both hand-crafted and 2D/3D Convolutional Neural Network visual front-ends, online monotonic attention, and a joint Connectionist Temporal Classification-Sequence-to-Sequence loss. The system is evaluated on the publicly available TCD-TIMIT dataset, with 59 speakers and a vocabulary of over 6000 words. Results show a major improvement over a Hidden Markov Model framework. A fuller analysis of performance across visemes demonstrates that the network is not only learning the language model, but actually learning to lipread.
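To make the joint Connectionist Temporal Classification-Sequence-to-Sequence loss concrete, here is a minimal sketch in plain Python. It assumes the two terms are combined as a weighted sum, which is a common formulation but is not stated in the abstract. The function names and the `weight` value are hypothetical, and the paper's actual weighting and implementation may differ.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss via the forward algorithm.

    log_probs: T x V nested list of per-frame log-probabilities.
    target:    list of label ids (no blanks).
    """
    # Extended label sequence with blanks interleaved: b, y1, b, y2, ..., b
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [-math.inf] * S
        for s in range(S):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new

    tails = [alpha[-1]] + ([alpha[-2]] if S > 1 else [])
    return -logsumexp(tails)


def seq2seq_neg_log_likelihood(step_log_probs, target):
    """Decoder cross-entropy: sum over steps of -log p(y_u | y_<u, x)."""
    return -sum(step_log_probs[u][y] for u, y in enumerate(target))


def joint_loss(ctc_log_probs, dec_log_probs, target, weight=0.2):
    """Assumed form: weight * CTC + (1 - weight) * seq2seq (weight is hypothetical)."""
    ctc = ctc_neg_log_likelihood(ctc_log_probs, target)
    s2s = seq2seq_neg_log_likelihood(dec_log_probs, target)
    return weight * ctc + (1.0 - weight) * s2s
```

In this sketch the CTC term is computed on frame-level encoder outputs and pushes the model towards monotonic alignments, while the sequence-to-sequence term is the usual decoder cross-entropy; `weight` trades one off against the other.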
Related papers
- Lipreading With 3D-2D-CNN BLSTM-HMM And Word-CTC Models (2019)
- Lipreading With Long Short-term Memory (2016)
- Lipreading Using Temporal Convolutional Networks (2020)
- Towards Lipreading Sentences With Active Appearance Models (2018)
- LipFormer: Learning To Lipread Unseen Speakers Based On Visual-landmark Transformers (2023)
- Multi-grained Spatio-temporal Modeling For Lip-reading (2019)
- SimulLR: Simultaneous Lip Reading Transducer With Attention-guided Adaptive Memory (2021)
- Learning Separable Hidden Unit Contributions For Speaker-adaptive Lip-reading (2023)