Lipreading With 3D-2D-CNN BLSTM-HMM And Word-CTC Models
2019 · Dilip Kumar Margam, Rohith Aralikatti, Tanay Sharma, et al.
Abstract
In recent years, deep-learning-based machine lipreading has gained prominence. Several architectures, such as LipNet and LCANet, have been proposed that substantially outperform traditional lipreading DNN-HMM hybrid systems trained on DCT features. In this work, we propose a simpler 3D-2D-CNN-BLSTM architecture with a bottleneck layer, and we analyze two different approaches to lipreading with it. In the first approach, the 3D-2D-CNN-BLSTM network is trained with CTC loss on characters (ch-CTC); a BLSTM-HMM model is then trained on bottleneck lip features, extracted from this ch-CTC network, in a traditional ASR training pipeline. In the second approach, the same 3D-2D-CNN-BLSTM network is trained with CTC loss on word labels (w-CTC). The first approach shows that bottleneck features outperform DCT features. Using the second approach on the Grid corpus' seen-speaker test set, we report 1.3% WER, a 5…
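The two CTC variants differ mainly in their output units. A w-CTC network emits one word label (or the blank) per video frame, so collapsing its frame-level output yields a word sequence directly. A ch-CTC network emits characters, including a space, which must be joined into words. Both are scored by word error rate (WER). The sketch below is a minimal illustration, assuming plain best-path (greedy) CTC decoding and the standard edit-distance definition of WER. It is not the authors' decoder, which may use beam search or a language model, and it does not cover the BLSTM-HMM pipeline. The blank symbol `<b>`, the function names, and the toy frame sequences are hypothetical.

```python
from typing import List, Optional, Sequence

BLANK = "<b>"  # hypothetical name for the CTC blank symbol


def ctc_greedy_collapse(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Best-path CTC decoding: merge consecutive repeated labels, then drop blanks."""
    output: List[str] = []
    previous: Optional[str] = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame's label and
        # is not the blank. A blank between two equal labels keeps them separate.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("WER is undefined for an empty reference")
    n, m = len(reference), len(hypothesis)
    # dist[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[n][m] / n


if __name__ == "__main__":
    # Toy w-CTC output: one word label (or blank) per frame.
    w_frames = ["bin", "bin", BLANK, "blue", BLANK, "at", "at", "f",
                BLANK, "two", "two", "now"]
    print(ctc_greedy_collapse(w_frames))  # ['bin', 'blue', 'at', 'f', 'two', 'now']

    # Toy ch-CTC output: characters plus a space symbol, joined into words.
    c_frames = ["b", "b", "i", BLANK, "n", " ", "b", "l", "u", "u", BLANK, "e"]
    print("".join(ctc_greedy_collapse(c_frames)).split())  # ['bin', 'blue']

    ref = "bin blue at f two now".split()
    hyp = "bin blue at e two now".split()
    print(word_error_rate(ref, hyp))  # one substitution in six words: 0.1666...
```

Note that with character units a doubled letter (as in "green") needs a blank between the two `e` frames to survive the collapse. Word units avoid this particular issue because a repeated word in consecutive positions is rare in short command-style sentences.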
Related papers
- Lipreading Using Temporal Convolutional Networks (2020)
- Can DNNs Learn To Lipread Full Sentences? (2018)
- Lipreading With Long Short-term Memory (2016)
- Towards Lipreading Sentences With Active Appearance Models (2018)
- Multi-grained Spatio-temporal Modeling For Lip-reading (2019)
- Spatio-temporal Attention Mechanism And Knowledge Distillation For Lip Reading (2021)
- An Improved Hybrid CTC-Attention Model For Speech Recognition (2018)
- Learning Separable Hidden Unit Contributions For Speaker-adaptive Lip-reading (2023)