Learning Separable Hidden Unit Contributions For Speaker-adaptive Lip-reading
2023 · Songtao Luo, Shuang Yang, Shiguang Shan, et al.
Abstract
In this paper, we propose a novel method for speaker adaptation in lip reading, motivated by two observations. First, a speaker's own characteristics can always be portrayed well by a few of his/her facial images, or even a single image, using shallow networks, whereas the fine-grained dynamic features associated with the speech content expressed by the talking face always require deep sequential networks to be represented accurately. We therefore treat the shallow and deep layers differently for speaker-adaptive lip reading. Second, we observe that a speaker's unique characteristics (e.g., a prominent oral cavity and mandible) affect lip-reading performance differently for different words and pronunciations, which calls for adaptively enhancing or suppressing the features for robust lip reading. Based on these two observations, we propose to take advantage of the speaker's own characteristics to automatically learn separable hidden unit contributions with different targets for shallow layers and de
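The abstract gives no formulas, so what follows is only a minimal sketch of the general mechanism it builds on. It assumes the LHUC re-parameterisation from the related 2016 acoustic-model adaptation work, where each hidden unit is scaled by a = 2·sigmoid(r). It also assumes a hypothetical linear map from a speaker feature vector to one r per unit. The function names (`lhuc_amplitude`, `speaker_scales`, `apply_contributions`) are illustrative and are not the authors' implementation.

```python
import math
from typing import List


def lhuc_amplitude(r: float) -> float:
    """LHUC-style scale a = 2 * sigmoid(r), bounded in the open interval (0, 2)."""
    return 2.0 / (1.0 + math.exp(-r))


def speaker_scales(speaker_feat: List[float], weights: List[List[float]]) -> List[float]:
    """Hypothetical mapping: one row of `weights` per hidden unit.

    Each row is dotted with the speaker feature vector to get r, which is then
    squashed into a per-unit scale in (0, 2).
    """
    return [
        lhuc_amplitude(sum(w * s for w, s in zip(row, speaker_feat)))
        for row in weights
    ]


def apply_contributions(hidden: List[float], scales: List[float]) -> List[float]:
    """Element-wise re-weighting: a scale > 1 enhances a unit, < 1 suppresses it."""
    if len(hidden) != len(scales):
        raise ValueError("one scale per hidden unit is required")
    return [h * a for h, a in zip(hidden, scales)]


if __name__ == "__main__":
    # Toy speaker feature and a 3-unit layer (all values made up for illustration).
    feat = [1.0, -1.0]
    w = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]  # r = 0, 2, -2 for the three units
    scales = speaker_scales(feat, w)
    print(apply_contributions([0.5, 0.5, 0.5], scales))
```

Each scale lies in (0, 2), so a speaker-conditioned unit can be amplified or damped but never sign-flipped. This matches the "enhancement or suppression" idea in the abstract. One assumed way to make the contributions "separable" would be to keep distinct weight sets for shallow and deep layers, each trained toward its own target. The paper's actual targets are not stated here.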
Related papers
- Target Speaker Lipreading By Audio-visual Self-distillation Pretraining And Speaker Adaptation (2025)
- Learning Speaker-invariant Visual Features For Lipreading (2025)
- Learning Hidden Unit Contributions For Unsupervised Acoustic Model Adaptation (2016)
- Improving Speaker-independent Lipreading With Domain-adversarial Training (2017)
- Multi-grained Spatio-temporal Modeling For Lip-reading (2019)
- LipFormer: Learning To Lipread Unseen Speakers Based On Visual-landmark Transformers (2023)
- Bayesian Learning For Deep Neural Network Adaptation (2020)
- SimulLR: Simultaneous Lip Reading Transducer With Attention-guided Adaptive Memory (2021)