Visual Gesture Variability Between Talkers In Continuous Visual Speech
2017 · Helen L Bear
Abstract
The recent adoption of deep learning methods in machine lipreading research gives us two options for improving system performance: either we develop end-to-end systems holistically, or we experiment to further our understanding of the visual speech signal. The latter option is more difficult, but this knowledge would enable researchers both to improve systems and to apply the new knowledge to other domains such as speech therapy. One challenge in lipreading systems is the correct labeling of the classifiers. These labels map an estimated function between visemes on the lips and the phonemes uttered. Here we ask whether such maps are speaker-dependent. Prior work investigated isolated word recognition from speaker-dependent (SD) visemes; we extend this to continuous speech. Benchmarked against SD results and the isolated-word performance, we test with RMAV dataset speakers and observe that with continuous speech, the trajectory between visemes has a greater negative effect on