Improving Speaker-independent Lipreading With Domain-adversarial Training
2017 · Michael Wand, Juergen Schmidhuber
Abstract
We present a lipreading system, i.e., a speech recognition system using only visual features, which uses domain-adversarial training for speaker independence. Domain-adversarial training is integrated into the optimization of a lipreader based on a stack of feedforward and LSTM (Long Short-Term Memory) recurrent neural networks, yielding an end-to-end trainable system that requires only a very small number of frames of untranscribed target data to substantially improve recognition accuracy on the target speaker. On pairs of different source and target speakers, we achieve a relative accuracy improvement of around 40% with only 15 to 20 seconds of untranscribed target speech data. In multi-speaker training setups, the accuracy improvements are smaller but still substantial.
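To make the idea of domain-adversarial training concrete, here is a minimal sketch of the gradient-reversal update on a toy scalar model. It is not the authors' network: the feedforward and LSTM stack is replaced by a single weight `w`, the word classifier by a squared-error predictor `a`, and the speaker (domain) classifier by a logistic unit `b`. All names, the loss choices, and the trade-off weight `lam` are assumptions made for illustration. The sketch shows the two properties the abstract relies on: untranscribed target frames contribute only through the domain loss, and the shared features receive the domain gradient with its sign flipped.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dann_gradients(w, a, b, x, y, d, lam):
    """Per-frame gradients for a toy domain-adversarial model (illustrative only).

    w   : shared feature extractor weight (stand-in for the feedforward/LSTM stack)
    a   : label predictor weight (stand-in for the word classifier)
    b   : domain classifier weight (source vs. target speaker)
    x   : input feature (scalar)
    y   : training target, or None for an untranscribed target-speaker frame
    d   : domain label, 0 = source speaker, 1 = target speaker
    lam : weight of the reversed domain gradient (hypothetical hyperparameter)
    """
    f = w * x                      # shared feature
    d_hat = sigmoid(b * f)         # predicted probability of "target speaker"

    # Domain loss: binary cross-entropy. Its gradient w.r.t. the logit is (d_hat - d).
    dLd_df = (d_hat - d) * b
    grad_b = (d_hat - d) * f       # domain classifier descends on its own loss

    if y is None:
        # Untranscribed frame: no label loss, so no label gradient.
        dLy_df = 0.0
        grad_a = 0.0
    else:
        err = a * f - y            # squared-error loss 0.5 * err**2
        dLy_df = err * a
        grad_a = err * f

    # Gradient reversal: the shared features descend on the label loss
    # but ascend on the domain loss, which pushes them toward speaker invariance.
    grad_w = (dLy_df - lam * dLd_df) * x
    return {"w": grad_w, "a": grad_a, "b": grad_b}


# Usage: one transcribed source frame and one untranscribed target frame.
source = dann_gradients(w=1.0, a=1.0, b=1.0, x=2.0, y=3.0, d=0, lam=0.0)
target = dann_gradients(w=1.0, a=1.0, b=1.0, x=2.0, y=None, d=1, lam=0.5)
```

With `lam=0.0` the update reduces to ordinary supervised training on the source speaker; a positive `lam` adds the adversarial term, which is the only signal available from the untranscribed target frames.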
Related papers
- Target Speaker Lipreading By Audio-visual Self-distillation Pretraining And Speaker Adaptation (2025)
- Lipreading With Long Short-term Memory (2016)
- Learning Separable Hidden Unit Contributions For Speaker-adaptive Lip-reading (2023)
- Learning Speaker-invariant Visual Features For Lipreading (2025)
- Speaker Verification Using End-to-end Adversarial Language Adaptation (2018)
- Adversarial Training For Multi-domain Speaker Recognition (2020)
- Spatio-temporal Attention Mechanism And Knowledge Distillation For Lip Reading (2021)
- Self-supervised Learning Based Domain Adaptation For Robust Speaker Verification (2021)