LRS-2
Emerging
16 papers using it
First seen: 2022
LRS-2 (also written LRS2) is a dataset for evaluating audio-visual speech recognition systems. It contains video recordings of people speaking and is used to assess how well a model recognizes speech from visual cues.
Papers using LRS-2 (15)
- Litevsr: Efficient Visual Speech Recognition By Learning From Speech Representations Of Unlabeled Data
- DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
- Dubwise: Video-guided Speech Duration Control In Multimodal Llm-based Text-to-speech For Dubbing
- VisG AV-HuBERT: Viseme-Guided AV-HuBERT
- Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition
- Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
- Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
- LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
- DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
- Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
- Lip-to-Speech Synthesis in the Wild with Multi-task Learning
- Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
- OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
- LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
- Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation