eess.IV
50 papers tagged eess.IV (ordered by heat_score)
Papers
- Audio-Visual Target Speaker Enhancement on Multi-Talker Environment using Event-Driven Cameras (2021) Ander Arriandiaga et al.
- Development and Evaluation of Video Recordings for the OLSA Matrix Sentence Test (2021) Gerard Llorach et al.
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation (2021) Daniel Michelsanti et al.
- Audio-Visual Speech Inpainting with Deep Learning (2021) Giovanni Morrone et al.
- An Empirical Study of Visual Features for DNN based Audio-Visual Speech Enhancement in Multi-talker Environments (2022) Shrishti Saha Shetu et al.
- VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency (2021) Ruohan Gao and Kristen Grauman
- Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement (2022) Xinmeng Xu and Jianjun Hao
- Learning Audio-Visual Correlations from Variational Cross-Modal Generation (2021) Ye Zhu et al.
- Active Audio-Visual Separation of Dynamic Sound Sources (2022) Sagnik Majumder and Kristen Grauman
- Learning Sound Localization Better From Semantically Similar Samples (2022) Arda Senocak et al.
- Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition (2022) Zi-Qiang Zhang et al.
- VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion (2022) Disong Wang et al.
- Learning English with Peppa Pig (2023) Mitja Nikolaus, Afra Alishahi, and Grzegorz Chrupała
- Visually Supervised Speaker Detection and Localization via Microphone Array (2022) Davide Berghi et al.
- Deep CardioSound-An Ensembled Deep Learning Model for Heart Sound MultiLabelling (2022) Li Guo et al.
- The 2021 NIST Speaker Recognition Evaluation (2022) Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Lisa Mason, and Douglas Reynolds
- Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations (2022) Dan Oneata et al.
- VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution (2022) Liangbin Xie, Xintao Wang et al.
- Perceptual Evaluation on Audio-visual Dataset of 360 Content (2022) Randy F Fela et al.
- FlexLip: A Controllable Text-to-Lip System (2022) Dan Oneata et al.
- Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos (2022) Alexander Waibel, Moritz Behr, Fevziye Irem Eyiokur, Dogucan Yaman, Tuan-Nam Nguyen, Carlos Mullov, Mehmet Arif Demirtas, Alperen Kantarcı, Stefan Constantin, and Hazım Kemal Ekenel
- Show Me Your Face, And I'll Tell You How You Speak (2022) Christen Millerdurai et al.
- Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification (2023) Long Chen et al.
- Audio-Visual Segmentation (2023) Jinxing Zhou et al.
- Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition (2022) Joanna Hong et al.
- u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality (2022) Wei-Ning Hsu et al.
- Speaker-adaptive Lip Reading with User-dependent Padding (2022) Minsu Kim et al.
- StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation (2024) Dongchan Min et al.
- Prospectively accelerated dynamic speech MRI at 3 Tesla using a self-navigated spiral based manifold regularized scheme (2023) Rushdi Zahid Rusho et al.
- Unsupervised active speaker detection in media content using cross-modal information (2022) Rahul Sharma and Shrikanth Narayanan
- Multi-Source Transformer Architectures for Audiovisual Scene Classification (2022) Wim Boes et al.
- Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function (2022) Qing Wang et al.
- SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory (2022) Se Jin Park et al.
- Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages (2022) Anusha Prakash et al.
- MarginNCE: Robust Sound Localization with a Negative Margin (2022) Sooyoung Park et al.
- AVATAR submission to the Ego4D AV Transcription Challenge (2022) Paul Hongsuck Seo et al.
- DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model (2023) Fan Zhang et al.
- Synthesizing audio from tongue motion during speech using tagged MRI via transformer (2023) Xiaofeng Liu et al.
- Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition (2024) Minsu Kim et al.
- Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification (2023) Meng Liu et al.
- UniFLG: Unified Facial Landmark Generator from Text or Speech (2023) Kentaro Mitsui et al.
- Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model (2023) Jaeyoung Huh et al.
- SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks (2023) Naoki Kimura et al.
- WASD: A Wilder Active Speaker Detection Dataset (2023) Tiago Roxo et al.
- ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers (2023) Akash Gupta et al.
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert (2023) Jiadong Wang et al.
- Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment (2023) Kim Sung-Bin et al.
- Deep sound-field denoiser: optically-measured sound-field denoising using deep neural network (2023) Kenji Ishikawa et al.
- Towards Ultrasound Tongue Image prediction from EEG during speech production (2023) Tamás Gábor Csapó et al.
- AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model (2024) Jeong Hun Yeo et al.