CiCo: Domain-aware Sign Language Retrieval Via Cross-lingual Contrastive Learning
2023 · Yiting Cheng, Fangyun Wei, Jianmin Bao, et al.
Abstract
This work focuses on sign language retrieval, a recently proposed task for sign language understanding. Sign language retrieval consists of two sub-tasks: text-to-sign-video (T2V) retrieval and sign-video-to-text (V2T) retrieval. In contrast to traditional video-text retrieval, sign language videos not only contain visual signals but also carry abundant semantic meaning by themselves, because sign languages are also natural languages. Given this characteristic, we formulate sign language retrieval as a cross-lingual retrieval problem as well as a video-text retrieval task. Concretely, we take into account the linguistic properties of both sign languages and natural languages, and simultaneously identify the fine-grained cross-lingual (i.e., sign-to-word) mappings while contrasting the texts and the sign videos in a joint embedding space. This process is termed cross-lingual contrastive learning. Another challenge is raised by the data scarcity issue: sign language datas
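The idea of contrasting texts and sign videos through fine-grained sign-to-word similarities can be illustrated with a small sketch. It is not the authors' implementation: the function names, the max-over-clips aggregation, and the temperature value of 0.07 are all assumptions chosen for illustration, and the features are toy 2-D vectors rather than learned embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pair_similarity(clip_feats, word_feats):
    """Video-text score built from a sign-to-word similarity matrix.

    Assumed aggregation (hypothetical): each word keeps its best-matching
    clip, and the per-word scores are averaged.
    """
    sim = [[cosine(w, c) for c in clip_feats] for w in word_feats]
    return sum(max(row) for row in sim) / len(sim)

def diag_nll(scores, temperature):
    """Mean negative log-softmax of the matched (diagonal) entry per row."""
    loss = 0.0
    for i, row in enumerate(scores):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_z - logits[i]
    return loss / len(scores)

def contrastive_loss(videos, texts, temperature=0.07):
    """Symmetric contrastive loss: video-to-text plus text-to-video."""
    scores = [[pair_similarity(v, t) for t in texts] for v in videos]
    transposed = [list(col) for col in zip(*scores)]
    return 0.5 * (diag_nll(scores, temperature) + diag_nll(transposed, temperature))

# Toy batch: two videos (lists of clip features) and two texts (lists of word features).
videos = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
texts = [[[1.0, 0.0]], [[0.0, 1.0]]]

matched = contrastive_loss(videos, texts)
mismatched = contrastive_loss(videos, texts[::-1])
print(matched < mismatched)
```

In this sketch the i-th video and the i-th text form the positive pair and every other combination in the batch acts as a negative, so correctly paired inputs yield a lower loss than the same inputs with the texts swapped.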
Related papers
- Sign Language Video Retrieval With Free-form Textual Queries (2022) 10.35
- A Tale Of Two Languages: Large-vocabulary Continuous Sign Language Recognition From Spoken Language Supervision (2024) 0.00
- Lat: Latent Translation With Cycle-consistency For Video-text Retrieval (2022) 0.00
- Covlr: Coordinating Cross-modal Consistency And Intra-modal Structure For Vision-language Retrieval (2023) 4.52
- X-CLIP: End-to-end Multi-grained Contrastive Learning For Video-text Retrieval (2022) 18.12
- Normalized Contrastive Learning For Text-video Retrieval (2022) 6.77
- Clip2video: Mastering Video-text Retrieval Via Image CLIP (2021) 0.00
- COTS: Collaborative Two-stream Vision-language Pre-training Model For Cross-modal Retrieval (2022) 13.60