MSVD
Emerging, 6 papers using it
First seen: 2022
The MSVD dataset contains a collection of video clips paired with corresponding textual descriptions, and it is used to evaluate text-video retrieval models by measuring their ability to rank relevant text-video pairs.
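To make "ranking relevant text-video pairs" concrete, here is a minimal sketch of how such an evaluation is often scored. It assumes the metrics are Recall@K and median rank, which are common in text-video retrieval, although this page does not name them. The function names `retrieval_ranks` and `recall_at_k` and the toy similarity matrix are hypothetical. The sketch also assumes one caption per video, a simplification, since MSVD clips typically have several descriptions.

```python
from statistics import median


def retrieval_ranks(sim):
    """Return the 1-based rank of the correct video for each text query.

    sim[i][j] is the similarity between text i and video j. Assumption:
    the ground-truth match for text i is video i (one caption per video).
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Count videos scored strictly higher than the correct one.
        better = sum(1 for score in row if score > target)
        ranks.append(better + 1)
    return ranks


def recall_at_k(ranks, k):
    """Fraction of queries whose correct video is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy similarity matrix: 3 text queries x 3 videos (made-up numbers).
sim = [
    [0.9, 0.1, 0.2],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.3],  # query 1: correct video ranked 2nd
    [0.2, 0.4, 0.1],  # query 2: correct video ranked 3rd
]

ranks = retrieval_ranks(sim)
print("ranks:", ranks)
print("R@1:", recall_at_k(ranks, 1))
print("R@2:", recall_at_k(ranks, 2))
print("median rank:", median(ranks))
```

In a real evaluation, the similarity matrix would come from a model's text and video embeddings, and a video with several captions would contribute one query row per caption.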
Papers using MSVD (6)
- Bima: Towards Biases Mitigation For Text-video Retrieval Via Scene Element Guidance
- From Captions To Keyframes: Keyscore For Multimodal Frame Scoring And Video-language Understanding
- X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
- Support-set based Multi-modal Representation Enhancement for Video Captioning
- MuMUR: Multilingual Multimodal Universal Retrieval
- Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval