DiDeMo
Emerging
6 papers using it
First seen: 2023
Papers using DiDeMo (6)
- InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
- Bima: Towards Biases Mitigation For Text-video Retrieval Via Scene Element Guidance
- Delving Deeper: Hierarchical Visual Perception for Robust Video-Text Retrieval
- From Captions To Keyframes: Keyscore For Multimodal Frame Scoring And Video-language Understanding
- GAIS: Frame-level Gated Audio-visual Integration With Semantic Variance-scaled Perturbation For Text-video Retrieval
- Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding