MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed
2025 · Jiaqi Samantha Zhan, Crystina Zhang, Shengyao Zhuang, et al.
Abstract
Effective video retrieval remains challenging because it requires integrating visual, auditory, and textual modalities. In this paper, we explore unified retrieval methods using OmniEmbed, a powerful multimodal embedding model from the Tevatron 2.0 toolkit, in the context of the MAGMaR shared task. Evaluated on the comprehensive MultiVENT 2.0 dataset, OmniEmbed generates unified embeddings for text, images, audio, and video, enabling robust multimodal retrieval. By fine-tuning OmniEmbed on the combined multimodal data provided in MultiVENT 2.0 (visual frames, audio tracks, and textual descriptions), we achieve substantial improvements in complex, multilingual video retrieval tasks. Our submission achieved the highest score on the MAGMaR shared task leaderboard among public submissions as of May 20th, 2025, highlighting the practical effectiveness of our unified multimodal retrieval approach. The model checkpoint from this work is open-sourced.
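To make the idea of retrieval over unified embeddings concrete, the following is a minimal sketch of ranking videos against a text query by cosine similarity in a shared vector space. It does not use OmniEmbed or Tevatron: the vectors are hand-written toy stand-ins for model outputs, and `rank_videos` is a hypothetical helper introduced only for illustration.

```python
import numpy as np


def rank_videos(query_vec, video_vecs, top_k=3):
    """Rank videos by cosine similarity to the query (hypothetical helper).

    Returns the indices of the top_k videos and the full score array.
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    v = video_vecs / np.linalg.norm(video_vecs, axis=1, keepdims=True)
    scores = v @ q
    # Sort by descending score and keep the best top_k indices.
    order = np.argsort(-scores)[:top_k]
    return order.tolist(), scores


# Toy stand-ins for embeddings (assumption: a real system would obtain
# these from a multimodal embedding model, one vector per video).
video_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
# Toy stand-in for the embedding of a text query.
query_vec = np.array([0.9, 0.1, 0.0])

top, scores = rank_videos(query_vec, video_vecs, top_k=2)
print(top)
```

Because queries and videos live in one vector space in this setup, the same ranking step applies regardless of which modalities contributed to each video vector.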
Related papers
- Tevatron 2.0: Unified Document Retrieval Toolkit Across Scale, Language, And Modality (2025) 3.58
- MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion (2025) 2.26
- Verve: Versatile Retrieval For Videos Via Unified Embeddings (2026) 0.00
- VLM2Vec-V2: Advancing Multimodal Embedding For Videos, Images, And Visual Documents (2025) 0.00
- Embedding-based Retrieval In Multimodal Content Moderation (2025) 2.26
- MetaEmbed: Scaling Multimodal Retrieval At Test-time With Flexible Late Interaction (2025) 2.35
- CLaMR: Contextualized Late-interaction For Multimodal Content Retrieval (2025) 0.00
- MARVEL: Unlocking The Multi-modal Capability Of Dense Retrieval Via Visual Module Plugin (2023) 9.04