TRACE: Task-adaptive Reasoning And Representation Learning For Universal Multimodal Retrieval
2026 · Xiangzhao Hao, Shijie Wang, Tianyu Yang, et al.
Abstract
Universal Multimodal Retrieval requires unified embedding models capable of interpreting diverse user intents, ranging from simple keywords to complex compositional instructions. While Multimodal Large Language Models (MLLMs) possess strong reasoning capabilities, prevailing adaptations confine them to static encoders, underutilizing their generative potential. This encoder-only paradigm struggles with complex intents that demand logical deduction rather than superficial pattern matching. To address this, we introduce TRACE (Task-adaptive Reasoning And Compressing Embeddings). TRACE unifies generative reasoning with discriminative representation learning. It first generates a structured Chain-of-Thought (CoT) to explicitly reason about the query, and subsequently compresses this reasoning trace into a compact embedding via a dedicated token. To train this framework, we construct M-BEIR-CoT, a large-scale dataset featuring a difficulty-aware routing strategy. Experiments on the M-BEIR b
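The compression step described in the abstract (reason first, then read the embedding off a dedicated token) can be illustrated with a toy sketch. This is not the authors' implementation: the token name `<emb>`, the function names, and the hard-coded 2-D "hidden states" are all assumptions made for illustration. In a real system the hidden states would presumably come from the MLLM after it has generated the CoT.

```python
import math

# Hypothetical name for the dedicated embedding token (not from the paper).
EMB_TOKEN = "<emb>"


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compress_to_embedding(tokens, hidden_states):
    """Return the unit-length hidden state at the last EMB_TOKEN position.

    tokens: list of strings (query tokens, then reasoning tokens, then EMB_TOKEN).
    hidden_states: one vector per token, standing in for model outputs.
    """
    if len(tokens) != len(hidden_states):
        raise ValueError("tokens and hidden_states must have the same length")
    positions = [i for i, tok in enumerate(tokens) if tok == EMB_TOKEN]
    if not positions:
        raise ValueError("sequence contains no embedding token")
    return l2_normalize(hidden_states[positions[-1]])


def cosine(a, b):
    """Dot product; equals cosine similarity because both inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


# Toy sequence: query token, two reasoning tokens, then the embedding token.
tokens = ["query", "reasoning", "trace", EMB_TOKEN]
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 4.0]]

query_embedding = compress_to_embedding(tokens, hidden_states)

# Toy candidate embeddings, also unit-length.
candidates = {
    "aligned": l2_normalize([3.0, 4.0]),
    "orthogonal": l2_normalize([4.0, -3.0]),
}
scores = {name: cosine(query_embedding, emb) for name, emb in candidates.items()}
best = max(scores, key=scores.get)
```

Because the embedding token comes last in the sequence, its hidden state can attend to both the query and the generated reasoning under causal attention, which is one plausible reading of how a single token could summarize the reasoning trace.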
Related papers
- Reasoning-augmented Representations For Multimodal Retrieval (2026)
- V-retrver: Evidence-driven Agentic Reasoning For Universal Multimodal Retrieval (2026)
- Embed-rl: Reinforcement Learning For Reasoning-driven Multimodal Embeddings (2026)
- CREM: Compression-driven Representation Enhancement For Multimodal Retrieval And Comprehension (2026)
- Reasoning Guided Embeddings: Leveraging MLLM Reasoning For Improved Multimodal Retrieval (2025)
- MARVEL: Multimodal Adaptive Reasoning-intensive Expand-rerank And Retrieval (2026)
- PLUME: Latent Reasoning Based Universal Multimodal Embedding (2026)
- Recurrence Meets Transformers For Universal Multimodal Retrieval (2025)