ReCQR: Incorporating Conversational Query Rewriting To Improve Multimodal Image Retrieval
2026 · Yuan Hu, Zhiyu Cao, Peifeng Li, et al.
Abstract
With the rise of multimodal learning, image retrieval plays a crucial role in connecting visual information with natural language queries. Existing image retrievers struggle to process long texts and to handle unclear user expressions. To address these issues, we introduce the conversational query rewriting (CQR) task into the image retrieval domain and construct a dedicated multi-turn dialogue query rewriting dataset. Drawing on the full dialogue history, CQR rewrites a user's final query into a concise, semantically complete one that is better suited for retrieval. Specifically, we first leverage Large Language Models (LLMs) to generate rewritten candidates at scale and employ an LLM-as-Judge mechanism combined with manual review to curate approximately 7,000 high-quality multimodal dialogues, forming the ReCQR dataset. We then benchmark several state-of-the-art (SOTA) multimodal models on the ReCQR dataset to assess their image retrieval performance. Experimental results demonstrate that CQR not on
Related papers
- Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models And Vision Language Models (2024)
- End-to-end Knowledge Retrieval With Multi-modal Queries (2023)
- MCoT-RE: Multi-faceted Chain-of-thought And Re-ranking For Training-free Zero-shot Composed Image Retrieval (2025)
- ChatSearch: A Dataset And A Generative Retrieval Model For General Conversational Image Retrieval (2024)
- Flickr30K-CFQ: A Compact And Fragmented Query Dataset For Text-image Retrieval (2024)
- Chatting Makes Perfect: Chat-based Image Retrieval (2023)
- Recall: Recalibrating Capability Degradation For MLLM-based Composed Image Retrieval (2026)
- Chain-of-thought Re-ranking For Image Retrieval Tasks (2025)