Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models And Vision Language Models
2024 · Hongyi Zhu, Jia-Hong Huang, Stevan Rudinac, et al.
Abstract
Image search is a pivotal task in multimedia and computer vision, with applications ranging from internet search to medical diagnostics. Conventional image search systems accept textual or visual queries and retrieve the most relevant candidate results from the database. However, prevalent methods often rely on single-turn procedures, which can introduce inaccuracies and limit recall. These methods also face challenges such as vocabulary mismatch and the semantic gap, which constrain their overall effectiveness. To address these issues, we propose an interactive image retrieval system capable of refining queries based on user relevance feedback in a multi-turn setting. This system incorporates a vision language model (VLM) based image captioner to enhance the quality of text-based queries, resulting in more informative queries with each iteration. Moreover, we introduce a large language model (LLM) based denoiser to refine text-based
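The multi-turn loop described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the captions are invented data, `caption_image` stands in for the VLM captioner with a simple lookup, `denoise` stands in for the LLM denoiser by dropping repeated words, the token-overlap `score` replaces a real text-image similarity model, and the step of appending the caption of a user-selected image to the query is one plausible reading of "refining queries based on user relevance feedback".

```python
from collections import Counter

# Hypothetical toy database: image id -> caption a VLM captioner might produce.
CAPTIONS = {
    "img1": "a brown dog running on a beach",
    "img2": "a black cat sleeping on a sofa",
    "img3": "a brown dog sleeping on a sofa",
}


def tokenize(text):
    return text.lower().split()


def score(query, caption):
    """Token-overlap score; a stand-in for a real text-image similarity model."""
    q, c = Counter(tokenize(query)), Counter(tokenize(caption))
    return sum((q & c).values())


def retrieve(query, k=2):
    """Return the k best-matching image ids (ties broken by id)."""
    ranked = sorted(CAPTIONS, key=lambda i: (-score(query, CAPTIONS[i]), i))
    return ranked[:k]


def caption_image(image_id):
    """Stand-in for the VLM captioner (assumption: a plain lookup)."""
    return CAPTIONS[image_id]


def denoise(text):
    """Stand-in for the LLM denoiser: drop repeated words, keep first-seen order."""
    seen, kept = set(), []
    for tok in tokenize(text):
        if tok not in seen:
            seen.add(tok)
            kept.append(tok)
    return " ".join(kept)


def interactive_search(query, pick_relevant, turns=2, k=2):
    """Run a few feedback turns; pick_relevant simulates the user's choice."""
    for _ in range(turns):
        results = retrieve(query, k)
        chosen = pick_relevant(results)
        if chosen is None:  # user found nothing relevant: stop refining
            break
        # Enrich the query with the chosen image's caption, then clean it up.
        query = denoise(query + " " + caption_image(chosen))
    return query, retrieve(query, k)


if __name__ == "__main__":
    # Simulated user who always marks the top result as relevant.
    final_query, final_results = interactive_search("dog sofa", lambda r: r[0])
    print(final_query)
    print(final_results)
```

In a real system the three stand-ins would be replaced by model calls; the sketch only shows how feedback, captioning, and denoising could chain together across turns.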
Related papers
- A Little More Like This: Text-to-image Retrieval With Vision-language Models Using Relevance Feedback (2025)
- Recqr: Incorporating Conversational Query Rewriting To Improve Multimodal Image Retrieval (2026)
- Interactive Text-to-image Retrieval With Large Language Models: A Plug-and-play Approach (2024)
- Seeing Through Words: Controlling Visual Retrieval Quality With Language Models (2026)
- Indexing Multimodal Language Models For Large-scale Image Retrieval (2026)
- Leveraging Large Vision-language Model As User Intent-aware Encoder For Composed Image Retrieval (2024)
- An Interactive Multi-modal Query Answering System With Retrieval-augmented Large Language Models (2024)
- Evdclip: Improving Vision-language Retrieval With Entity Visual Descriptions From Large Language Models (2025)