Multimodal Neural Databases
2023 · Giovanni Trappolini, Andrea Santilli, Emanuele Rodolà, et al.
Abstract
The rise in loosely-structured data available through text, images, and other modalities has called for new ways of querying them. Multimedia Information Retrieval has filled this gap and has witnessed exciting progress in recent years. Tasks such as search and retrieval of extensive multimedia archives have undergone massive performance improvements, driven to a large extent by recent developments in multimodal deep learning. However, methods in this field remain limited in the kinds of queries they support and, in particular, cannot answer database-like queries. For this reason, inspired by recent work on neural databases, we propose a new framework, which we name Multimodal Neural Databases (MMNDBs). MMNDBs can answer complex database-like queries that involve reasoning over different input modalities, such as text and images, at scale. In this paper, we present the first architecture able to fulfill this set of requirements and test it with several baselines, showing th