HotelMatch-LLM: Joint Multi-task Training Of Small And Large Language Models For Efficient Multimodal Hotel Retrieval
2025 · Arian Askari, Emmanouil Stergiadis, Ilya Gusev, et al.
Abstract
We present HotelMatch-LLM, a multimodal dense retrieval model for the travel domain that enables natural language property search, addressing the limitations of traditional travel search engines, which require users to start with a destination and then edit search parameters. HotelMatch-LLM features three key innovations: (1) domain-specific multi-task optimization with three novel retrieval, visual, and language modeling objectives; (2) an asymmetrical dense retrieval architecture combining a small language model (SLM) for efficient online query processing and a large language model (LLM) for embedding hotel data; and (3) extensive image processing to handle all property image galleries. Experiments on four diverse test sets show that HotelMatch-LLM significantly outperforms state-of-the-art models, including VISTA and MARVEL. Specifically, on the test set (main query type), HotelMatch-LLM achieves 0.681, compared to 0.603 for the most effective baseline, MARVEL. Our analysis highlights
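The asymmetrical design in point (2) can be illustrated with a minimal sketch. It is not the authors' implementation: `ToyEncoder`, `build_index`, `search`, `EMBED_DIM`, the hidden sizes, and the sample hotel texts are all hypothetical stand-ins, and the two toy encoders are untrained, so the ranking it produces carries no meaning. It only shows the split the abstract describes: a large encoder embeds hotels once, offline, while a small encoder embeds each query online, and both project into one shared space where similarity is a dot product.

```python
import zlib
import numpy as np

EMBED_DIM = 16  # size of the shared embedding space (hypothetical value)


class ToyEncoder:
    """Stand-in for a language model: hashed bag-of-words features followed
    by a random linear projection into the shared embedding space."""

    def __init__(self, hidden_dim, seed):
        self.hidden_dim = hidden_dim
        rng = np.random.default_rng(seed)
        self.proj = rng.normal(size=(hidden_dim, EMBED_DIM))

    def encode(self, texts):
        feats = np.zeros((len(texts), self.hidden_dim))
        for row, text in enumerate(texts):
            for token in text.lower().split():
                # crc32 gives a hash that is stable across processes
                bucket = zlib.crc32(token.encode("utf-8")) % self.hidden_dim
                feats[row, bucket] += 1.0
        emb = feats @ self.proj
        norms = np.linalg.norm(emb, axis=1, keepdims=True)
        return emb / np.maximum(norms, 1e-12)  # unit-length rows


# Assumption: the "small" and "large" models differ only in hidden size here.
query_encoder = ToyEncoder(hidden_dim=32, seed=0)   # cheap, runs per query
hotel_encoder = ToyEncoder(hidden_dim=256, seed=1)  # costly, runs offline


def build_index(hotel_texts):
    """Offline step: embed every hotel once with the large encoder."""
    return hotel_encoder.encode(hotel_texts)


def search(query, index, k=2):
    """Online step: embed the query with the small encoder and rank hotels
    by cosine similarity (a dot product, since all vectors are unit length)."""
    q = query_encoder.encode([query])[0]
    scores = index @ q
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]


hotels = [
    "beachfront resort with outdoor pool and spa",
    "budget city hostel near the train station",
    "quiet mountain lodge with hiking trails",
]
index = build_index(hotels)
results = search("quiet beach hotel with a pool", index, k=2)
```

In a trained system the two encoders would be optimized jointly so that their outputs are comparable; the sketch leaves out training entirely, along with the image inputs and the multi-task objectives mentioned in the abstract.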
Related papers
- Indexing Multimodal Language Models For Large-scale Image Retrieval (2026)
- Transforming LLMs Into Cross-modal And Cross-lingual Retrieval Systems (2024)
- RETLLM: Training And Data-free MLLMs For Multimodal Information Retrieval (2026)
- MM-Embed: Universal Multimodal Retrieval With Multimodal LLMs (2024)
- LamRA: Large Multimodal Model As Your Advanced Retrieval Assistant (2024)
- SLQ: Bridging Modalities Via Shared Latent Queries For Retrieval With Frozen MLLMs (2026)
- A Comparative Study Of Specialized LLMs As Dense Retrievers (2025)
- Beyond Global Similarity: Towards Fine-grained, Multi-condition Multimodal Retrieval (2026)