Bridging Language And Items For Retrieval And Recommendation: Benchmarking LLMs As Semantic Encoders
2024 · Yupeng Hou, Jiacheng Li, Xiangjun Fu, et al.
Abstract
Feature engineering has long been central to recommender systems, yet effectively leveraging textual item features remains challenging. Recent advances in large language models (LLMs) have enabled their use as semantic encoders for recommendation, but their roles and behaviors in this setting are still not well understood. Prior studies often rely on general-purpose embedding benchmarks (e.g., MTEB) when selecting LLMs, overlooking the unique characteristics of recommendation tasks. To address this gap, we introduce BLaIR, a comprehensive benchmark for evaluating LLMs as semantic encoders in recommendation scenarios. We contribute (1) a new large-scale Amazon Reviews 2023 dataset with over 570 million reviews and 48 million items, (2) a unified benchmark covering sequential recommendation, collaborative filtering, and product search, and (3) a new complex-query product search task featuring both semi-synthetic and real-world evaluation datasets. Experiments with 11 leading LLMs show th
Related papers
- MixLM: High-throughput And Effective LLM Ranking Via Text-embedding Mix-interaction (2025)
- STAR: A Simple Training-free Approach For Recommendations Using Large Language Models (2024)
- PLUM: Adapting Pre-trained Language Models For Industrial-scale Generative Recommendations (2025)
- Large Reasoning Embedding Models: Towards Next-generation Dense Retrieval Paradigm (2025)
- Domain-adaptive And Scalable Dense Retrieval For Content-based Recommendation (2026)
- VLM4Rec: Multimodal Semantic Representation For Recommendation With Large Vision-language Models (2026)
- NoteLLM: A Retrievable Large Language Model For Note Recommendation (2024)
- Advancing Large Language Models For Spatiotemporal And Semantic Association Mining Of Similar Environmental Events (2024)