MixLM: High-throughput And Effective LLM Ranking Via Text-embedding Mix-interaction
2025 · Guoyao Li, Ran He, Shusen Jing, et al.
Abstract
Large language models (LLMs) excel at capturing semantic nuances and therefore show impressive relevance ranking performance in modern recommendation and search systems. However, they suffer from high computational overhead under industrial latency and throughput requirements. In particular, cross-encoder ranking systems often create long-context, prefill-heavy workloads, as the model has to be presented with the user, query, and item information. To this end, we propose MixLM, a novel LLM-based ranking framework that significantly improves system throughput by reducing the input context length while preserving the semantic strength of cross-encoder rankers. In contrast to a standard ranking system, where the context is presented to the model as pure text, we propose mix-interaction, which represents the input as a mixture of text and embedding tokens. Specifically, MixLM encodes all items in the catalog into a few embedding tokens and stores them in a nearline cache. The encoded item
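The mix-interaction idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hash-based token embeddings, the mean-pooling item encoder, the in-memory dict standing in for the nearline cache, and every name below (embed_token, encode_item, build_cache, mixed_input, DIM, ITEM_TOKENS) are assumptions made purely for illustration. The sketch only shows how replacing an item's text with a few precomputed embedding tokens shortens the sequence the ranker has to prefill.

```python
import hashlib

DIM = 8          # toy embedding width (assumption, not from the paper)
ITEM_TOKENS = 2  # embedding tokens kept per item (assumption)


def embed_token(token):
    """Deterministic toy embedding for one text token.

    Stand-in for a real LLM embedding table.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def embed_text(text):
    """Embed whitespace-separated tokens, one vector per token."""
    return [embed_token(t) for t in text.split()]


def encode_item(text, k=ITEM_TOKENS):
    """Compress an item's text into k vectors.

    Mean-pooling k contiguous chunks is a placeholder; the abstract does
    not say how the item encoder works.
    """
    vectors = embed_text(text)
    n = len(vectors)
    pooled = []
    for i in range(k):
        # Fall back to all vectors if the chunk is empty (very short items).
        chunk = vectors[i * n // k:(i + 1) * n // k] or vectors
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def build_cache(catalog):
    """Precompute item embedding tokens offline (dict as a toy nearline cache)."""
    return {item_id: encode_item(text) for item_id, text in catalog.items()}


def mixed_input(query_text, item_id, cache):
    """Text tokens for the user/query side plus cached item embedding tokens."""
    return embed_text(query_text) + cache[item_id]


catalog = {
    "item-1": "senior machine learning engineer role focused on ranking and search systems",
}
cache = build_cache(catalog)
query = "user searches for ranking engineer jobs"

text_only = embed_text(query + " " + catalog["item-1"])  # pure-text context
mixed = mixed_input(query, "item-1", cache)              # mix-interaction context
print(len(text_only), len(mixed))
```

The printed pair compares the sequence length of the pure-text context with that of the mixed context. In this toy setup the saving grows with the length of the item text, since each item always contributes ITEM_TOKENS vectors regardless of how long its description is.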
Related papers
- Bridging Language And Items For Retrieval And Recommendation: Benchmarking LLMs As Semantic Encoders (2024) 0.00
- CREM: Compression-driven Representation Enhancement For Multimodal Retrieval And Comprehension (2026) 0.00
- Magic-mm-embedding: Towards Visual-token-efficient Universal Multimodal Embedding With MLLMs (2026) 0.00
- Indexing Multimodal Language Models For Large-scale Image Retrieval (2026) 0.00
- MLLM Is A Strong Reranker: Advancing Multimodal Retrieval-augmented Generation Via Knowledge-enhanced Reranking And Noise-injected Training (2024) 9.18
- RETLLM: Training And Data-free MLLMs For Multimodal Information Retrieval (2026) 1.57
- Rethinking Hybrid Retrieval: When Small Embeddings And LLM Re-ranking Beat Bigger Models (2025) 0.00
- MICE: Minimal Interaction Cross-encoders For Efficient Re-ranking (2026) 0.00