LUSIFER: Language Universal Space Integration For Enhanced Multilingual Embeddings With Large Language Models
2025 · Hieu Man, Nghia Trung Ngo, Viet Dac Lai, et al.
Abstract
Recent advances in embedding models based on large language models (LLMs) have set new state-of-the-art results on text embedding tasks, particularly dense vector-based retrieval. However, these models focus predominantly on English, leaving multilingual embedding capabilities largely unexplored. To address this limitation, we present LUSIFER, a novel zero-shot approach that adapts LLM-based embedding models to multilingual tasks without requiring multilingual supervision. LUSIFER's architecture combines a multilingual encoder, which serves as a language-universal learner, with an LLM-based embedding model optimized for embedding-specific tasks. The two components are integrated through a minimal set of trainable parameters that act as a connector, transferring the multilingual encoder's language understanding capabilities to the specialized embedding model. Additionally, to comprehensively evaluate multilingual embedding performance, we introduce a new
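The data flow described in the abstract (multilingual encoder, then connector, then embedding model) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the abstract does not say the connector is a single linear layer, and the dimensions, the random stand-in weights, the helper names (`make_matrix`, `connector`, `mean_pool`), and the use of mean pooling as a stand-in for the LLM-based embedding model are all hypothetical.

```python
import random


def make_matrix(rows, cols, rng):
    """Random rows x cols matrix; a stand-in for learned weights or encoder outputs."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def connector(token_states, weight, bias):
    """Hypothetical linear connector.

    Maps each encoder token state (length d_enc) into the embedding model's
    input space (length d_llm). `weight` has d_llm rows of d_enc values each.
    """
    return [
        [sum(x * w for x, w in zip(state, row)) + b for row, b in zip(weight, bias)]
        for state in token_states
    ]


def mean_pool(token_states):
    """Stand-in for the embedding model: average the token vectors into one vector."""
    n = len(token_states)
    return [sum(column) / n for column in zip(*token_states)]


rng = random.Random(0)
D_ENC, D_LLM, N_TOKENS = 8, 16, 5  # assumed sizes, chosen only for the demo

# Pretend these are the multilingual encoder's token states for one sentence.
encoder_states = make_matrix(N_TOKENS, D_ENC, rng)

# The connector's weights would be the only trainable parameters in this sketch.
W = make_matrix(D_LLM, D_ENC, rng)
b = [0.0] * D_LLM

projected = connector(encoder_states, W, b)  # N_TOKENS vectors of length D_LLM
embedding = mean_pool(projected)             # one vector of length D_LLM
```

In this sketch only `W` and `b` would be trained, which mirrors the abstract's "minimal set of trainable parameters"; how the actual connector is structured and trained is not stated in the text above.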