LexSemBridge: Fine-grained Dense Representation Enhancement Through Token-aware Embedding Augmentation
2025 · Shaoxiong Zhan, Hai Lin, Hongming Tan, et al.
Abstract
As queries in retrieval-augmented generation (RAG) pipelines powered by large language models (LLMs) become increasingly complex and diverse, dense retrieval models have demonstrated strong performance in semantic matching. Nevertheless, they often struggle with fine-grained retrieval tasks, where precise keyword alignment and span-level localization are required, even in cases with high lexical overlap that would intuitively suggest easier retrieval. To systematically evaluate this limitation, we introduce two targeted tasks, keyword retrieval and part-of-passage retrieval, designed to simulate practical fine-grained scenarios. Motivated by these observations, we propose LexSemBridge, a unified framework that enhances dense query representations through fine-grained, input-aware vector modulation. LexSemBridge constructs latent enhancement vectors from input tokens using three paradigms: Statistical (SLR), Learned (LLR), and Contextual (CLR), and integrates them with dense embeddings
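The idea of modulating a dense query vector with a token-derived vector can be illustrated with a minimal sketch. It is loosely modeled on the Statistical (SLR) paradigm named above and is not the authors' implementation. The abstract as given here does not say how the enhancement vector is combined with the dense embedding, so the element-wise gate below is an assumption. The names `token_count_vector`, `modulate`, and `projection`, the log-scaled counts, and the toy dimensions are all hypothetical.

```python
import numpy as np

def token_count_vector(token_ids, vocab_size):
    """Lexical vector over the vocabulary: log-scaled counts of the input tokens."""
    counts = np.bincount(token_ids, minlength=vocab_size).astype(np.float64)
    return np.log1p(counts)

def modulate(dense, lexical, projection):
    """Project the lexical vector to the dense dimension and apply it as a gate.

    ASSUMPTION: element-wise multiplicative gating; the source text does not
    specify the integration step.
    """
    latent = projection @ lexical        # shape: (dim,)
    gate = 1.0 + np.tanh(latent)         # in (0, 2); exactly 1 when latent is 0
    enhanced = dense * gate
    return enhanced / np.linalg.norm(enhanced)

rng = np.random.default_rng(0)
vocab_size, dim = 50, 8
projection = rng.normal(scale=0.1, size=(dim, vocab_size))  # hypothetical; would be learned
dense_query = rng.normal(size=dim)                           # stand-in for an encoder output
query_token_ids = np.array([3, 17, 17, 42])                  # toy token ids

lexical = token_count_vector(query_token_ids, vocab_size)
enhanced_query = modulate(dense_query, lexical, projection)
```

With a zero projection the gate is 1 everywhere, so the sketch falls back to the plain normalized dense embedding. This makes the token-derived signal a pure adjustment on top of the semantic vector.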
Related papers
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024)
- A Dense Representation Framework For Lexical And Semantic Matching (2022)
- Large Reasoning Embedding Models: Towards Next-generation Dense Retrieval Paradigm (2025)
- Complementing Lexical Retrieval With Semantic Residual Embedding (2020)
- Expandr: Teaching Dense Retrievers Beyond Queries With LLM Guidance (2025)
- LMAR: Language Model Augmented Retriever For Domain-specific Knowledge Indexing (2025)
- QAEA-DR: A Unified Text Augmentation Framework For Dense Retrieval (2024)
- Training LLMs To Be Better Text Embedders Through Bidirectional Reconstruction (2025)