You Only Use Reactive Attention Slice For Long Context Retrieval
2024 · Yun Joon Soh, Hanxian Huang, Yuandong Tian, et al.
Abstract
Supporting longer context for Large Language Models (LLMs) is a promising direction for advancing LLMs. Because training a model for a longer context window is computationally expensive, many alternative solutions, such as Retrieval Augmented Generation (RAG), have been adopted. However, most existing RAG methods rely on embedding-based retrieval, which falls short on long contexts. To address this challenge, we propose an attention-based retrieval technique, You Only Use Reactive Attention slice (YOURA). YOURA leverages a novel retrieval heuristic called the reaction score to rank the relevance of each sentence in the input context to the query sentence. Intuitively, we measure how the per-token attention score "reacts" to the query and greedily retrieve the most reactive sentences. Internally, YOURA generates a token-indexed vector (called the reaction vector) for the whole input context. To map each sentence to the token-indexed vector, we propose an Embedding-Agnostic Sentence Yield (EASY), a best-e
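The abstract describes the pipeline (token-indexed reaction vector, per-sentence mapping, greedy selection) without giving formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) per-token attention scores over the context have already been extracted from a model, e.g. averaged over heads and layers, once without and once with the query; (2) the "reaction" of a token is the hypothetical difference between those two scores; (3) sentence boundaries are supplied as token counts, a simple stand-in that is not EASY; (4) a sentence's score is the mean reaction over its tokens; and (5) "greedy" retrieval means filling a token budget in descending score order. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def reaction_vector(attn_without_query: Sequence[float],
                    attn_with_query: Sequence[float]) -> List[float]:
    """Hypothetical reaction: per-token change in attention when the query is present."""
    if len(attn_without_query) != len(attn_with_query):
        raise ValueError("attention vectors must cover the same context tokens")
    return [w - wo for wo, w in zip(attn_without_query, attn_with_query)]


def sentence_spans(token_counts: Sequence[int]) -> List[Tuple[int, int]]:
    """Map each sentence to a [start, end) slice of the token-indexed vector.

    Assumption: sentences are contiguous and their token counts are known.
    This is a placeholder, not the paper's EASY mapping.
    """
    spans, start = [], 0
    for n in token_counts:
        spans.append((start, start + n))
        start += n
    return spans


def score_sentences(reaction: Sequence[float],
                    spans: Sequence[Tuple[int, int]]) -> List[float]:
    """Assumed sentence score: mean reaction over the sentence's tokens."""
    return [sum(reaction[s:e]) / (e - s) if e > s else 0.0 for s, e in spans]


def greedy_retrieve(scores: Sequence[float],
                    token_counts: Sequence[int],
                    budget: int) -> List[int]:
    """Pick the most reactive sentences that fit in a token budget."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + token_counts[i] <= budget:
            chosen.append(i)
            used += token_counts[i]
    return sorted(chosen)  # restore original document order


if __name__ == "__main__":
    counts = [2, 2, 3]  # three sentences, seven context tokens
    base = [0.25, 0.25, 0.25, 0.25, 0.125, 0.125, 0.125]
    with_q = [0.25, 0.5, 0.75, 0.75, 0.125, 0.25, 0.125]
    r = reaction_vector(base, with_q)
    scores = score_sentences(r, sentence_spans(counts))
    print(greedy_retrieve(scores, counts, budget=4))  # prints [0, 1]
```

Averaging rather than summing keeps long sentences from winning purely by length; a real system might instead normalize differently or discard sentences with negative reaction, and the paper's actual choice is not stated in the abstract.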
Related papers
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025)
- RetrievalAttention: Accelerating Long-context LLM Inference Via Vector Retrieval (2024)
- Efficient Context Selection For Long-context QA: No Tuning, No Iteration, Just Adaptive-\(k\) (2025)
- SlimRAG: Retrieval Without Graphs Via Entity-aware Context Selection (2025)
- Optimizing Retrieval For RAG Via Reinforcement Learning (2025)
- Multi-head RAG: Solving Multi-aspect Problems With LLMs (2024)
- RARe: Retrieval Augmented Retrieval With In-context Examples (2024)
- LMAR: Language Model Augmented Retriever For Domain-specific Knowledge Indexing (2025)