Awesome Information Retrieval
Information Retrieval is one of the most active areas in Awesome AI Agents, with 60 papers in this collection evaluated on datasets such as SQuAD, MultiHop-RAG, and DOCCI. A strong starting point is "FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining".
Datasets & benchmarks
Key papers
- FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining (2026) | Jinghong Lan et al. | 12.10
- La representación de la variación contextual mediante definiciones terminológicas flexibles (2016) | Antonio San Martín | 6.34
- M3: Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis (2025) | Rafi Al Attrach et al. | 5.72
- How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation (2026) | Chase M. Fensore et al. | 5.49
- Token Factory: Efficiently Integrating Diverse Signals into Large Recommendation Models (2026) | Xilun Chen et al. | 5.49
- Structuring and Tokenizing Distributed User Interest Context for Generative Recommendation (2026) | Ruizhong Qiu et al. | 5.49
- MetaConfigurator: AI-Assisted RDF Authoring from JSON Data (2026) | Felix Neubauer et al. | 5.01
- Doc-to-Atom: Learning to Compile and Compose Memory Atoms (2026) | Xingjian Diao et al. | 5.01
- Ensembles of Large Language Models for Identifying EQ-5D Studies in PubMed Based on Their Abstracts (2026) | Zhyar Rzgar K. Rostam et al. | 5.01
- DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence (2026) | DeepSeek-AI et al. | 5.01
- Where to Place the Query? Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics (2026) | Zhengheng Li et al. | 5.01
- Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning (2026) | Xinyan Zhu et al. | 5.01
- A BART-based approach with hierarchical strategy for Vietnamese abstractive multi-document summarization (2026) | Vu Nguyen Nguyen Xuan et al. | 5.01
- VCG: A Multimodal Retrieval Framework for E-Commerce Video Feeds under Extreme Cold-Start Conditions (2026) | Katya Mirylenka et al. | 5.01
- FineREX: Fine-Tuned NER-RE for Human Smuggling Knowledge Graphs (2026) | Elijah Feldman et al. | 5.01
- Confidence Calibration for Multimodal LLMs: An Empirical Study through Medical VQA (2026) | Yuetian Du et al. | 5.01
- ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval (2026) | Yuhan Liu et al. | 5.01
- Beyond Text and Tables: Vision-Language Model Integration in ComProScanner for Extracting Materials Data from Scientific Figures with High Accuracy (2026) | Aritra Roy et al. | 4.39
- Graph2Idea: Retrieval-Augmented Scientific Idea Generation with Graph-Structured Contexts (2026) | Xu Li et al. | 4.39
- Cross-Dataset Bloom Question Classification: Supervised Models and Prompted LLMs (2026) | Abdolali Faraji et al. | 4.39
- Hyperdimensional computing for structured querying on tabular data embeddings (2026) | Sebastián Bugedo et al. | 4.39
- Context-aware Modality-Topology Co-Alignment for Multimodal Attributed Graphs (2026) | Sirui Zhang et al. | 4.39
- Semantics-Enhanced Retrieval-Augmented Time Series Forecasting (2026) | Shiqiao Zhou et al. | 4.39
- Few-Shot Biomedical Relation Extraction with Large Language Models: A Viable Alternative to Supervised Learning? (2026) | Jakob Mraz et al. | 4.39
- Ricci-Filtration: Boosting Retrieval-Augmented Generation Reranker to Query-Answer Tasks by Discrete Ricci Flow (2026) | Tian Qin et al. | 4.39
- AthDGC: An Open Diachronic Greek Treebank with Indo-European Parallels (2026) | Nikolaos Lavidas et al. | 4.39
- Vernier: Probing Representational Misalignment Behind Lexical Gaps in Causal Reasoning (2026) | Zhenyu Yu | 4.39
- A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation (2026) | Haoyang Zhong et al. | 4.39
- LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data (2026) | Akshat Dasula et al. | 4.39
- Toten: Knowledge-Based Ontological Tokenization Of Physical Quantities And Technical Notation In Brazilian Portuguese (2026) | Antonio de Sousa Leitão Filho et al. | 4.39
- GLARE: A Natural Language Interface for Querying Global Explanations (2026) | Bhavan Vasu et al. | 4.39
- Implicit Semantic-Aware Communication Based on Hypergraph Reasoning (2026) | Yiwei Liao et al. | 4.39
- LECTOR: Joint Optimization of Scientific Reasoning Graphs and Introduction Generation (2026) | Jiabei Xiao et al. | 4.33
- Eliot: Interactively Exploring Fast-Changing Scientific Literature Trends with Online Data and Learning (2026) | Bernardo A. Denkvitts et al. | 4.33
- DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering (2025) | Rong Cheng et al. | 3.75
- SD-GRPO: Verifiable Segment Decomposition for Long-Form Vision-Language Generation (2026) | Hyunwoong Kim et al. | 3.51
- Effective Reinforcement Learning for Agentic Search by Recycling Zero-Variance Queries During Training (2026) | João Coelho et al. | 3.51
- IUU+DB: Tracking Illegal, Unreported, and Unregulated Fishing, Seafood Fraud, and Labor Abuse through LLM-driven Information Extraction (2026) | Henry Bodwell et al. | 3.51
- Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting (2026) | Amritansh Maurya et al. | 3.45
- Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG (2026) | Yubo Li et al. | 3.45
- Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory (2026) | Youwang Deng | 3.45
- Cost-Efficient RAG for Entity Matching with LLMs: A Blocking-based Exploration (2026) | Chuangtao Ma et al. | 3.27
- An Entity Linking Agent for Question Answering (2025) | Yajie Luo et al. | 3.04
- Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding (2025) | Sakhinana Sagar Srinivas et al. | 2.28
- LLM-TabLogic: Preserving Inter-Column Logical Relationships in Synthetic Tabular Data via Prompt-Guided Latent Diffusion (2026) | Yunbo Long et al. | 2.00
- QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation (2026) | Dehai Min et al. | 2.00
- Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning (2026) | Chengwen Liu et al. | 2.00
- EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models (2026) | Jincheng Xie et al. | 2.00
- Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction (2026) | A H M Rezaul Karim et al. | 2.00
- ScreenSearch: Uncertainty-Aware OS Exploration (2026) | Michael Solodko et al. | 2.00
- LERA: LLM-Enhanced RAG for Ad Auction in Generative Chatbots (2026) | Haoran Sun et al. | 2.00
- LogRouter: Adaptive Two-Level LLM Routing for Log Question Answering in Big Data Systems (2026) | Mert Coskuner et al. | 2.00
- SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning (2026) | Yufei Ma et al. | 2.00
- Hyrax: An Extensible Framework for Rapid ML Experimentation and Unsupervised Discovery in the Era of Rubin, Roman, and Euclid (2026) | Aritra Ghosh et al. | 2.00
- MolE-RAG: Molecular Structure-Enhanced Retrieval-Augmented Generation for Chemistry (2026) | Joey Chan et al. | 2.00
- DeMix: Debugging Training Data with Mixed Data Error Types by Investigating Influence Vectors (2026) | Jiale Deng et al. | 2.00
- Retrieval-augmented Reasoning For Chartered Accountancy (2026) | Jatin Gupta, Akhil Sharma, Saransh Singhania, et al. | 2.00
- Large Language Model Chatbot with Retrieval-augmented Generation and Function Calling for Indonesian Article 21 Withholding Tax Support (2026) | Verren Angelina Saputra et al. | 2.00
- Webaggregator: Enhancing Compositional Reasoning Capabilities Of Deep Research Agent Foundation Models (2026) | Rui Wang, Ce Zhang, Jun-Yu Ma, et al. | 2.00
- Self-reinforcing Controllable Synthesis Of Rare Relational Data Via Bayesian Calibration (2026) | Chongsheng Zhang, Hao Wang, Zelong Yu, et al. | 2.00