Awesome Information Retrieval
Information Retrieval is one of the most active areas in Awesome AI Agents, with 60 papers in this collection, evaluated on datasets such as 10K verifiable QA pairs, 50K websites, and MLQA. A strong starting point is "La representación de la variación contextual mediante definiciones terminológicas flexibles".
Datasets & benchmarks
Key papers
- La representación de la variación contextual mediante definiciones terminológicas flexibles (2016) | Antonio San Martín | 6.34
- Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search (2026) | Sidhaarth Murali et al. | 6.23
- Trait, Not State: The Durability of Reading Identity in Social Highlighting (2026) | Kazuki Nakayashiki et al. | 5.89
- M3: Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis (2025) | Rafi Al Attrach et al. | 5.72
- Context-aware Entity-Relation Extraction for Threat Intelligence Knowledge Graphs (2026) | Inoussa Mouiche et al. | 5.58
- Cartridges at Scale: Training Modular KV Caches over Large Document Collections (2026) | Momchil Hardalov et al. | 5.49
- How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation (2026) | Chase M. Fensore et al. | 5.49
- On the Memorization Behavior of LLMs in Generative Recommendation: Observations, Implications, and Training Strategies (2026) | Sunwoo Kim et al. | 5.49
- Temporal Preference Optimization for Unsupervised Retrieval (2026) | HyunJin Kim et al. | 5.49
- Non-negative Elastic Net Decoding for Information Retrieval (2026) | Koki Okajima et al. | 5.49
- A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation (2026) | Haoyang Zhong et al. | 5.49
- RankGraph-2: Lifecycle Co-Design for Billion-Node Graph Learning in Recommendation (2026) | Renzhi Wu et al. | 5.49
- Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation (2026) | Shanshan Lyu et al. | 5.49
- MetaConfigurator: AI-Assisted RDF Authoring from JSON Data (2026) | Felix Neubauer et al. | 5.01
- Doc-to-Atom: Learning to Compile and Compose Memory Atoms (2026) | Xingjian Diao et al. | 5.01
- IUU+DB: Tracking Illegal, Unreported, and Unregulated Fishing, Seafood Fraud, and Labor Abuse through LLM-driven Information Extraction (2026) | Henry Bodwell et al. | 5.01
- Beyond Text and Tables: Vision-Language Model Integration in ComProScanner for Extracting Materials Data from Scientific Figures with High Accuracy (2026) | Aritra Roy et al. | 4.39
- Graph2Idea: Retrieval-Augmented Scientific Idea Generation with Graph-Structured Contexts (2026) | Xu Li et al. | 4.39
- A PubMed-Scale Dataset of Structured Biomedical Abstracts (2026) | Chia-Hsuan Chang et al. | 4.39
- Cross-Dataset Bloom Question Classification: Supervised Models and Prompted LLMs (2026) | Abdolali Faraji et al. | 4.39
- Hyperdimensional computing for structured querying on tabular data embeddings (2026) | Sebastián Bugedo et al. | 4.39
- Context-aware Modality-Topology Co-Alignment for Multimodal Attributed Graphs (2026) | Sirui Zhang et al. | 4.39
- Context Compression Is Not One Thing: Readable Symbolic Re-expression vs. Coherent Summary at Matched Budget (2026) | Sisong Bei et al. | 4.39
- Semantics-Enhanced Retrieval-Augmented Time Series Forecasting (2026) | Shiqiao Zhou et al. | 4.39
- Semantic Reasoning in Medicine: The Role of Knowledge Graphs Across Five Key Domains (2026) | Haniye Sherafatmandjoo et al. | 4.39
- Provenance-Enhanced Statements in Knowledge Graphs (2026) | Fabio Vitali et al. | 4.39
- Guiding Federated Graph Recommendation with LLM-encoded knowledge (2026) | Thi Minh Chau Nguyen et al. | 4.39
- Few-Shot Biomedical Relation Extraction with Large Language Models: A Viable Alternative to Supervised Learning? (2026) | Jakob Mraz et al. | 4.39
- Encode Errors: Representational Retrieval of In-Context Demonstrations for Multilingual Grammatical Error Correction (2026) | Guangyue Peng et al. | 4.39
- Ricci-Filtration: Boosting Retrieval-Augmented Generation Reranker to Query-Answer Tasks by Discrete Ricci Flow (2026) | Tian Qin et al. | 4.39
- AthDGC: An Open Diachronic Greek Treebank with Indo-European Parallels (2026) | Nikolaos Lavidas et al. | 4.39
- Overcoming the Impedance Mismatch: A Theoretical Roadmap for Fusing Foundation Models and Knowledge Graphs (2026) | Sahil Rajesh Dhayalkar | 4.39
- Vernier: Probing Representational Misalignment Behind Lexical Gaps in Causal Reasoning (2026) | Zhenyu Yu | 4.39
- A Self Consistency Based Reranking for Narrative Question Answering (2026) | Molham Mohamed et al. | 4.39
- Entity Labels Are Not Entity Signals: A Framework for Observable Relevance in Document Re-Ranking (2026) | Utshab Kumar Ghosh et al. | 4.39
- IBAD: Interpretable Behavioral Anomaly Detection on Human Mobility Data (2026) | Bita Azarijoo et al. | 4.39
- SCAR: Semantic Continuity-Aware Retrieval for Efficient Context Expansion in RAG (2026) | Nathanaël Langlois | 4.39
- RAID: Semantic Graph Diffusion for True Cold-Start and Cross-Lingual Forecasting (2026) | Arunkumar V et al. | 4.39
- How Much Do Reviews Really Contribute? A Study on Text-Enriched Matrix Factorization for Recommendations (2026) | Eduardo Ferreira da Silva et al. | 4.39
- Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation (2026) | Jan Cegin et al. | 4.39
- BCL: Bayesian In-Context Learning Framework for Information Extraction (2026) | Haoliang Liu et al. | 4.39
- SHIFT: Semantic Harmonization via Index-side Feature Transformation for Multilingual Information Retrieval (2026) | Youngjoon Jang et al. | 4.39
- ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement (2026) | Bohou Zhang et al. | 4.39
- Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining (2026) | Wicaksono Leksono Muhamad et al. | 4.39
- Approximate Structured Diffusion for Sequence Labelling (2026) | Nicolas Floquet et al. | 4.39
- Efficient Financial Language Understanding via Distillation with Synthetic Data (2026) | Wen-Fong (Xavier) et al. | 4.39
- Improving Medical Communication using Rubric-Guided Counterfactual Recommendations (2026) | Adrian Cosma et al. | 4.39
- Learning Robust Pair Confidence for Multimodal Emotion-Cause Pair Extraction (2026) | Zhuangzhuang Pan et al. | 4.39
- SAERec: Constructing Fine-grained Interpretable Intents Priors via Sparse Autoencoders for Recommendation (2026) | Jiangnan Xia et al. | 4.39
- Zero-Shot Active Feature Acquisition via LLM-Elicitation (2026) | Binyamin Perets et al. | 4.39
- Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering (2026) | Yafeng Wu et al. | 4.39
- JourneyFormer: Encoding Airbnb Guest Journey with Sequence Modeling (2026) | Daochen Zha et al. | 4.39
- The More the Merrier: Combining Properties for ABox Abduction under Repair Semantics for ELbot (2026) | Anselm Haak et al. | 4.39
- LECTOR: Joint Optimization of Scientific Reasoning Graphs and Introduction Generation (2026) | Jiabei Xiao et al. | 4.33
- Eliot: Interactively Exploring Fast-Changing Scientific Literature Trends with Online Data and Learning (2026) | Bernardo A. Denkvitts et al. | 4.33
- SETUP: Sentence-level English-To-Uniform Meaning Representation Parser (2026) | Emma Markle et al. | 4.26
- DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering (2025) | Rong Cheng et al. | 3.75
- Effective Reinforcement Learning for Agentic Search by Recycling Zero-Variance Queries During Training (2026) | João Coelho et al. | 3.51
- Benchmarking Large Language Models for Safety Data Extraction (2026) | Jonas Grill et al. | 3.51
- The Culture Funnel: You Can't Align What isn't in the Data (2026) | Ananya Sahu et al. | 3.51