Graph-based Retriever Captures The Long Tail Of Biomedical Knowledge
2024 · Julien Delile, Srayanta Mukherjee, Anton van Pamel, et al.
Abstract
Large language models (LLMs) are transforming the way information is retrieved, with vast amounts of knowledge being summarized and presented via natural language conversations. Yet LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones. In the field of biomedical research, the latest discoveries are key to academic and industrial actors, but they are obscured by the abundance of an ever-increasing literature corpus (the information overload problem). Surfacing new associations between biomedical entities (e.g., drugs, genes, diseases) with LLMs thus becomes a challenge of capturing the long-tail knowledge of biomedical scientific production. Retrieval Augmented Generation (RAG) has been proposed to alleviate some of the shortcomings of LLMs by augmenting the prompts with context retrieved from external datasets. RAG methods typically select the context via maximum similarity search over text embedd
Related papers
- Graph-aware Late Chunking For Retrieval-augmented Generation In Biomedical Literature (2026) 0.00
- A Systematic Study Of Retrieval Pipeline Design For Retrieval-augmented Medical Question Answering (2026) 0.00
- Hiperrag: High-performance Retrieval Augmented Generation For Scientific Insights (2025) 6.34
- LMAR: Language Model Augmented Retriever For Domain-specific Knowledge Indexing (2025) 1.57
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025) 0.00
- Improving Tool Retrieval By Leveraging Large Language Models For Query Generation (2024) 0.00
- Hetarag: Hybrid Deep Retrieval-augmented Generation Across Heterogeneous Data Stores (2025) 3.27
- Advancing Retrieval-augmented Generation For Structured Enterprise And Internal Data (2025) 1.20