Awesome Drug Discovery
Drug Discovery is one of the most active areas in Awesome AI for Science, with 4,110 papers in this collection evaluated on datasets such as QM9, ChEMBL, and GEOM-Drugs. A strong starting point is "AI-Driven Drug Discovery: A Comprehensive Review".
Datasets & benchmarks
Key papers
- AI-Driven Drug Discovery: A Comprehensive Review (2026) | Garima Sharma | 17.37
- The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models (2025) | Daniel S. Levine et al. | 11.16
- InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery (2026) | Shiyang Feng et al. | 11.14
- BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology (2025) | Ludovico Mitchener et al. | 10.71
- BERT Learns (and Teaches) Chemistry (2020) | Josh Payne et al. | 10.07
- MACE-OFF: Transferable Short Range Machine Learning Force Fields for Organic Molecules (2023) | Dávid Péter Kovács et al. | 10.02
- Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets (2022) | Xingang Peng et al. | 10.00
- MassSpecGym: A benchmark for the discovery and identification of molecules (2024) | Roman Bushuiev et al. | 9.94
- SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration (2024) | Joseph M. Cavanagh et al. | 9.86
- Digital materials ecosystem: from databases to AI agents for autonomous discovery (2026) | Di Zhang et al. | 9.81
- Towards an AI co-scientist (2025) | Juraj Gottweis et al. | 9.77
- DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively (2025) | Yixuan Weng et al. | 9.55
- UMA: A Family of Universal Models for Atoms (2025) | Brandon M. Wood et al. | 9.52
- Accurate RNA 3D structure prediction using a language model-based deep learning approach (2022) | Tao Shen et al. | 9.35
- Enhanced Sampling in the Age of Machine Learning: Algorithms and Applications (2025) | Kai Zhu et al. | 9.09
- Graph Neural Networks in Modern AI-aided Drug Discovery (2025) | Odin Zhang et al. | 8.75
- Equivariant Neural Diffusion for Molecule Generation (2025) | François Cornet, Grigory Bartosh, Mikkel N. Schmidt, and Christian A. Naesseth | 8.75
- Machine Learning for De Novo Molecular Generation: A Comprehensive Review (2026) | Yingjun Chen et al. | 8.34
- Multi-Objective-Guided Discrete Flow Matching for Controllable Biological Sequence Design (2025) | Tong Chen et al. | 8.29
- The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search (2025) | Yutaro Yamada et al. | 8.23
- Learning-Order Autoregressive Models with Application to Molecular Graph Generation (2025) | Zhe Wang et al. | 8.18
- PharmAgents: Building a Virtual Pharma with Large Language Model Agents (2025) | Bowen Gao et al. | 8.07
- Multi-domain Distribution Learning for De Novo Drug Design (2025) | Arne Schneuing et al. | 7.97
- Antiviral drug discovery and development: challenges and future directions (2026) | Shaoqing Du et al. | 7.84
- Artificial Intelligence in Drug Discovery: Integrative Advances From Data to Therapeutic Innovation (2026) | M. Mehran et al. | 7.84
- Benchmarking Pretrained Molecular Embedding Models For Molecular Representation Learning (2025) | Mateusz Praski et al. | 7.83
- AI-Powered Prediction of Nanoparticle Pharmacokinetics: A Multi-View Learning Approach (2025) | Amirhossein Khakpour et al. | 7.82
- FlowMol3: Flow Matching for 3D De Novo Small-Molecule Generation (2025) | Ian Dunn et al. | 7.68
- RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks (2024) | Rafael Josip Penić et al. | 7.62
- System of Agentic AI for the Discovery of Metal-Organic Frameworks (2025) | Theo Jaffrelot Inizan et al. | 7.61
- Amortized Sampling with Transferable Normalizing Flows (2025) | Charlie B. Tan et al. | 7.52
- SE(3)-Equivariant Ternary Complex Prediction Towards Target Protein Degradation (2025) | Fanglei Xue et al. | 7.50
- Large Language Models to Accelerate Organic Chemistry Synthesis (2025) | Yu Zhang et al. | 7.46
- Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization (2025) | Jiajun Yu et al. | 7.40
- drGT: Attention-Guided Gene Assessment of Drug Response Utilizing a Drug-Cell-Gene Heterogeneous Network (2024) | Yoshitaka Inoue et al. | 7.38
- Computing solvation free energies of small molecules with experimental accuracy (2024) | J. Harry Moore et al. | 7.38
- MDCrow: Automating Molecular Dynamics Workflows with Large Language Models (2025) | Quintina Campbell et al. | 7.35
- AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning (2025) | Qile Jiang et al. | 7.33
- A Text-guided Protein Design Framework (2023) | Shengchao Liu et al. | 7.32
- Advancing Molecular Machine Learning Representations with Stereoelectronics-Infused Molecular Graphs (2024) | Daniil A. Boiko et al. | 7.30
- ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge (2025) | Zihan Zhao et al. | 7.30
- Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation (2025) | Sophia Tang et al. | 7.24
- Unified modeling of 3D molecular generation via atomic interactions with PocketXMol (2026) | Xingang Peng et al. | 7.24
- Self-Assembled Monolayers in p-i-n Perovskite Solar Cells: Molecular Design, Interfacial Engineering, and Machine Learning-Accelerated Material Discovery (2026) | Asmat Ullah et al. | 7.24
- Transformers in drug discovery: fine-tuning ChemBERTa for high-accuracy prediction of solubility, toxicity and binding affinity (2026) | S. Alagarsamy et al. | 7.24
- Multi-view biomedical foundation models for molecule-target and property prediction (2024) | Parthasarathy Suryanarayanan et al. | 7.23
- Contextualizing biological perturbation experiments through language (2025) | Menghua Wu et al. | 7.19
- Challenging reaction prediction models to generalize to novel chemistry (2025) | John Bradshaw et al. | 7.13
- Representation Meets Optimization: Training PINNs and PIKANs for Gray-Box Discovery in Systems Pharmacology (2025) | Nazanin Ahmadi Daryakenari et al. | 7.13
- Return of the Latent Space COWBOYS: Re-thinking the use of VAEs for Bayesian Optimisation of Structured Spaces (2025) | Henry B. Moss et al. | 7.11
- FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction (2024) | Alex Morehead and Jianlin Cheng | 7.08
- Does Hessian Data Improve the Performance of Machine Learning Potentials? (2025) | Austin Rodriguez, Justin S. Smith, and Jose L. Mendoza-Cortes | 7.07
- JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data (2024) | Apurva Kalia et al. | 7.02
- Systematic Analysis of Biomolecular Conformational Ensembles with PENSA (2022) | Martin Vögele et al. | 7.01
- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments (2024) | Yusuf Roohani et al. | 7.00
- FLOWR: Flow Matching for Structure-Aware De Novo, Interaction- and Fragment-Based Ligand Generation (2025) | Julian Cremer et al. | 6.95
- UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design (2025) | Xiangzhe Kong et al. | 6.89
- AlphaEvolve: A coding agent for scientific and algorithmic discovery (2025) | Alexander Novikov et al. | 6.88
- Re-evaluating sample efficiency in de novo molecule generation (2022) | Morgan Thomas et al. | 6.78
- PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences (2023) | Martin Buttenschoen et al. | 6.77