Awesome Protein Science
Protein Science is one of the most active areas in Awesome AI for Science, with 2,059 papers in this collection evaluated on datasets such as ProteinGym and the Protein Data Bank (PDB). A strong starting point is "MACE-OFF: Transferable Short Range Machine Learning Force Fields for Organic Molecules".
Datasets & benchmarks
Key papers
- MACE-OFF: Transferable Short Range Machine Learning Force Fields for Organic Molecules (2023) Dávid Péter Kovács et al. 10.02
- Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets (2022) Xingang Peng et al. 10.00
- Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems (2023) Xuan Zhang et al. 9.29
- Metagenomic-scale analysis of the predicted protein structure universe (2026) Jingi Yeo et al. 9.16
- Enhanced Sampling in the Age of Machine Learning: Algorithms and Applications (2025) Kai Zhu et al. 9.09
- Multi-Objective-Guided Discrete Flow Matching for Controllable Biological Sequence Design (2025) Tong Chen et al. 8.29
- Multi-domain Distribution Learning for De Novo Drug Design (2025) Arne Schneuing et al. 7.97
- HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data (2025) Hiren Madhu et al. 7.70
- Machine Learning and Data-Driven Methods in Computational Surface and Interface Science (2025) Lukas Hörmann et al. 7.55
- Amortized Sampling with Transferable Normalizing Flows (2025) Charlie B. Tan et al. 7.52
- SE(3)-Equivariant Ternary Complex Prediction Towards Target Protein Degradation (2025) Fanglei Xue et al. 7.50
- MDCrow: Automating Molecular Dynamics Workflows with Large Language Models (2025) Quintina Campbell et al. 7.35
- A Text-guided Protein Design Framework (2023) Shengchao Liu et al. 7.32
- Advancing Molecular Machine Learning Representations with Stereoelectronics-Infused Molecular Graphs (2024) Daniil A. Boiko et al. 7.30
- Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation (2025) Sophia Tang et al. 7.24
- Unified modeling of 3D molecular generation via atomic interactions with PocketXMol (2026) Xingang Peng et al. 7.24
- FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction (2024) Alex Morehead and Jianlin Cheng 7.08
- Systematic Analysis of Biomolecular Conformational Ensembles with PENSA (2022) Martin Vögele et al. 7.01
- Materials Graph Library (MatGL), an open-source graph deep learning library for materials science and chemistry (2025) Tsz Wai Ko et al. 6.89
- UniMoMo: Unified Generative Modeling of 3D Molecules for De Novo Binder Design (2025) Xiangzhe Kong et al. 6.89
- Physics-Informed Machine Learning in Biomedical Science and Engineering (2025) Nazanin Ahmadi et al. 6.86
- PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences (2023) Martin Buttenschoen et al. 6.77
- Understanding protein function with a multimodal retrieval-augmented foundation model (2025) Timothy Fei Truong Jr et al. 6.75
- Autofocused oracles for model-based design (2020) Clara Fannjiang and Jennifer Listgarten 6.66
- MatterChat: A Multi-Modal LLM for Material Science (2025) Yingheng Tang et al. 6.63
- Machine learning, docking, or physics for structure prediction of ligand-induced ternary complexes (2026) Riccardo Solazzo et al. 6.52
- Evaluating zero-shot prediction of monomeric protein design success by AlphaFold, ESMFold, and ProteinMPNN (2026) Mario Garcia et al. 6.52
- CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM (2024) Minkyu Jeon et al. 6.50
- RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design (2024) Rishabh Anand et al. 6.39
- Robust Inference-Time Steering of Protein Diffusion Models via Embedding Optimization (2026) Minhuan Li et al. 6.29
- Sparks: Multi-Agent Artificial Intelligence Model Discovers Protein Design Principles (2025) Alireza Ghafarollahi and Markus J. Buehler 6.28
- Hierarchical quantum embedding by machine learning for large molecular assemblies (2025) Moritz Bensberg et al. 6.23
- Machine Learning Enhanced Calculation of Quantum-Classical Binding Free Energies (2025) Moritz Bensberg et al. 6.23
- DualEquiNet: A Dual-Space Hierarchical Equivariant Network for Large Biomolecules (2025) Junjie Xu et al. 6.12
- Transferable Generative Models Bridge Femtosecond to Nanosecond Time-Step Molecular Dynamics (2025) Juan Viguera Diez, Mathias Schreiner, and Simon Olsson 6.04
- Flexibility-Conditioned Protein Structure Design with Flow Matching (2025) Vsevolod Viliuga et al. 5.93
- Boltz-ABFE: Free Energy Perturbation without Crystal Structures (2025) Stephan Thaler et al. 5.93
- Universally Converging Representations of Matter Across Scientific Foundation Models (2025) Sathya Edamadaka et al. 5.92
- Prospects for NMR Spectral Prediction on Fault-Tolerant Quantum Computers (2024) Justin E. Elenewski, Christina M. Camara, and Amir Kalev 5.91
- Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design (2025) Xingyu Su et al. 5.87
- All-atom inverse protein folding through discrete flow matching (2025) Kai Yi, Kiarash Jamali, and Sjors H. W. Scheres 5.87
- Molecular Fingerprints Are Strong Models for Peptide Function Prediction (2025) Jakub Adamczyk et al. 5.84
- PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion (2024) Sophia Tang et al. 5.79
- Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing (2025) Zijie Qiu et al. 5.76
- AReUReDi: Annealed Rectified Updates for Refining Discrete Flows with Multi-Objective Guidance (2025) Tong Chen et al. 5.68
- Learning conformational ensembles of proteins based on backbone geometry (2025) Nicolas Wolf et al. 5.65
- ProtTeX: Structure-In-Context Reasoning and Editing of Proteins with Large Language Models (2025) Zicheng Ma, Chuanliu Fan, Zhicong Wang, Zhenyu Chen, Xiaohan Lin, Yanheng Li, Shihao Feng, Jun Zhang, Ziqiang Cao, and Yi Qin Gao 5.65
- Protein Large Language Models: A Comprehensive Survey (2025) Yijia Xiao et al. 5.59
- Protein Language Models and Structure-Based Machine Learning for Prediction of Allosteric Binding Sites in Protein Kinases: An Explainable AI Framework Grounded in Energy Landscape-Encoded Frustration (2026) Kamila Riedlová et al. 5.58
- BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning (2026) Adibvafa Fallahpour et al. 5.58
- Protenix-v1: Toward High-Accuracy Open-Source Biomolecular Structure Prediction (2026) Yuxuan Zhang et al. 5.58
- Water-Guided Docking Improves Prediction of Protein-Glycan Complexes (2026) J. O. Lannot et al. 5.58
- IRIS Integrates Sparse Sequence, Experimental, and AI-Predicted Structures for Protein-RNA Affinity Prediction and Motif Discovery (2026) Eduardo Cisneros de la Rosa et al. 5.58
- ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding (2024) Yijia Xiao et al. 5.57
- Intern-S1: A Scientific Multimodal Foundation Model (2025) Lei Bai et al. 5.57
- ProtChatGPT: Towards Understanding Proteins with Large Language Models (2024) Chao Wang et al. 5.51
- Lifetime Sample Tracking (LiST): A Data Platform for Materials Science (2026) Anthony Richardella et al. 5.49
- Equilibrium cluster statistics of cooperative and anticooperative binding on finite one-dimensional rings (2026) Thomas Alfonsi et al. 5.49
- Transferable Boltzmann Generators (2024) Leon Klein and Frank Noé 5.46
- AbRank: A Benchmark Dataset and Metric-Learning Framework for Antibody-Antigen Affinity Ranking (2025) Chunan Liu et al. 5.46