Awesome AI Agents




© 2026 Awesome Papers.

Awesome Multi-Agent — curated papers, datasets & benchmarks · Awesome AI Agents


Awesome Multi-Agent

Multi-Agent is one of the most active areas in Awesome AI Agents, with 3,735 papers in this collection. Papers in this area are evaluated on datasets such as ALFWorld, GAIA, and WebShop. A strong starting point is "R-Zero: Self-Evolving Reasoning LLM from Zero Data".

Datasets & benchmarks

ALFWorld: 40 papers

WebShop: 23 papers

SWE-bench: 17 papers

HotpotQA: 17 papers

OSWorld: 14 papers

LoCoMo: 13 papers

SWE-bench Verified: 13 papers

StarCraft Multi-Agent Challenge (SMAC): 12 papers

StarCraft II: 12 papers

tau-2-bench: 12 papers

Key papers

60 papers, sorted by trending (the default order). The number below each entry is its 🔥 heat score.

R-Zero: Self-Evolving Reasoning LLM from Zero Data (2025)
Chengsong Huang et al.
19.72
Autogen Studio: A No-code Developer Tool For Building And Debugging Multi-agent Systems (2024)
Victor Dibia, Jingya Chen, Gagan Bansal, et al.
19.59
Agentic Reinforced Policy Optimization (2025)
Guanting Dong et al.
19.50
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL (2025)
Weizhen Li et al.
18.12
ABot-World-0: Infinite Interactive World Rollout on a Single Desktop GPU (2026)
Fan Jiang et al.
17.67
Kimi K2.5: Visual Agentic Intelligence (2026)
Kimi Team: Tongtong Bai et al.
17.30
AskChem: Claim-Centered Infrastructure for Chemistry Literature Synthesis (2026)
Bing Yan et al.
16.55
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners (2025)
Yuhang Liu et al.
16.16
Mapcoder: Multi-agent Code Generation For Competitive Problem Solving (2024)
Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez
15.96
AREX: Towards a Recursively Self-Improving Agent for Deep Research (2026)
Shuqi Lu et al.
15.64
Frontis-MA1: Training an AI4AI Model towards Recursive Self-Improvement in Machine Learning Engineering (2026)
Junlin Yang et al.
15.36
Dynamic Multi-robot Task Allocation Under Uncertainty And Temporal Constraints (2020)
Shushman Choudhury, Jayesh K. Gupta, Mykel J. Kochenderfer, et al.
15.28
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent (2025)
Hongli Yu et al.
15.26
MAPPER: Multi-agent Path Planning With Evolutionary Reinforcement Learning In Mixed Dynamic Environments (2020)
Zuxin Liu, Baiming Chen, Hongyi Zhou, et al.
15.19
A Multi-agent Reinforcement Learning Approach For Efficient Client Selection In Federated Learning (2022)
Sai Qian Zhang, Jieyu Lin, Qi Zhang
15.00
SEED: Self-Evolving On-Policy Distillation for Agentic Reinforcement Learning (2026)
Jinyang Wu et al.
14.72
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security (2026)
Dongrui Liu et al.
14.67
COVINS: Visual-inertial SLAM For Centralized Collaboration (2021)
Patrik Schmuck, Thomas Ziegler, Marco Karrer, et al.
14.66
VMAS: A Vectorized Multi-agent Simulator For Collective Robot Learning (2022)
Matteo Bettini, Ryan Kortvelesy, Jan Blumenkamp, et al.
14.32
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models (2025)
DeepSeek-AI et al.
14.19
Deep Research Agents: A Systematic Examination And Roadmap (2025)
Yuxuan Huang et al.
13.91
ABot-AgentOS: A General Robotic Agent OS with Lifelong Multi-modal Memory (2026)
Jiayi Tian et al.
13.83
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning (2026)
Minki Kang et al.
13.64
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution (2026)
Xucong Wang et al.
13.56
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration (2026)
Jiaqi Liu et al.
13.41
UniClawBench: A Universal Benchmark for Proactive Agents on Real-World Tasks (2026)
Zhekai Chen et al.
13.35
UI-MOPD: Multi-Platform On-Policy Distillation for Continual GUI Agent Learning (2026)
Niu Lian et al.
13.34
OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning (2026)
Shuo Yang et al.
13.31
Learning Agent Communication Under Limited Bandwidth By Message Pruning (2019)
Hangyu Mao, Zhengchao Zhang, Zhen Xiao, et al.
13.23
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks (2026)
Mengyu Zheng et al.
13.22
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application (2026)
Jiachun Li et al.
13.11
Heterogeneous Agent Collaborative Reinforcement Learning (2026)
Zhixia Zhang et al.
13.07
EvolvingWorld: An Open-Schema Framework for Co-Evolving Role-Play Agents and World Model in Interactive Literary World (2026)
Qing Zong et al.
12.97
ACC: Compiling Agent Trajectories for Long-Context Training (2026)
Qisheng Su et al.
12.95
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints (2026)
Jiayu Liu et al.
12.72
Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent (2026)
Lei Bai et al.
12.68
AWorld: Orchestrating the Training Recipe for Agentic AI (2025)
Chengyue Yu et al.
12.66
The Verification Horizon: No Silver Bullet for Coding Agent Rewards (2026)
Binghai Wang et al.
12.31
ResearchMath-14K: Scaling Research-Level Mathematics via Agents (2026)
Guijin Son et al.
12.25
Streaming Communication in Multi-Agent Reasoning (2026)
Zhen Yang et al.
12.21
AgentCompass: A Unified Evaluation Infrastructure for Agent Capabilities (2026)
Kai Chen et al.
12.21
Orchestra-o1: Omnimodal Agent Orchestration (2026)
Fan Zhang et al.
12.09
Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents (2026)
Dhaval C. Patel et al.
12.09
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation (2026)
Yibo Wang et al.
12.04
Tencent WorkBuddy Bench: A Multi-Domain Coding-Agent Benchmark with Contamination-Resistant Task Construction (2026)
Tencent WorkBuddy Bench Team et al.
11.87
Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms (2026)
Saidakhror Gulyamov et al.
11.81
VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement (2026)
Seohyun Lee et al.
11.77
Agent models: Internalizing Chain-of-Action Generation into Reasoning models (2025)
Yuxiang Zhang et al.
11.76
Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning (2026)
Shiding Zhu et al.
11.76
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows? (2026)
Haolin Chen et al.
11.75
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution (2025)
Tianrui Qin et al.
11.74
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents (2025)
Kunlun Zhu et al.
11.73
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking (2026)
Qiang Zhang et al.
11.73
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research (2026)
Pu Ning et al.
11.67
Playful Agentic Robot Learning (2026)
Junyi Zhang et al.
11.62
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning (2026)
Zelai Xu et al.
11.59
Latent Collaboration in Multi-Agent Systems (2025)
Jiaru Zou et al.
11.55
It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs (2026)
Sangwoo Park et al.
11.53
OpenThoughts-Agent: Data Recipes for Agentic Models (2026)
Negin Raoof et al.
11.53
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning (2025)
Haoran Luo et al.
11.44