Awesome Papers

Papers

AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems (2024)
Victor Dibia, Jingya Chen, Gagan Bansal, et al.
19.59
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning (2025)
Huatong Song et al.
18.73
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL (2025)
Weizhen Li et al.
18.23
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction (2024)
Siyu Yuan, Kaitao Song, Jiangjie Chen, et al.
16.88
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners (2025)
Yuhang Liu et al.
16.27
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments (2026)
Jundong Xu et al.
16.01
MapCoder: Multi-Agent Code Generation for Competitive Problem Solving (2024)
Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez
15.96
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning (2026)
Seokju Cho et al.
15.70
MiniMax Sparse Attention (2026)
Xunhao Lai et al.
15.51
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security (2026)
Dongrui Liu et al.
14.78
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents (2026)
Jia Deng et al.
14.20
Deep Research Agents: A Systematic Examination and Roadmap (2025)
Yuxuan Huang et al.
14.02
Kimi K2.5: Visual Agentic Intelligence (2026)
Kimi Team: Tongtong Bai et al.
13.84
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks (2026)
Mengyu Zheng et al.
13.33
ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time? (2026)
Woojung Song et al.
13.24
ACC: Compiling Agent Trajectories for Long-Context Training (2026)
Qisheng Su et al.
13.06
OpenComputer: Verifiable Software Worlds for Computer-Use Agents (2026)
Jinbiao Wei et al.
12.94
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints (2026)
Jiayu Liu et al.
12.83
SWE-Explore: Benchmarking How Coding Agents Explore Repositories (2026)
Shaoqiu Zhang et al.
12.73
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation (2026)
Tianyi Zhou et al.
12.70
SkillOpt: Executive Strategy for Self-Evolving Agent Skills (2026)
Yifan Yang et al.
12.58
Understanding the Weakness of Large Language Model Agents within a Complex Android Environment (2024)
Mingzhe Xing, Rongkai Zhang, Hui Xue, et al.
12.57
GrepSeek: Training Search Agents for Direct Corpus Interaction (2026)
Alireza Salemi et al.
12.06
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents (2026)
Suji Kim et al.
12.03
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents (2025)
Kunlun Zhu et al.
11.84
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks (2026)
Jian Xie et al.
11.71
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery (2026)
Amy Xin et al.
11.55
Rethinking Memory as Continuously Evolving Connectivity (2026)
Jizhan Fang et al.
11.49
Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism (2026)
Haoxiang Zhang et al.
11.44
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows (2026)
Haoran Zhang et al.
11.23
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning (2026)
Peng Xia et al.
11.12
Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses (2026)
Pengcheng Jiang et al.
11.09
SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories (2026)
Zhuoyun Yu et al.
10.93
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents (2026)
Bowen Wang et al.
10.88
AI Research Agents Narrow Scientific Exploration (2026)
Yixuan Tang et al.
10.88
MARS: Modular Agent with Reflective Search for Automated AI Research (2026)
Jiefeng Chen et al.
10.85
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (2024)
John Yang, Carlos E. Jimenez, Alexander Wettig, et al.
10.85
Qwen3-Coder-Next Technical Report (2026)
Ruisheng Cao et al.
10.84
GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection (2026)
Zheng Wu et al.
10.77
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence (2026)
MiniMax et al.
10.69
Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms (2026)
Saidakhror Gulyamov et al.
10.60
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics (2026)
Mingxian Lin et al.
10.48
MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems (2025)
Rui Ye et al.
10.47
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking (2026)
Qiang Zhang et al.
10.34
Benchmark Test-Time Scaling of General LLM Agents (2026)
Xiaochuan Li et al.
10.31
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers (2025)
Zhenting Wang et al.
10.30
Agent^2 RL-Bench: Can LLM Agents Engineer Agentic RL Post-training? (2026)
Wanyi Chen, Xiao Yang, Xu Yang, et al.
10.22
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents (2026)
Minhua Lin et al.
10.15
MultiAgentBench: Evaluating the Collaboration and Competition of LLM Agents (2025)
Kunlun Zhu, Hongyi Du, Zhaochen Hong, et al.
10.08
OpenSkill: Open-World Self-Evolution for LLM Agents (2026)
Zhiling Yan et al.
10.05