Code Agents
50 papers tagged Code Agents
Papers
- AutoGen Studio: A No-code Developer Tool For Building And Debugging Multi-agent Systems (2024) Victor Dibia, Jingya Chen, Gagan Bansal, et al. 19.59
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning (2025) Huatong Song et al. 18.73
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL (2025) Weizhen Li et al. 18.23
- EASYTOOL: Enhancing LLM-based Agents With Concise Tool Instruction (2024) Siyu Yuan, Kaitao Song, Jiangjie Chen, et al. 16.88
- InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners (2025) Yuhang Liu et al. 16.27
- EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments (2026) Jundong Xu et al. 16.01
- MapCoder: Multi-agent Code Generation For Competitive Problem Solving (2024) Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez 15.96
- SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning (2026) Seokju Cho et al. 15.70
- MiniMax Sparse Attention (2026) Xunhao Lai et al. 15.51
- AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security (2026) Dongrui Liu et al. 14.78
- FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents (2026) Jia Deng et al. 14.20
- Deep Research Agents: A Systematic Examination And Roadmap (2025) Yuxuan Huang et al. 14.02
- Kimi K2.5: Visual Agentic Intelligence (2026) Kimi Team: Tongtong Bai et al. 13.84
- Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks (2026) Mengyu Zheng et al. 13.33
- ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time? (2026) Woojung Song et al. 13.24
- ACC: Compiling Agent Trajectories for Long-Context Training (2026) Qisheng Su et al. 13.06
- OpenComputer: Verifiable Software Worlds for Computer-Use Agents (2026) Jinbiao Wei et al. 12.94
- AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints (2026) Jiayu Liu et al. 12.83
- SWE-Explore: Benchmarking How Coding Agents Explore Repositories (2026) Shaoqiu Zhang et al. 12.73
- COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation (2026) Tianyi Zhou et al. 12.70
- SkillOpt: Executive Strategy for Self-Evolving Agent Skills (2026) Yifan Yang et al. 12.58
- Understanding The Weakness Of Large Language Model Agents Within A Complex Android Environment (2024) Mingzhe Xing, Rongkai Zhang, Hui Xue, et al. 12.57
- GrepSeek: Training Search Agents for Direct Corpus Interaction (2026) Alireza Salemi et al. 12.06
- Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents (2026) Suji Kim et al. 12.03
- MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents (2025) Kunlun Zhu et al. 11.84
- QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks (2026) Jian Xie et al. 11.71
- EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery (2026) Amy Xin et al. 11.55
- Rethinking Memory as Continuously Evolving Connectivity (2026) Jizhan Fang et al. 11.49
- Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism (2026) Haoxiang Zhang et al. 11.44
- $\pi$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows (2026) Haoran Zhang et al. 11.23
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning (2026) Peng Xia et al. 11.12
- Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses (2026) Pengcheng Jiang et al. 11.09
- SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories (2026) Zhuoyun Yu et al. 10.93
- CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents (2026) Bowen Wang et al. 10.88
- AI Research Agents Narrow Scientific Exploration (2026) Yixuan Tang et al. 10.88
- MARS: Modular Agent with Reflective Search for Automated AI Research (2026) Jiefeng Chen et al. 10.85
- SWE-agent: Agent-computer Interfaces Enable Automated Software Engineering (2024) John Yang, Carlos E. Jimenez, Alexander Wettig, et al. 10.85
- Qwen3-Coder-Next Technical Report (2026) Ruisheng Cao et al. 10.84
- GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection (2026) Zheng Wu et al. 10.77
- The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence (2026) MiniMax et al. 10.69
- Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms (2026) Saidakhror Gulyamov et al. 10.60
- OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics (2026) Mingxian Lin et al. 10.48
- MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems (2025) Rui Ye et al. 10.47
- ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking (2026) Qiang Zhang et al. 10.34
- Benchmark Test-Time Scaling of General LLM Agents (2026) Xiaochuan Li et al. 10.31
- MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers (2025) Zhenting Wang et al. 10.30
- Agent^2 RL-bench: Can LLM Agents Engineer Agentic RL Post-training? (2026) Wanyi Chen, Xiao Yang, Xu Yang, et al. 10.22
- Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents (2026) Minhua Lin et al. 10.15
- MultiAgentBench: Evaluating The Collaboration And Competition Of LLM Agents (2025) Kunlun Zhu, Hongyi Du, Zhaochen Hong, et al. 10.08
- OpenSkill: Open-World Self-Evolution for LLM Agents (2026) Zhiling Yan et al. 10.05