Awesome Papers

Papers

Agentic Reinforced Policy Optimization (2025)
Guanting Dong et al.
19.61
Arcee's MergeKit: A Toolkit For Merging Large Language Models (2024)
Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, et al.
18.99
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning (2025)
Huatong Song et al.
18.73
Tool Learning With Large Language Models: A Survey (2024)
Changle Qu, Sunhao Dai, Xiaochi Wei, et al.
18.48
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning (2025)
Haozhan Li et al.
18.40
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL (2025)
Weizhen Li et al.
18.23
EASYTOOL: Enhancing LLM-based Agents With Concise Tool Instruction (2024)
Siyu Yuan, Kaitao Song, Jiangjie Chen, et al.
16.88
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning (2026)
Seokju Cho et al.
15.70
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments (2026)
Qiuyue Wang et al.
15.10
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security (2026)
Dongrui Liu et al.
14.78
APPO: Agentic Procedural Policy Optimization (2026)
Xucong Wang et al.
14.60
Tool Learning With Foundation Models (2023)
Yujia Qin, Shengding Hu, Yankai Lin, et al.
14.58
Deep Research Agents: A Systematic Examination And Roadmap (2025)
Yuxuan Huang et al.
14.02
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning (2026)
Minki Kang et al.
13.75
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories (2026)
Baochang Ren et al.
13.59
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration (2026)
Jiaqi Liu et al.
13.52
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks (2026)
Mengyu Zheng et al.
13.33
ACC: Compiling Agent Trajectories for Long-Context Training (2026)
Qisheng Su et al.
13.06
OpenComputer: Verifiable Software Worlds for Computer-Use Agents (2026)
Jinbiao Wei et al.
12.94
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models (2025)
DeepSeek-AI et al.
12.70
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation (2026)
Tianyi Zhou et al.
12.70
From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI (2026)
Yongheng Zhang et al.
12.47
Planning, Creation, Usage: Benchmarking LLMs For Comprehensive Tool Utilization In Real-world Complex Scenarios (2024)
Shijue Huang, Wanjun Zhong, Jianqiao Lu, et al.
12.31
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation (2026)
Yibo Wang et al.
12.15
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security (2026)
Dongrui Liu et al.
12.12
GrepSeek: Training Search Agents for Direct Corpus Interaction (2026)
Alireza Salemi et al.
12.06
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents (2026)
Suji Kim et al.
12.03
HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry (2026)
Tingyang Chen et al.
11.97
Agent models: Internalizing Chain-of-Action Generation into Reasoning models (2025)
Yuxiang Zhang et al.
11.87
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution (2025)
Tianrui Qin et al.
11.85
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning (2026)
Zelai Xu et al.
11.70
Relay Hindsight Experience Replay: Self-guided Continual Reinforcement Learning For Sequential Object Manipulation Tasks With Sparse Rewards (2022)
Yongle Luo, Yuxin Wang, Kun Dong, et al.
11.58
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery (2026)
Amy Xin et al.
11.55
Personal AI Agent for Camera Roll VQA (2026)
Thao Nguyen et al.
11.45
Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism (2026)
Haoxiang Zhang et al.
11.44
Cosmos 3: Omnimodal World Models for Physical AI (2026)
NVIDIA et al.
11.10
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields (2026)
Liya Zhu et al.
10.93
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents (2026)
Bowen Wang et al.
10.88
Rethinking Continual Experience Internalization for Self-Evolving LLM Agents (2026)
Jingwen Chen et al.
10.72
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs (2026)
Haiquan Lu et al.
10.69
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence (2026)
MiniMax et al.
10.69
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning (2025)
Guanting Dong et al.
10.67
Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms (2026)
Saidakhror Gulyamov et al.
10.60
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas (2026)
Xiaoyu Tian, Haotian Wang, Shuaiting Chen, Hao Zhou, Kaichi Yu, Yudian Zhang, Jade Ouyang, Junxi Yin, Jiong Chen, Baoyan Guo, Lei Zhang, Junjie Tao, Yuansheng Song, Ming Cui, Chengwei Liu
10.59
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation (2026)
Jusuk Lee et al.
10.56
MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation (2026)
Deguo Xia et al.
10.48
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics (2026)
Mingxian Lin et al.
10.48
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning (2026)
Chi-Pin Huang et al.
10.38
Benchmark Test-Time Scaling of General LLM Agents (2026)
Xiaochuan Li et al.
10.31
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers (2025)
Zhenting Wang et al.
10.30