Awesome Agentic
Agentic is one of the most active areas in Awesome LLM Papers, with 1,030 papers in this collection, evaluated on datasets such as ALFWorld, LoCoMo, and WebShop. A strong starting point is "ReAct: Synergizing Reasoning And Acting In Language Models".
Key papers
- ReAct: Synergizing Reasoning And Acting In Language Models (2022) | Shunyu Yao, Jeffrey Zhao, Dian Yu, et al. | 36.63
- Agenttuning: Enabling Generalized Agent Abilities For LLMs (2023) | Aohan Zeng, Mingdao Liu, Rui Lu, et al. | 32.67
- Executable Code Actions Elicit Better LLM Agents (2024) | Xingyao Wang, Yangyi Chen, Lifan Yuan, et al. | 31.13
- Agentverse: Facilitating Multi-agent Collaboration And Exploring Emergent Behaviors (2023) | Weize Chen, Yusheng Su, Jingwei Zuo, et al. | 29.70
- Agentscope: A Flexible Yet Robust Multi-agent Platform (2024) | Dawei Gao, Zitao Li, Xuchen Pan, et al. | 29.54
- Autogen: Enabling Next-gen LLM Applications Via Multi-agent Conversation (2023) | Qingyun Wu, Gagan Bansal, Jieyu Zhang, et al. | 29.16
- Agentless: Demystifying LLM-based Software Engineering Agents (2024) | Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, et al. | 27.93
- EASYTOOL: Enhancing LLM-based Agents With Concise Tool Instruction (2024) | Siyu Yuan, Kaitao Song, Jiangjie Chen, et al. | 27.38
- Agentgym: Evolving Large Language Model-based Agents Across Diverse Environments (2024) | Zhiheng Xi, Yiwen Ding, Wenxiang Chen, et al. | 27.20
- Agent-flan: Designing Data And Methods Of Effective Agent Tuning For Large Language Models (2024) | Zehui Chen, Kuikun Liu, Qiuchen Wang, et al. | 25.55
- Agentohana: Design Unified Data And Training Pipeline For Effective Agent Learning (2024) | Jianguo Zhang, Tian Lan, Rithesh Murthy, et al. | 24.90
- ControlLLM: Augment Language Models With Tools By Searching On Graphs (2023) | Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, et al. | 24.66
- Reconcile: Round-table Conference Improves Reasoning Via Consensus Among Diverse LLMs (2023) | Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal | 24.61
- Large Language Model Based Multi-agents: A Survey Of Progress And Challenges (2024) | Taicheng Guo, Xiuying Chen, Yaqi Wang, et al. | 24.39
- Simulating Classroom Education With LLM-empowered Agents (2024) | Zheyuan Zhang, Daniel Zhang-Li, Jifan Yu, et al. | 24.11
- MMAU: A Holistic Benchmark Of Agent Capabilities Across Diverse Domains (2024) | Guoli Yin, Haoping Bai, Shuang Ma, et al. | 23.95
- You Only Look At Screens: Multimodal Chain-of-action Agents (2023) | Zhuosheng Zhang, Aston Zhang | 23.42
- Metatool Benchmark For Large Language Models: Deciding Whether To Use Tools And Which To Use (2023) | Yue Huang, Jiawen Shi, Yuan Li, et al. | 22.68
- T-eval: Evaluating The Tool Utilization Capability Of Large Language Models Step By Step (2023) | Zehui Chen, Weihua Du, Wenwei Zhang, et al. | 21.69
- Self-reflection In LLM Agents: Effects On Problem-solving Performance (2024) | Matthew Renze, Erhan Guven | 21.59
- Smartplay: A Benchmark For LLMs As Intelligent Agents (2023) | Yue Wu, Xuan Tang, Tom M. Mitchell, et al. | 21.32
- Character-LLM: A Trainable Agent For Role-playing (2023) | Yunfan Shao, Linyang Li, Junqi Dai, et al. | 21.19
- Avalon's Game Of Thoughts: Battle Against Deception Through Recursive Contemplation (2023) | Shenzhi Wang, Chang Liu, Zilong Zheng, et al. | 20.05
- Recursive Introspection: Teaching Language Model Agents How To Self-improve (2024) | Yuxiao Qu, Tianjun Zhang, Naman Garg, et al. | 19.75
- Parrot: Efficient Serving Of LLM-based Applications With Semantic Variable (2024) | Chaofan Lin, Zhenhua Han, Chengruidong Zhang, et al. | 19.72
- LASER: LLM Agent With State-space Exploration For Web Navigation (2023) | Kaixin Ma, Hongming Zhang, Hongwei Wang, et al. | 19.71
- From Persona To Personalization: A Survey On Role-playing Language Agents (2024) | Jiangjie Chen, Xintao Wang, Rui Xu, et al. | 19.12
- Agentsquare: Automatic LLM Agent Search In Modular Design Space (2024) | Yu Shang, Yu Li, Keyu Zhao, et al. | 18.98
- Evil Geniuses: Delving Into The Safety Of LLM-based Agents (2023) | Yu Tian, Xiao Yang, Jingyuan Zhang, et al. | 18.92
- Repairagent: An Autonomous, LLM-based Agent For Program Repair (2024) | Islem Bouzenia, Premkumar Devanbu, Michael Pradel | 18.60
- Goex: Perspectives And Designs Towards A Runtime For Autonomous LLM Applications (2024) | Shishir G. Patil, Tianjun Zhang, Vivian Fang, et al. | 18.57
- Exploring Large Language Model Based Intelligent Agents: Definitions, Methods, And Prospects (2024) | Yuheng Cheng, Ceyao Zhang, Zhengwen Zhang, et al. | 18.49
- LLM-powered Hierarchical Language Agent For Real-time Human-AI Coordination (2023) | Jijia Liu, Chao Yu, Jiaxuan Gao, et al. | 18.44
- Badagent: Inserting And Activating Backdoor Attacks In LLM Agents (2024) | Yifei Wang, Dizhan Xue, Shengjie Zhang, et al. | 18.26
- Taskbench: Benchmarking Large Language Models For Task Automation (2023) | Yongliang Shen, Kaitao Song, Xu Tan, et al. | 18.24
- Ghostwriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization And Agency (2024) | Catherine Yeh, Gonzalo Ramos, Rachel Ng, et al. | 18.07
- LLM-coordination: Evaluating And Analyzing Multi-agent Coordination Abilities In Large Language Models (2023) | Saaket Agashe, Yue Fan, Anthony Reyna, et al. | 17.85
- Autoflow: Automated Workflow Generation For Large Language Model Agents (2024) | Zelong Li, Shuyuan Xu, Kai Mei, et al. | 17.75
- Kg-agent: An Efficient Autonomous Agent Framework For Complex Reasoning Over Knowledge Graph (2024) | Jinhao Jiang, Kun Zhou, Wayne Xin Zhao, et al. | 17.58
- Agentic Reward Modeling: Integrating Human Preferences With Verifiable Correctness Signals For Reliable Reward Systems (2025) | Hao Peng, Yunjia Qi, Xiaozhi Wang, et al. | 17.49
- Agentboard: An Analytical Evaluation Board Of Multi-turn LLM Agents (2024) | Chang Ma, Junlei Zhang, Zhihao Zhu, et al. | 17.06
- Benchmark Self-evolving: A Multi-agent Framework For Dynamic LLM Evaluation (2024) | Siyuan Wang, Zhuohan Long, Zhihao Fan, et al. | 16.99
- Mmedagent: Learning To Use Medical Tools With Multi-modal Agent (2024) | Binxu Li, Tiankai Yan, Yuanting Pan, et al. | 16.95
- Recai: Leveraging Large Language Models For Next-generation Recommender Systems (2024) | Jianxun Lian, Yuxuan Lei, Xu Huang, et al. | 16.93
- Reason For Future, Act For Now: A Principled Framework For Autonomous LLM Agents With Provable Sample Efficiency (2023) | Zhihan Liu, Hao Hu, Shenao Zhang, et al. | 16.93
- When Is Tree Search Useful For LLM Planning? It Depends On The Discriminator (2024) | Ziru Chen, Michael White, Raymond Mooney, et al. | 16.92
- Systematic Biases In LLM Simulations Of Debates (2024) | Amir Taubenfeld, Yaniv Dover, Roi Reichart, et al. | 16.92
- Luminate: Structured Generation And Exploration Of Design Space With Large Language Models For Human-AI Co-creation (2023) | Sangho Suh, Meng Chen, Bryan Min, et al. | 16.90
- Understanding The Weakness Of Large Language Model Agents Within A Complex Android Environment (2024) | Mingzhe Xing, Rongkai Zhang, Hui Xue, et al. | 16.89
- Formal-LLM: Integrating Formal Language And Natural Language For Controllable LLM-based Agents (2024) | Zelong Li, Wenyue Hua, Hao Wang, et al. | 16.74
- Stabletoolbench: Towards Stable Large-scale Benchmarking On Tool Learning Of Large Language Models (2024) | Zhicheng Guo, Sijie Cheng, Hao Wang, et al. | 16.68
- Archer: Training Language Model Agents Via Hierarchical Multi-turn RL (2024) | Yifei Zhou, Andrea Zanette, Jiayi Pan, et al. | 16.61
- Large Language Model Enhanced Multi-agent Systems For 6G Communications (2023) | Feibo Jiang, Li Dong, Yubo Peng, et al. | 16.59
- Agentsims: An Open-source Sandbox For Large Language Model Evaluation (2023) | Jiaju Lin, Haoran Zhao, Aochi Zhang, et al. | 16.59
- Hiagent: Hierarchical Working Memory Management For Solving Long-horizon Agent Tasks With Large Language Model (2024) | Mengkang Hu, Tianxing Chen, Qiguang Chen, et al. | 16.39
- Language Agents With Reinforcement Learning For Strategic Play In The Werewolf Game (2023) | Zelai Xu, Chao Yu, Fei Fang, et al. | 16.23
- Gaia2: Benchmarking LLM Agents On Dynamic And Asynchronous Environments (2026) | Romain Froger, Pierre Andrews, Matteo Bettini, et al. | 16.17
- K-level Reasoning: Establishing Higher Order Beliefs In Large Language Models For Strategic Reasoning (2024) | Yadong Zhang, Shaoguang Mao, Tao Ge, et al. | 16.09
- Graphreader: Building Graph-based Agent To Enhance Long-context Abilities Of Large Language Models (2024) | Shilong Li, Yancheng He, Hangyu Guo, et al. | 16.07
- Learn-by-interact: A Data-centric Framework For Self-adaptive Agents In Realistic Environments (2025) | Hongjin Su, Ruoxi Sun, Jinsung Yoon, et al. | 16.07