Awesome Browser Agents
Browser Agents is one of the most active areas in Awesome AI Agents, with 697 papers in this collection evaluated on datasets such as OSWorld, ALFWorld, and SWE-bench. A strong starting point is "EASYTOOL: Enhancing LLM-based Agents With Concise Tool Instruction".
Datasets & benchmarks
Key papers
- EASYTOOL: Enhancing LLM-based Agents With Concise Tool Instruction (2024) Siyu Yuan, Kaitao Song, Jiangjie Chen, et al. 16.88
- FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents (2026) Jia Deng et al. 14.20
- Deep Research Agents: A Systematic Examination And Roadmap (2025) Yuxuan Huang et al. 14.02
- OpenComputer: Verifiable Software Worlds for Computer-Use Agents (2026) Jinbiao Wei et al. 12.94
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research (2026) Dingbang Wu et al. 12.91
- K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts (2026) Nahyun Lee et al. 12.62
- Understanding The Weakness Of Large Language Model Agents Within A Complex Android Environment (2024) Mingzhe Xing, Rongkai Zhang, Hui Xue, et al. 12.57
- DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation (2026) Yibo Wang et al. 12.15
- GrepSeek: Training Search Agents for Direct Corpus Interaction (2026) Alireza Salemi et al. 12.06
- Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents (2026) Suji Kim et al. 12.03
- QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks (2026) Jian Xie et al. 11.71
- Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism (2026) Haoxiang Zhang et al. 11.44
- Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses (2026) Pengcheng Jiang et al. 11.09
- CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents (2026) Bowen Wang et al. 10.88
- AI Research Agents Narrow Scientific Exploration (2026) Yixuan Tang et al. 10.88
- MARS: Modular Agent with Reflective Search for Automated AI Research (2026) Jiefeng Chen et al. 10.85
- SWE-agent: Agent-computer Interfaces Enable Automated Software Engineering (2024) John Yang, Carlos E. Jimenez, Alexander Wettig, et al. 10.85
- Benchmark Test-Time Scaling of General LLM Agents (2026) Xiaochuan Li et al. 10.31
- LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know? (2026) HuiMing Fan et al. 10.00
- Mobile-agent-v3.5: Multi-platform Fundamental GUI Agents (2026) Haiyang Xu, Xi Zhang, Haowei Liu, et al. 9.87
- SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents (2026) Wenxuan Wang et al. 9.73
- Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills (2026) Chuan Xiao et al. 9.73
- CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks? (2026) Yuxin Zhang et al. 9.73
- VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions (2026) Yuxin Chen et al. 9.67
- Data Interpreter: An LLM Agent For Data Science (2024) Sirui Hong, Yizhang Lin, Bang Liu, et al. 9.66
- RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents (2025) Zijing Zhang et al. 9.64
- Agent-Pro: Learning To Evolve Via Policy-level Reflection And Optimization (2024) Wenqi Zhang, Ke Tang, Hai Wu, et al. 9.59
- OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents (2026) Rui Yang et al. 9.54
- TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents (2026) Hang Yan et al. 9.48
- Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents (2026) Haoyi Hu et al. 9.41
- Edge Large AI Model Agent-Empowered Cognitive Multimodal Semantic Communication (2026) Y. Sun et al. 9.16
- PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions (2026) Chenxin Li et al. 9.11
- LLM Agents Making Agent Tools (2025) Georg Wölflein, Dyke Ferber, Daniel Truhn, et al. 9.07
- TPTU-v2: Boosting Task Planning And Tool Usage Of Large Language Model-based Agents In Real-world Systems (2023) Yilun Kong, Jingqing Ruan, Yihong Chen, et al. 9.03
- EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge (2026) Yunhan Wang et al. 8.95
- Towards Responsible Generative AI: A Reference Architecture For Designing Foundation Model Based Agents (2023) Qinghua Lu, Liming Zhu, Xiwei Xu, et al. 8.60
- How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation (2026) Yimeng Chen et al. 8.48
- A Survey On The Optimization Of Large Language Model-based Agents (2025) Shangheng Du, Jiabao Zhao, Jinxin Shi, et al. 8.37
- Mobile-Bench: An Evaluation Benchmark For LLM-based Mobile Agents (2024) Shihan Deng, Weikai Xu, Hongda Sun, et al. 8.35
- Agent Lumos: Unified And Modular Training For Open-source Language Agents (2023) Da Yin, Faeze Brahman, Abhilasha Ravichander, et al. 8.35
- UI-Venus-1.5 Technical Report (2026) Venus Team, Changlong Gao, Zhangxuan Gu, et al. 7.99
- WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents (2025) Zile Qiao et al. 7.91
- Large Action Models: From Inception To Implementation (2024) Lu Wang, Fangkai Yang, Chaoyun Zhang, et al. 7.86
- SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents (2026) Bingchen Zhao et al. 7.79
- AgentSearchBench: A Benchmark for AI Agent Search in the Wild (2026) Bin Wu et al. 7.71
- AIOS: LLM Agent Operating System (2024) Kai Mei, Xi Zhu, Wujiang Xu, et al. 7.50
- See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents (2026) Siyi Chen et al. 7.37
- OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents (2026) Chenyu Zhou et al. 7.31
- Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion (2026) Stine Lyngsø Beltoft et al. 7.31
- AI agent in healthcare: applications, evaluations, and future directions (2026) Lina Zhao et al. 7.24
- AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents (2025) Renxi Wang et al. 7.23
- HiPlan: Hierarchical Planning for LLM-Based Agents with Adaptive Global-Local Guidance (2025) Ziyue Li et al. 7.17
- AgentBench: Evaluating LLMs As Agents (2023) Xiao Liu, Hao Yu, Hanchen Zhang, et al. 7.06
- Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents (2026) Dae Yon Hwang et al. 6.95
- Beyond Ten Turns: Unlocking Long-horizon Agentic Search With Large-scale Asynchronous RL (2025) Jiaxuan Gao, Wei Fu, Minyang Xie, et al. 6.84
- A History-Aware Visually Grounded Critic for Computer Use Agents (2026) Jaewoo Lee et al. 6.75
- Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions (2024) Huachuan Qiu et al. 6.61
- Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems (2025) Weibo Gao et al. 6.58
- Self-Challenging Language Model Agents (2025) Yifei Zhou et al. 6.57
- EXP-Bench: Can AI Conduct AI Research Experiments? (2025) Patrick Tser Jern Kon, Jiachen Liu, Xinyi Zhu, et al. 6.54