WebArena
Emerging · 5 papers using it
First seen: 2025
WebArena is a benchmark of realistic web environments for evaluating large language model (LLM) agents on tool-using tasks. It has also been used as a source of agent execution traces, for example to evaluate online warning monitors that flag likely failures while a task is still in progress.
Papers using WebArena (5)
- PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors
- Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
- WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks
- FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
- R-WoM: Retrieval-augmented World Model For Computer-use Agents