WebArena
Canonical
19 papers using it
First seen: 2024
WebArena is a benchmark dataset for evaluating large language model (LLM) web agents. It measures an agent's ability to execute structured tool actions while interacting with websites.
Papers using WebArena (19)
- Mobile-agent-v3.5: Multi-platform Fundamental GUI Agents
- Devil's Advocate: Anticipatory Reflection For LLM Agents
- The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
- Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval
- Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns
- Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection
- APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents
- Adarubric: Task-adaptive Rubrics For LLM Agent Evaluation
- Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents
- WEBSERV: A Full-Stack and RL-Ready Web Environment for Training Web Agents at Scale
- Agenther: Hindsight Experience Replay For LLM Agent Trajectory Relabeling
- Environment Maps: Structured Environmental Representations For Long-horizon Agents
- WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent
- OpAgent: Operator Agent for Web Navigation
- Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates
- SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
- Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory
- Surfer 2: The Next Generation Of Cross-platform Computer Use Agents
- CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration