
WebArena

Emerging
5 papers using it
First seen: 2025

WebArena is a benchmark dataset of traces from large language model (LLM) agents executing tool-using tasks. It is used to evaluate how effectively online warning monitors detect potential failures during these tasks.

Papers using WebArena (5)
