
WebArena

Canonical
19 papers using it
2024 first seen

WebArena is a benchmark dataset for evaluating large language model web agents. It measures how well an agent executes structured tool actions while interacting with web pages.

Papers using WebArena (19)
