
Who&When

Emerging
6 papers using it
2025 first seen

The Who&When benchmark dataset is used to evaluate step-level causal attribution in LLM agents: given a failed run, it measures how accurately a method identifies which step in the agent's decision-making process caused the failure.
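
As a rough illustration of what step-level accuracy means here, the sketch below scores a list of predicted failure steps against annotated ones. The function name and the plain-integer representation of steps are assumptions made for this example only; they are not taken from the dataset's actual schema or from any official evaluation script.

```python
from typing import Sequence


def step_level_accuracy(
    predicted_steps: Sequence[int],
    annotated_steps: Sequence[int],
) -> float:
    """Fraction of failed runs where the predicted failure step
    matches the annotated one exactly.

    Both arguments are hypothetical: one step index per failed run.
    """
    if len(predicted_steps) != len(annotated_steps):
        raise ValueError("predictions and annotations must have the same length")
    if not annotated_steps:
        raise ValueError("need at least one annotated run")
    # Count exact matches between predicted and annotated step indices.
    hits = sum(p == a for p, a in zip(predicted_steps, annotated_steps))
    return hits / len(annotated_steps)


# Four failed runs; the second prediction misses the annotated step.
score = step_level_accuracy([3, 5, 2, 7], [3, 4, 2, 7])
print(score)
```

Exact-match scoring is the strictest choice; a tolerance window around the annotated step would be a separate, more lenient metric.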

Papers using Who&When (6)
