Who&When

Emerging

7 papers using it

First seen: 2025

Who&When is a benchmark for automated failure attribution in LLM agent systems. Given a failed agent trajectory, the task is to identify which step in the agents' decision-making process caused the failure. Attribution quality is scored by step-level accuracy: how often the predicted decisive step matches the annotated one.
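A minimal sketch of how exact-match attribution accuracy could be computed. It assumes each failed trajectory is annotated with a blamed agent and the index of the decisive error step. The agent-level score is an assumption suggested by the benchmark's name ("Who"). The `Attribution` record, its field names, and the `attribution_accuracy` helper are hypothetical illustrations, not the dataset's actual schema or official scoring code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Attribution:
    """One failed trajectory (hypothetical fields, not the dataset's schema)."""
    agent: str  # name of the agent blamed for the failure ("who")
    step: int   # index of the decisive error step ("when")


def attribution_accuracy(
    gold: Sequence[Attribution], pred: Sequence[Attribution]
) -> Tuple[float, float]:
    """Return (agent-level, step-level) accuracy as exact-match fractions."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    # Count trajectories whose predicted agent / step equals the annotation.
    agent_hits = sum(g.agent == p.agent for g, p in zip(gold, pred))
    step_hits = sum(g.step == p.step for g, p in zip(gold, pred))
    n = len(gold)
    return agent_hits / n, step_hits / n


if __name__ == "__main__":
    gold = [Attribution("planner", 3), Attribution("executor", 7)]
    pred = [Attribution("planner", 3), Attribution("executor", 5)]
    print(attribution_accuracy(gold, pred))  # (1.0, 0.5)
```

In the usage example, both agents are blamed correctly, but only one of the two decisive steps is located exactly. Step-level accuracy is therefore the stricter of the two scores.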


Papers using Who&When (6)

FALAT: Tracing Failures in LLM Agent Trajectories via Dependency-Guided Search (2026)

StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems (2026)

Causal Agent Replay: Counterfactual Attribution for LLM-Agent Failures (2026)

SAFARI: Scaling Long Horizon Agentic Fault Attribution via Active Investigation (2026)

Which Agent Causes Task Failures And When? On Automated Failure Attribution Of LLM Multi-agent Systems (2025)

VerifyMAS: Hypothesis Verification for Failure Attribution in LLM Multi-Agent Systems (2026)