Who&When
Emerging · 6 papers using it
First seen: 2025
Who&When is a benchmark dataset for evaluating the step-level accuracy of causal failure attribution in LLM agents, that is, how accurately a method identifies which step in an agent's decision-making process caused a failure.
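As an illustration of the step-level accuracy described above, here is a minimal sketch. It assumes that each failed trajectory has one annotated failure step and that a prediction counts as correct only on an exact match. The `Attribution` record and its field names are hypothetical and are not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Attribution:
    """One failed trajectory (hypothetical record layout, not the real schema)."""
    task_id: str
    predicted_step: int  # step index the attribution method blames
    true_step: int       # annotated step that caused the failure


def step_level_accuracy(records):
    """Fraction of trajectories whose predicted failure step matches the annotation exactly."""
    if not records:
        raise ValueError("need at least one record")
    hits = sum(1 for r in records if r.predicted_step == r.true_step)
    return hits / len(records)


# Made-up example values, for illustration only.
records = [
    Attribution("a", predicted_step=3, true_step=3),
    Attribution("b", predicted_step=1, true_step=2),
    Attribution("c", predicted_step=5, true_step=5),
    Attribution("d", predicted_step=0, true_step=4),
]
print(step_level_accuracy(records))
```

Exact matching is the strictest choice; a tolerance window around the annotated step would be a looser variant, and which one a given paper reports should be checked against that paper.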
Papers using Who&When (6)
- FALAT: Tracing Failures in LLM Agent Trajectories via Dependency-Guided Search
- StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems
- Causal Agent Replay: Counterfactual Attribution for LLM-Agent Failures
- Which Agent Causes Task Failures And When? On Automated Failure Attribution Of LLM Multi-agent Systems
- VerifyMAS: Hypothesis Verification for Failure Attribution in LLM Multi-Agent Systems
- Automatic Failure Attribution And Critical Step Prediction Method For Multi-agent Systems Based On Causal Inference