BenchTrace
Emerging · 2 papers using it
First seen: 2026
BenchTrace is a benchmark containing a snapshot-reflection dataset of 1,821 annotated episodes across six tasks. It is used to evaluate the self-evolution ability of LLM agents, measured by reflection quality and failure-avoidance behavior.