
BenchTrace

Status: Emerging
2 papers using it
First seen: 2026

BenchTrace is a benchmark built around a snapshot-reflection dataset of 1,821 annotated episodes spanning six tasks. It is used to evaluate the self-evolution ability of LLM agents by measuring their reflection quality and failure-avoidance behavior.

Papers using BenchTrace (2)
