tau-2-bench

Emerging
9 papers using it
First seen in 2025

tau-2-bench (also written 'tau2-bench') is a benchmark that evaluates how well models orchestrate multi-step tool calls within realistic, stateful execution environments.

Papers using tau-2-bench (9)
