tau-2-bench

Emerging

10 papers using it

First seen: 2025

tau-2-bench is a benchmark that evaluates how well models orchestrate multi-step tool calls in realistic, stateful execution environments.

Papers using tau-2-bench (10)

Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments (2026)

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL (2026)

TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training (2026)

KAT-Coder-V2 Technical Report (2026)

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters (2026)

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors (2026)

From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents (2026)

AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning (2025)

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration (2025)

Kimi K2: Open Agentic Intelligence (2025)