tau-2-bench
Emerging
9 papers using it
First seen: 2025
tau2-bench is a benchmark that evaluates how well models orchestrate multi-step tool calls in realistic, stateful execution environments.
Papers using tau-2-bench (9)
- Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments
- KAT-Coder-V2 Technical Report
- AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning
- ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
- Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors
- From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents
- Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
- TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training
- Kimi K2: Open Agentic Intelligence