TAU-bench
Canonical
12 papers using it
First seen: 2025
Papers using TAU-bench (12)
- Self-Challenging Language Model Agents
- SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection
- Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning
- HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation
- RealUserSim: Bridging the Reality Gap in Agent Benchmarking via Grounded User Simulation
- MAVEN: Improving Generalization in Agentic Tool Calling
- Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors
- CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use
- Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience
- Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration
- On Generalization in Agentic Tool Calling: CoreThink Agentic Reasoner and MAVEN Dataset
- Establishing Best Practices For Building Rigorous Agentic Benchmarks