tau-2-bench

Emerging

17 papers using it

First seen: 2025

Tau-2 Bench is a benchmark dataset for evaluating tool-use agents. It provides a structured set of tasks that assess interaction dynamics and the effectiveness of different training strategies.

Papers using tau-2-bench (17)

Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning (2026)

Towards General Agentic Intelligence via Environment Scaling (2025)

PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents (2026)

Agent Planning Benchmark: A Diagnostic Framework for Planning Capabilities in LLM Agents (2026)

Proper Scoring Rules for Agentic Uncertainty Quantification (2026)

SkillsInjector: Dynamic Skill Context Construction for LLM Agents (2026)

CurateEvo: Data-Curation Evolving for Agentic Post-Training (2026)

MemGym: a Long-Horizon Memory Environment for LLM Agents (2026)

MAVEN: Improving Generalization in Agentic Tool Calling (2026)

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors (2026)

TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training (2026)

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL (2026)

Toward Scalable Verifiable Reward: Proxy State-Based Evaluation for Multi-turn Tool-Calling LLM Agents (2026)

Step 3.5 Flash: Open Frontier-level Intelligence With 11B Active Parameters (2026)

AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning (2025)

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration (2025)

On Generalization in Agentic Tool Calling: CoreThink Agentic Reasoner and MAVEN Dataset (2025)