
StableToolBench

Emerging

12 papers using it

First seen: 2024

StableToolBench is a cost-augmented benchmark that evaluates how well tool-augmented agents solve multi-step tasks under strict monetary budgets.


Papers using StableToolBench (12)

Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning · 2025 · 14 cites

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems · 2026

CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision · 2025 · 1 cite

Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use · 2026

Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use · 2026

FitText: Evolving Agent Tool Ecologies via Memetic Retrieval · 2026

Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning · 2025

StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs · 2025

Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade Offs · 2025

MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning · 2025

StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs · 2025

Smurfs: Multi-agent System Using Context-efficient DFSDT for Tool Planning · 2024
