StableToolBench

Emerging

5 papers using it

2024 first seen

StableToolBench is a benchmark for stable, large-scale evaluation of tool learning in large language models. A cost-augmented version has been used to evaluate how well budget-constrained, tool-augmented agents solve multi-step tasks while staying within strict monetary budgets.


Papers using StableToolBench (5)

ParaTool: Shifting Tool Representations from Context to Parameters (2026)

CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision (2025) · 1 citation

Self-Improving World Modelling with Latent Actions (2026)

Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use (2026)

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models (2024)