StableToolBench
Emerging · 5 papers using it
First seen: 2024
StableToolBench is a cost-augmented benchmark that evaluates how well tool-augmented agents solve multi-step tasks under strict monetary budgets.
Papers using StableToolBench (5)
- ParaTool: Shifting Tool Representations from Context to Parameters
- CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision
- Self-Improving World Modelling with Latent Actions
- Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use
- StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models