StableToolBench

Emerging
5 papers using it
First seen: 2024

StableToolBench is a cost-augmented benchmark that evaluates how well tool-augmented agents solve multi-step tasks while staying within strict monetary budgets.

Papers using StableToolBench (5)
