StableToolBench
Emerging
12 papers using it
First seen: 2024
StableToolBench is a cost-augmented benchmark that evaluates how well tool-augmented agents solve multi-step tasks under strict monetary budgets.
Papers using StableToolBench (12)
- Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning
- Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems
- CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision
- Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use
- Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use
- FitText: Evolving Agent Tool Ecologies via Memetic Retrieval
- Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning
- Stabletoolbench-mirrorapi: Modeling Tool Environments As Mirrors Of 7,000+ Real-world Apis
- Small Language Models For Agentic Systems: A Survey Of Architectures, Capabilities, And Deployment Trade Offs
- MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning
- StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs
- Smurfs: Multi-agent System Using Context-efficient DFSDT For Tool Planning