ToolBench
Canonical · 12 papers using it
First seen: 2024
ToolBench is a benchmark of approximately 47,000 tools for evaluating how well large language models perform tool retrieval, using a range of query types and probing methods.
Papers using ToolBench (12)
- Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning
- CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval
- ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs
- Adarubric: Task-adaptive Rubrics For LLM Agent Evaluation
- How Many Tools Should an LLM Agent See? A Chance-Corrected Answer
- LAM SIMULATOR: Advancing Data Generation For Large Action Model Training Via Online Exploration And Trajectory Feedback
- Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
- Agenther: Hindsight Experience Replay For LLM Agent Trajectory Relabeling
- Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents
- Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning
- NaviAgent: Graph-Driven Bilevel Planning for Scalable Tool Orchestration
- Toolplanner: A Tool Augmented LLM For Multi Granularity Instructions With Path Planning And Feedback