Berkeley Function Calling Leaderboard v3
Emerging
8 papers using it
First seen: 2024
The Berkeley Function Calling Leaderboard v3 is a benchmark dataset of 200 tasks used to evaluate function-calling language agents with respect to reasoning length and accuracy.
Papers using Berkeley Function Calling Leaderboard v3 (8)
- ToolACE: Winning the Points of LLM Function Calling
- On the Robustness of Agentic Function Calling
- Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents
- Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover
- Don't Just Fine-tune the Agent, Tune the Environment
- Tinyllm: Evaluation And Optimization Of Small Language Models For Agentic Tasks On Edge Devices
- xLAM: A Family of Large Action Models to Empower AI Agent Systems
- Asynchronous LLM Function Calling