Berkeley Function Calling Leaderboard v-3

Emerging

9 papers using it

First seen: 2024

The Berkeley Function Calling Leaderboard v-3 is a benchmark dataset of 200 tasks for evaluating function-calling language agents, in particular how chain-of-thought reasoning affects their accuracy.

Papers using Berkeley Function Calling Leaderboard v-3 (9)

ToolACE: Winning the Points of LLM Function Calling · 2024 · 3 cites

Reasoning through Exploration: A Reinforcement Learning Framework for Robust Function Calling · 2025 · 1 cite

On the Robustness of Agentic Function Calling · 2025 · 1 cite

Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents · 2026

Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover · 2026

Don't Just Fine-tune the Agent, Tune the Environment · 2025

TinyLLM: Evaluation and Optimization of Small Language Models for Agentic Tasks on Edge Devices · 2025

xLAM: A Family of Large Action Models to Empower AI Agent Systems · 2024 · 5 cites

Asynchronous LLM Function Calling · 2024 · 1 cite