BFCL v-3
Emerging
4 papers using it
27 HF downloads
0 HF likes
First seen in 2025
BFCL-v3 is a benchmark dataset for evaluating how well Large Language Models (LLMs) execute complex, multi-step tasks. Its structured data is intended to expose a model's capabilities and weaknesses on such tasks.
Hugging Face | mit license
Papers using BFCL v-3 (4)
- Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
- TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
- DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents
- LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls