BFCL
Emerging, 14 papers using it
First seen: 2026
BFCL (Berkeley Function Calling Leaderboard) is a benchmark for evaluating the tool-calling capabilities of large language models, with a focus on their ability to retrieve and use demonstrations for specific tasks.
Papers using BFCL (14)
- Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning
- Scaling Agentic Capabilities via Grounded Interaction Synthesis
- PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents
- Looking Is Not Picking: An Attention-Segment Account of Tool-Selection Failures in LLM Agents
- TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing
- Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems
- How Many Tools Should an LLM Agent See? A Chance-Corrected Answer
- Boosting Tool-Calling Capabilities of Large Language Models via a Novel In-Context Learning Approach
- TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments
- CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution
- Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models
- Try, Check and Retry: A Divide-and-Conquer Framework for Boosting Long-context Tool-Calling Performance of LLMs
- Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents
- Linguistic and Argument Diversity in Synthetic Data for Function-Calling Agents