BFCLv-3
Emerging: 12 papers using it
First seen: 2025
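BFCLv3 (Berkeley Function Calling Leaderboard, version 3) is a benchmark for evaluating the function-calling ability of LLM-based tool-use agents, including multi-turn and multi-step interactions. Some of the papers below use it to study agent interaction dynamics through multi-trial rollouts of its tasks.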
Papers using BFCLv-3 (12)
- D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use
- Pushing the Limits of LLM Tool Calling via Experiential Knowledge Integration and Activation
- MAVEN: Improving Generalization in Agentic Tool Calling
- TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training
- HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents
- EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
- ToolWeave: Structured Synthesis of Complex Multi-Turn Tool-Calling Dialogues
- Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning
- MagicAgent: Towards Generalized Agent Planning
- Gecko: A Simulation Environment with Stateful Feedback for Refining Agent Tool Calls
- On Generalization in Agentic Tool Calling: CoreThink Agentic Reasoner and MAVEN Dataset
- Small Language Models For Agentic Systems: A Survey Of Architectures, Capabilities, And Deployment Trade Offs