BFCL v-3
Emerging · 13 papers using it
First seen: 2025
BFCL v-3 (Berkeley Function Calling Leaderboard, version 3) is a benchmark for evaluating how well large language models call functions and tools, including multi-turn and multi-step interactions. The papers listed here use it to assess tool-use agents, often on long-horizon tasks.
Papers using BFCL v-3 (13)
- D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use
- Pushing the Limits of LLM Tool Calling via Experiential Knowledge Integration and Activation
- MAVEN: Improving Generalization in Agentic Tool Calling
- TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training
- HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents
- EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
- ToolWeave: Structured Synthesis of Complex Multi-Turn Tool-Calling Dialogues
- TIER: Trajectory-Invariant Execution Rewards for Multi-Step Tool Composition
- Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning
- MagicAgent: Towards Generalized Agent Planning
- Gecko: A Simulation Environment with Stateful Feedback for Refining Agent Tool Calls
- On Generalization in Agentic Tool Calling: CoreThink Agentic Reasoner and MAVEN Dataset
- Small Language Models For Agentic Systems: A Survey Of Architectures, Capabilities, And Deployment Trade Offs