BFCL v-3

Emerging

13 papers using it

First seen: 2025

'BFCL v-3' refers to version 3 of the Berkeley Function Calling Leaderboard (BFCL), a benchmark for evaluating how accurately large language models call functions (tools), including in multi-turn and multi-step interactions. The papers listed here use it mainly to evaluate tool-calling models and long-horizon agents.

Papers using BFCL v-3 (13)

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use (2026)

Pushing the Limits of LLM Tool Calling via Experiential Knowledge Integration and Activation (2026)

MAVEN: Improving Generalization in Agentic Tool Calling (2026)

TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training (2026)

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents (2026)

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL (2026)

ToolWeave: Structured Synthesis of Complex Multi-Turn Tool-Calling Dialogues (2026)

TIER: Trajectory-Invariant Execution Rewards for Multi-Step Tool Composition (2026)

Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning (2026)

MagicAgent: Towards Generalized Agent Planning (2026)

Gecko: A Simulation Environment with Stateful Feedback for Refining Agent Tool Calls (2026)

On Generalization in Agentic Tool Calling: CoreThink Agentic Reasoner and MAVEN Dataset (2025)

Small Language Models For Agentic Systems: A Survey Of Architectures, Capabilities, And Deployment Trade Offs (2025)