M-3ToolEval

Emerging

2 papers using it

First seen in 2025

M-3ToolEval is a benchmark dataset for evaluating how reliably models use tools in code mode. It assesses their performance on tasks involving inter-tool contract compliance and execution feedback.


Papers using M-3ToolEval (2)

Self-Challenging Language Model Agents · 2025 · 1 citation

RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement · 2026