
MCBench

Emerging
3 papers using it
First seen in 2024

MCBench is a benchmark that evaluates whether large language models can execute string-matching NLP metrics by strictly following step-by-step instructions. Because the expected results can be verified with code, it gives an objective assessment of instruction adherence, numerical computation, and long-range consistency.
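To make "code-verifiable" concrete, here is a minimal sketch of how a string-matching metric can be computed by a reference implementation and compared with a value reported by a model. The metric chosen (unigram-overlap F1), the function names `unigram_f1` and `check_model_answer`, and the tolerance are illustrative assumptions, not MCBench's actual metric set or evaluation harness.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized, lowercased strings.

    Hypothetical example metric; the benchmark's own metrics may differ.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def check_model_answer(model_value: float, candidate: str, reference: str,
                       tol: float = 1e-6) -> bool:
    """Return True if the model's reported score matches the reference computation.

    The tolerance is an assumed parameter for this sketch.
    """
    return abs(model_value - unigram_f1(candidate, reference)) <= tol
```

A model that follows the metric's steps correctly should report a value that `check_model_answer` accepts; any slip in counting or arithmetic shows up as a mismatch against the reference computation.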

Papers using MCBench (3)
