MCBench

Emerging

3 papers using it

First seen: 2024

MCBench is a benchmark that evaluates large language models' ability to compute string-matching NLP metrics by strictly following step-by-step instructions. Because the expected results can be verified with code, it gives an objective assessment of instruction adherence, numerical computation, and long-range consistency.
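To make "code-verifiable" concrete, the sketch below computes one simple string-matching metric (token-overlap F1) in explicit steps and checks a model-reported score against it. This is an illustration only: the metric choice, the function names `token_f1` and `verify`, and the tolerance are assumptions, not MCBench's actual metric set or evaluation harness.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between two strings (illustrative metric, not MCBench's)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Step 1: count tokens shared by both strings, clipped by multiplicity.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    # Step 2: precision and recall from the overlap count.
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    # Step 3: harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def verify(model_answer: float, candidate: str, reference: str,
           tol: float = 1e-4) -> bool:
    """Hypothetical check: does the model's reported score match the code result?"""
    return abs(model_answer - token_f1(candidate, reference)) <= tol
```

For example, `verify(0.8571, "the cat sat", "the cat sat down")` compares a model's reported score with the computed value of 6/7. Because the reference value comes from code rather than from a human or model judge, the pass/fail decision is deterministic.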


Papers using MCBench (3)

Metric Calculating Benchmark: Code-Verifiable Complicate Instruction Following Benchmark for Large Language Models (2025)

Smoke and Mirrors: Jailbreaking LLM-based Code Generation via Implicit Malicious Prompts (2025)

RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code (2024)