ChemCoTBench-V-2

Emerging

1 paper using it

First seen in 2026

ChemCoTBench-V-2 is a rule-verifiable diagnostic benchmark of 5,620 evaluation samples across 18 tasks. It assesses structured chemical reasoning in large language models by requiring them to produce intermediate reasoning steps that can be verified, rather than judging only their final answers.


Papers using ChemCoTBench-V-2 (1)

From Answers to States: Verifiable Process-Level Evaluation of Chemical Reasoning in Large Language Models (2026)