SkillBench
Emerging · 3 papers using it · 17 HF downloads · 0 HF likes · first seen 2026
SkillBench is a challenging benchmark designed to evaluate an LLM's logical orchestration and cross-domain skill synthesis capabilities. It was developed using the STEPS framework and synthesized via GPT-4.1. It moves beyond simple tool calling to test how models solve complex, multi-step problems by integrating diverse vertical skills.

Dataset Scale & Statistics: The dataset contains 545 high-quality, expert-validated samples. These are grounded in diverse seeds from Infinity-Instruct and… See the full description on the dataset page: https://huggingface.co/datasets/Weiyifan/SkillBench.
🤗 Hugging Face · apache-2.0