SkillsBench

Emerging

15 papers using it

First seen in 2026

Papers using SkillsBench (15)

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation · 2026

Graph-of-Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills · 2026 · 1 citation

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming · 2026

SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision · 2026

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale · 2026

AIP: A Graph Representation for Learning and Governing Agent Skills · 2026

What Should a Skill Remember? Quality–Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents · 2026

SkillAxe: Sharpening LLM-Authored Agent Skills Through Evaluation-Guided Self-Refinement · 2026

SkillJuror: Measuring How Agent Skill Organization Changes Runtime Behavior · 2026

SkillsInjector: Dynamic Skill Context Construction for LLM Agents · 2026

SkillMOO: Multi-Objective Optimization of Agent Skills for Software Engineering · 2026

SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces · 2026

Coevoskills: Self-evolving Agent Skills Via Co-evolutionary Verification · 2026

SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents · 2026

ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation · 2026
