AlpacaEval

Emerging

2 papers using it

2025 first seen

AlpacaEval is a benchmark used to evaluate the performance of open-weight systems in completing complex, goal-driven tasks through multi-turn dialogue and tool use.


Papers using AlpacaEval (1)

AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks · 2025 · 1 citation