AlpacaEval

Emerging
2 papers using it
First seen: 2025

AlpacaEval is a benchmark for evaluating how well open-weight systems complete complex, goal-driven tasks through multi-turn dialogue and tool use.

Papers using AlpacaEval (1)

AlpacaEval - datasets - speech-audio