AlpacaEval
Emerging, 2 papers using it
First seen: 2025
AlpacaEval is an automatic, LLM-judged benchmark for instruction-following language models: a judge model compares each response against a reference model's output, and the benchmark reports the resulting win rate. It is used here to evaluate open-weight systems.