
Arena-Hard

Emerging
3 papers using it
First seen: 2025

Arena-Hard is a benchmark dataset for evaluating models on challenging tasks that require both translation specialization and general-purpose capabilities.

Papers using Arena-Hard (3)
