Arena-Hard
Emerging: 3 papers using it
First seen: 2025
Arena-Hard is a benchmark dataset used to evaluate the performance of models on challenging tasks that require a combination of translation specialization and general-purpose capabilities.