Arena-Hard
Emerging
12 papers using it
66 HF downloads
1 HF like
First seen: 2024
Papers using Arena-Hard (12)
- References Improve LLM Alignment in Non-Verifiable Domains
- Online Rubrics Elicitation from Pairwise Comparisons
- TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
- Pretrain Value, Not Reward: Decoupled Value Policy Optimization
- Scalable Reinforcement Post-Training Beyond Static Human Prompts: Evolving Alignment via Asymmetric Self-Play
- The Perfect Blend: Redefining RLHF with Mixture of Judges
- Critique-out-Loud Reward Models
- AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
- Dr. SoW: Density Ratio of Strong-over-weak LLMs for Reducing the Cost of Human Annotation in Preference Tuning
- T-REG: Preference Optimization with Token-Level Reward Regularization
- Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
- Reward Model Routing in Alignment