AlpacaEval 2

Emerging

12 papers using it

2024 first seen

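AlpacaEval 2 is a benchmark for evaluating how well large language models follow open-domain instructions. An LLM judge compares each model's responses against those of a baseline model, and results are reported as a win rate and a length-controlled win rate.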

Papers using AlpacaEval 2 (12)

SERL: Self-Examining Reinforcement Learning on Open-Domain · 2025

Reward Model Routing in Alignment · 2025

Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions · 2025

TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization · 2025

DPO Meets PPO: Reinforced Token Optimization for RLHF · 2024

SimPO: Simple Preference Optimization with a Reference-Free Reward · 2024 · 17 cites

RLHF Workflow: From Reward Modeling to Online RLHF · 2024 · 3 cites

The Perfect Blend: Redefining RLHF with Mixture of Judges · 2024 · 2 cites

Bootstrapping Language Models with DPO Implicit Rewards · 2024

Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning · 2024

AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization · 2024

T-REG: Preference Optimization with Token-Level Reward Regularization · 2024