AlpacaEval 2
Emerging
19 papers using it
First seen: 2024
Papers using AlpacaEval 2 (19)
- Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
- Less is More: Improving LLM Alignment via Preference Data Selection
- Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
- Small-Margin Preferences Still Matter-If You Train Them Right
- Weights-Rotated Preference Optimization for Large Language Models
- Aligning Large Language Models with Implicit Preferences from User-Generated Content
- Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms
- Robust Preference Optimization via Dynamic Target Margins
- ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization
- Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
- FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion
- Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
- ComPO: Preference Alignment via Comparison Oracles
- RSPO: Regularized Self-Play Alignment of Large Language Models
- DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
- Capturing Nuanced Preferences: Preference-Aligned Distillation for Small Language Models
- Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
- From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
- T-REG: Preference Optimization with Token-Level Reward Regularization