AlpacaEval 2.0
Emerging · 16 papers using it
First seen: 2025
AlpacaEval 2.0 is a benchmark for evaluating how well the responses generated by large language models (LLMs) align with human preferences. It uses an automatic LLM-based evaluator as a proxy for human judgment.
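The pairwise comparison behind this kind of evaluation can be summarized as a win rate: the share of prompts on which a judge prefers the evaluated model's response over a reference response. The following is a minimal sketch. It assumes each judgment has already been reduced to a hypothetical "win", "tie", or "loss" label, with ties counted as half a win. It is not AlpacaEval's own code, and it omits the length-controlled win rate that AlpacaEval 2.0 reports.

```python
from typing import Sequence

# Hypothetical judgment labels (an assumption for illustration):
# "win"  = the evaluated model's response was preferred over the reference,
# "loss" = the reference response was preferred,
# "tie"  = neither was preferred; counted as half a win.
SCORES = {"win": 1.0, "tie": 0.5, "loss": 0.0}


def win_rate(judgments: Sequence[str]) -> float:
    """Return the percentage of pairwise comparisons won, counting ties as half.

    Simplified sketch: no length-controlled correction is applied.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    total = 0.0
    for label in judgments:
        if label not in SCORES:
            raise ValueError(f"unknown judgment label: {label!r}")
        total += SCORES[label]
    return 100.0 * total / len(judgments)


if __name__ == "__main__":
    example = ["win", "loss", "tie", "win"]
    print(f"win rate: {win_rate(example):.1f}%")  # prints "win rate: 62.5%"
```

Counting a tie as half a win keeps the score symmetric. Swapping the roles of the two models turns a win rate of p into 100 - p.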
Papers using AlpacaEval 2.0 (16)
- This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs
- Label-Free Reinforcement Learning via Cross-Model Entropy
- TACOS: Open Tagging and Comparative Scoring for Instruction Fine-Tuning Data Selection
- MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent
- S-SPPO: Semantic-Calibrated Self-Play Preference Optimization
- Aligning Large Language Models via Fully Self-Synthetic Data
- Icon²: Aligning Large Language Models Using Self-Synthetic Preference Data via Inherent Regulation
- SGPO: Self-Generated Preference Optimization based on Self-Improver
- Unlocking Recursive Thinking of LLMs: Alignment via Refinement
- Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
- MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
- Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future
- Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?
- Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis
- FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings