RewardBench 2 Leaderboard
RewardBench 2 is a benchmark for reward models and LLM-as-judge systems used in RLHF. It scores how reliably a model prefers the better of two responses across factuality, precise instruction-following, math, safety, and focus. Score is the overall accuracy, in percent. · Metric: Score (higher is better)
| # | Model | Score | Paper |
|---|---|---|---|
| 1 | Skywork/Skywork-Reward-V2-Llama-3.1-8B | 84.13 | link |
| 2 | ContextualAI/LMUnit-qwen2.5-72b | 82.08 | link |
| 3 | ContextualAI/LMUnit-llama3.1-70b | 80.54 | link |
| 4 | Databricks-Mosaic-Research/PGRM | 80.02 | link |
| 5 | google/gemini-2.5-pro | 79.48 | link |
| 6 | Skywork/Skywork-Reward-V2-Qwen3-8B | 78.37 | link |
| 7 | google/gemini-2.5-flash | 77.67 | link |
| 8 | nicolinho/QRM-Gemma-2-27B | 76.67 | link |
| 9 | infly/INF-ORM-Llama3.1-70B | 76.48 | link |
| 10 | anthropic/claude-opus-4-20250514 | 76.48 | link |
| 11 | allenai/Llama-3.1-70B-Instruct-RM-RB2 | 76.06 | link |
| 12 | Skywork/Skywork-Reward-Gemma-2-27B | 75.76 | link |
| 13 | Skywork/Skywork-Reward-V2-Qwen3-4B | 75.51 | link |
| 14 | anthropic/claude-3-7-sonnet-20250219 | 75.39 | link |
| 15 | Skywork/Skywork-Reward-Gemma-2-27B-v0.2 | 75.31 | link |
| 16 | Skywork/Skywork-Reward-V2-Llama-3.2-3B | 74.66 | link |
| 17 | LxzGordon/URM-LLaMa-3.1-8B | 73.94 | link |
| 18 | Schrieffer/Llama-SARM-4B | 73.79 | link |
| 19 | Skywork/Skywork-Reward-Llama-3.1-8B | 73.14 | link |
| 20 | allenai/Llama-3.1-8B-Instruct-RM-RB2 | 72.85 | link |
| 21 | ShikaiChen/LDL-Reward-Gemma-2-27B-v0.1 | 72.49 | link |
| 22 | openai/gpt-4.1-2025-04-14 | 72.32 | link |
| 23 | allenai/Llama-3.1-Tulu-3-70B-SFT-RM-RB2 | 72.20 | link |
| 24 | Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 | 71.75 | link |
| 25 | anthropic/claude-sonnet-4-20250514 | 71.17 | link |
| 26 | nicolinho/QRM-Llama3.1-8B-v2 | 70.74 | link |
| 27 | HFXM/RAMO-Llama3.1-8B | 69.17 | link |
| 28 | Skywork/Skywork-VL-Reward-7B | 68.85 | link |
| 29 | allenai/Llama-3.1-Tulu-3-8B-RL-RM-RB2 | 68.71 | link |
| 30 | allenai/Llama-3.1-Tulu-3-8B-DPO-RM-RB2 | 68.70 | link |
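
To make the Score column concrete, here is a minimal sketch of how a preference accuracy of this kind could be computed from reward-model outputs. It is not the benchmark's official scoring code: the `Comparison` record, the function names, and the sample numbers are hypothetical, and it assumes that ties count as errors and that the overall score pools all comparisons, since the description above does not say how domains are aggregated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comparison:
    """One benchmark item: the model's scores for the better and worse response.

    Hypothetical record layout, used only for this illustration.
    """
    domain: str          # e.g. "math", "safety", "focus"
    score_better: float  # reward assigned to the response labeled better
    score_worse: float   # reward assigned to the response labeled worse


def pairwise_accuracy(comparisons):
    """Percentage of items where the better response gets the strictly higher score.

    Assumption: a tie counts as an error.
    """
    if not comparisons:
        raise ValueError("need at least one comparison")
    correct = sum(1 for c in comparisons if c.score_better > c.score_worse)
    return 100.0 * correct / len(comparisons)


def accuracy_by_domain(comparisons):
    """Group items by domain and compute the accuracy of each group."""
    groups = defaultdict(list)
    for c in comparisons:
        groups[c.domain].append(c)
    return {domain: pairwise_accuracy(items) for domain, items in groups.items()}


# Made-up scores for illustration only; these are not benchmark data.
examples = [
    Comparison("math", 2.1, 0.3),    # correct
    Comparison("math", 0.4, 0.9),    # wrong: worse response scored higher
    Comparison("safety", 1.5, 1.0),  # correct
    Comparison("focus", 0.7, 0.2),   # correct
]

print(f"{pairwise_accuracy(examples):.2f}")  # prints 75.00
print(accuracy_by_domain(examples))
```

A per-domain breakdown like `accuracy_by_domain` is useful alongside the single overall number, because two models with similar overall scores can differ sharply on, say, math versus safety.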