| # | Model | Accuracy | Paper |
|---|-------|----------|-------|
| 1 | Beyond KL Divergence: Policy Optimization With Flexible Bregman Divergences For LLM Reasoning | 86.70 | - |
| 2 | AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback | 67.30 | - |
| 3 | Testing LLM Arithmetic Reasoning Generalization with Automatic Numeric-Remapping Attacks | 12.16 | - |

GSM8K Leaderboard