Advances In GRPO For Generation Models: A Survey
2026 · Zexiang Liu, Xianglong He, Yangguang Li
Abstract
Large-scale flow matching models have achieved strong performance across generative tasks such as text-to-image, video, 3D, and speech synthesis. However, aligning their outputs with human preferences and task-specific objectives remains challenging. Flow-GRPO extends Group Relative Policy Optimization (GRPO) to generation models, enabling stable reinforcement learning alignment for generative systems. Since its introduction, Flow-GRPO has spurred rapid growth in follow-up research, spanning both methodological refinements and diverse application domains. This survey provides a comprehensive review of Flow-GRPO and its subsequent developments. We organize existing work along two primary dimensions. First, we analyze methodological advances beyond the original framework, including reward signal design, credit assignment, sampling efficiency, diversity preservation, reward hacking mitigation, and reward model construction. Second, we examine extensions of GRPO-based alignment across generative paradigms
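The central mechanism Flow-GRPO inherits from GRPO is a critic-free, group-relative advantage: several samples are generated for the same prompt, each is scored by a reward model, and each reward is normalized against its group's mean and standard deviation before entering a PPO-style clipped objective. The sketch below assumes the standard GRPO formulation; the function names, the example rewards, and the clipping value 0.2 are illustrative assumptions, and it deliberately omits flow-specific machinery (for example, how Flow-GRPO obtains policy likelihood ratios from the sampling process).

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each sample's reward against its group's mean and std.

    Assumed standard GRPO formulation: all rewards belong to samples drawn
    for the same prompt, so no learned value function (critic) is needed.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n  # population variance
    std = math.sqrt(var)
    # eps guards against division by zero when all rewards in a group are equal
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term for one sample (to be maximized).

    `ratio` is pi_new(sample) / pi_old(sample); clip_eps=0.2 is an assumed value.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical rewards (e.g. from a preference/reward model) for four
# images generated from the same text prompt.
rewards = [0.2, 0.5, 0.8, 0.5]
advs = group_relative_advantages(rewards)
print([round(a, 3) for a in advs])
```

Because advantages are centered within each group, a sample earns a positive advantage only by outscoring its siblings from the same prompt, which is what allows GRPO to drop the separate value network used in PPO.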
Related papers
- F5R-TTS: Improving Flow-matching Based Text-to-speech With Group Relative Policy Optimization (2025)
- Generative Pre-training For Speech With Flow Matching (2023)
- Auto-regressive Vs Flow-matching: A Comparative Study Of Modeling Paradigms For Text-to-music Generation (2025)
- Speculative Decoding And Beyond: An In-depth Survey Of Techniques (2025)
- Glow-TTS: A Generative Flow For Text-to-speech Via Monotonic Alignment Search (2020)
- OmniFlow: Any-to-any Generation With Multi-modal Rectified Flows (2024)
- Flowtron: An Autoregressive Flow-based Generative Network For Text-to-speech Synthesis (2020)
- Enhance Generation Quality Of Flow Matching V2A Model Via Multi-step CoT-like Guidance And Combined Preference Optimization (2025)