Reward Models In Deep Reinforcement Learning: A Survey
2025 · Rui Yu, Shenghua Wan, Yucen Wang, et al.
Abstract
In reinforcement learning (RL), agents continually interact with the environment and use the feedback to refine their behavior. To guide policy optimization, reward models are introduced as proxies of the desired objectives, such that when the agent maximizes the accumulated reward, it also fulfills the task designer's intentions. Recently, significant attention from both academic and industrial researchers has focused on developing reward models that not only align closely with the true objectives but also facilitate policy optimization. In this survey, we provide a comprehensive review of reward modeling techniques within the deep RL literature. We begin by outlining the background and preliminaries in reward modeling. Next, we present an overview of recent reward modeling approaches, categorizing them based on the source, the mechanism, and the learning paradigm. Building on this understanding, we discuss various applications of these reward modeling techniques and review methods fo
Related papers
- Reward Design For Reinforcement Learning Agents (2025)
- Evolutionary Reinforcement Learning: A Survey (2023)
- A Comprehensive Survey Of Reinforcement Learning: From Algorithms To Practical Challenges (2024)
- A Survey On Intrinsic Motivation In Reinforcement Learning (2019)
- A Survey On Explainable Reinforcement Learning: Concepts, Algorithms, Challenges (2022)
- Intrinsic Motivation In Model-based Reinforcement Learning: A Brief Review (2023)
- Scalable Agent Alignment Via Reward Modeling: A Research Direction (2018)
- Deep Reinforcement Learning From Hierarchical Preference Design (2023)