Rethinking Value Function Learning For Generalization In Reinforcement Learning
2022 · Seungyong Moon, Junyeong Lee, Hyun Oh Song
Abstract
Our work focuses on training RL agents on multiple visually diverse environments to improve observational generalization performance. In prior methods, policy and value networks are separately optimized using a disjoint network architecture to avoid interference and obtain a more accurate value function. We identify that a value network in the multi-environment setting is more challenging to optimize and prone to memorizing the training data than in the conventional single-environment setting. In addition, we find that appropriate regularization on the value network is necessary to improve both training and test performance. To this end, we propose Delayed-Critic Policy Gradient (DCPG), a policy gradient algorithm that implicitly penalizes value estimates by optimizing the value network less frequently with more training data than the policy network. This can be implemented using a single unified network architecture. Furthermore, we introduce a simple self-supervised task that learns
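The abstract describes DCPG's core scheduling idea: the value network is optimized less often than the policy network, but on more data each time. The following is a minimal sketch of that schedule only, under stated assumptions. It abstracts away the network, losses, and environment. The class name `DelayedCriticSchedule`, the interval of 4, and the rollout size of 256 are hypothetical illustrations, not values taken from the paper, and the actual DCPG update may differ in ways this excerpt does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DelayedCriticSchedule:
    """Hypothetical scheduler illustrating a delayed-critic update pattern.

    The policy phase runs on every fresh rollout. The value phase runs only
    every `value_interval` iterations and trains on all rollouts buffered
    since the previous value phase (less frequent, more data per update).
    """

    value_interval: int = 4  # assumed hyperparameter, not from the paper
    pending: List[list] = field(default_factory=list)
    log: List[Tuple[str, int, int]] = field(default_factory=list)

    def step(self, iteration: int, rollout: list) -> None:
        # Policy phase: every iteration, using only the newest rollout.
        self.log.append(("policy", iteration, len(rollout)))
        # Keep the rollout for the next (delayed) value phase.
        self.pending.append(rollout)
        # Value phase: runs once per `value_interval` iterations on the
        # concatenation of every rollout collected since the last one.
        if (iteration + 1) % self.value_interval == 0:
            batch = [x for r in self.pending for x in r]
            self.log.append(("value", iteration, len(batch)))
            self.pending.clear()


if __name__ == "__main__":
    sched = DelayedCriticSchedule(value_interval=4)
    for it in range(8):
        # list(range(256)) stands in for 256 collected transitions.
        sched.step(it, list(range(256)))
    policy_updates = [e for e in sched.log if e[0] == "policy"]
    value_updates = [e for e in sched.log if e[0] == "value"]
    print(len(policy_updates), len(value_updates))  # 8 2
    print(value_updates[0])  # ('value', 3, 1024)
```

With these assumed settings, the value phase runs a quarter as often as the policy phase but sees four times as much data per update. In a single unified network, both phases would presumably update a shared trunk through their respective heads. The sketch leaves that out deliberately.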
Related papers
- Dynamic Value Estimation For Single-task Multi-scene Reinforcement Learning (2020)
- The Value-improvement Path: Towards Better Representations For Reinforcement Learning (2020)
- Pretrain Value, Not Reward: Decoupled Value Policy Optimization (2025)
- The Value Equivalence Principle For Model-based Reinforcement Learning (2020)
- Towards Adapting Reinforcement Learning Agents To New Tasks: Insights From Q-values (2024)
- Understanding What Affects The Generalization Gap In Visual Reinforcement Learning: Theory And Empirical Evidence (2024)
- Look Where You Look! Saliency-guided Q-networks For Generalization In Visual Reinforcement Learning (2022)
- Diversity Through Exclusion (DTE): Niche Identification For Reinforcement Learning Through Value-decomposition (2023)