Optimistic Curiosity Exploration And Conservative Exploitation With Linear Reward Shaping
2022 · Hao Sun, Lei Han, Rui Yang, et al.
Abstract
In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of the linear transformation is equivalent to changing the initialization of the \(Q\)-function in function approximation. Based on such an equivalence, we bring the key insight that a positive reward shifting leads to conservative exploitation, while a negative reward shifting leads to curiosity-driven exploration. Accordingly, conservative exploitation improves offline RL value estimation, and optimistic value estimation improves exploration for online RL. We validate our insight on a range of RL tasks and show its improvement over baselines: (1) In offline RL, the conservative exploitation leads to improved performance based on off-the-shelf algorithms; (2) In online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for bett
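The equivalence claimed in the abstract can be checked numerically in a much simpler setting than the paper's. The following is a minimal sketch, assuming tabular Q-learning on a small continuing MDP (no terminal states) rather than deep function approximation; the MDP, the function name `q_learning`, and all hyperparameters are hypothetical choices made for illustration. Under these assumptions, learning with rewards shifted by a constant \(b\) from a zero initialization should match learning with the original rewards from an initialization of \(-b/(1-\gamma)\), up to the constant offset \(b/(1-\gamma)\). For \(b > 0\) that equivalent initialization is pessimistic, which is consistent with the abstract's link between positive shifts and conservative exploitation.

```python
import random


def q_learning(reward_shift, q_init, steps=2000, gamma=0.9, alpha=0.1, seed=0):
    """Tabular Q-learning on a small, hypothetical continuing MDP.

    Actions come from a uniform random behavior policy, so the visited
    transitions depend only on the seed and not on the Q-values.
    """
    n_states, n_actions = 4, 2
    rng = random.Random(seed)
    q = [[q_init] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        action = rng.randrange(n_actions)
        # Hypothetical dynamics: action 0 steps forward on a ring,
        # action 1 jumps to a uniformly random state.
        if action == 0:
            next_state = (state + 1) % n_states
        else:
            next_state = rng.randrange(n_states)
        reward = 1.0 if next_state == 0 else 0.0
        # Standard Q-learning update, with the constant shift added to the reward.
        target = (reward + reward_shift) + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


gamma, b = 0.9, 0.5
offset = b / (1.0 - gamma)

# Run A: shifted reward r + b, Q initialized at zero.
q_shifted = q_learning(reward_shift=b, q_init=0.0, gamma=gamma)
# Run B: original reward r, Q initialized at -b / (1 - gamma).
q_reinit = q_learning(reward_shift=0.0, q_init=-offset, gamma=gamma)

# Largest deviation from the predicted relation Q_A = Q_B + b / (1 - gamma).
max_gap = max(
    abs(qa - (qb + offset))
    for row_a, row_b in zip(q_shifted, q_reinit)
    for qa, qb in zip(row_a, row_b)
)
print(f"max deviation from predicted offset: {max_gap:.2e}")
```

The behavior policy is deliberately independent of the Q-values so that both runs see identical transitions; with a greedy or epsilon-greedy policy the same relation holds in exact arithmetic, but floating-point ties could make the two runs diverge. The sketch also relies on the task being continuing: at a terminal transition with no bootstrapped term, the constant offset would no longer cancel.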
Related papers
- ORSO: Accelerating Reward Design Via Online Reward Selection And Policy Optimization (2024)
- Highly Efficient Self-adaptive Reward Shaping For Reinforcement Learning (2024)
- Viva: Video-trained Value Functions For Guiding Online RL From Diverse Data (2025)
- Unpacking Reward Shaping: Understanding The Benefits Of Reward Engineering On Sample Complexity (2022)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- BAMDP Shaping: A Unified Framework For Intrinsic Motivation And Reward Shaping (2024)
- Mitigating Distribution Shift In Model-based Offline RL Via Shifts-aware Reward Learning (2024)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)