Redeeming Intrinsic Rewards Via Constrained Optimization
2022 · Eric Chen, Zhang-Wei Hong, Joni Pajarinen, et al.
Abstract
State-of-the-art reinforcement learning (RL) algorithms typically use random sampling (e.g., \(\epsilon\)-greedy) for exploration, but this method fails on hard exploration tasks like Montezuma's Revenge. To address the challenge of exploration, prior works incentivize exploration by rewarding the agent when it visits novel states. Such intrinsic rewards (also called exploration bonus or curiosity) often lead to excellent performance on hard exploration tasks. However, on easy exploration tasks, the agent gets distracted by intrinsic rewards and performs unnecessary exploration even when sufficient task (also called extrinsic) reward is available. Consequently, such an overly curious agent performs worse than an agent trained with only task reward. Such inconsistency in performance across tasks prevents the widespread use of intrinsic rewards with RL algorithms. We propose a principled constrained optimization procedure called Extrinsic-Intrinsic Policy Optimization (EIPO) that automat
Related papers
- Intrinsic Reward Policy Optimization For Sparse-reward Environments (2026) 0.00
- The Impact Of Intrinsic Rewards On Exploration In Reinforcement Learning (2025) 0.00
- Rlexplore: Accelerating Research In Intrinsically-motivated Reinforcement Learning (2024) 5.33
- Coordinated Exploration Via Intrinsic Rewards For Multi-agent Reinforcement Learning (2019) 0.00
- Intrinsic Rewards For Exploration Without Harm From Observational Noise: A Simulation Study Based On The Free Energy Principle (2024) 0.00
- On Learning Intrinsic Rewards For Policy Gradient Methods (2018) 0.00
- Continuously Discovering Novel Strategies Via Reward-switching Policy Optimization (2022) 0.00
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016) 0.00