Reward Prediction Error As An Exploration Objective In Deep RL
2019 · Riley Simmons-Edler, Ben Eisner, Daniel Yang, et al.
Abstract
A major challenge in reinforcement learning is exploration when local dithering methods such as epsilon-greedy sampling are insufficient to solve a given task. Many recent methods have proposed to intrinsically motivate an agent to seek novel states, driving the agent to discover improved reward. However, while state-novelty exploration methods are suitable for tasks where novel observations correlate well with improved reward, they may not explore more efficiently than epsilon-greedy approaches in environments where the two are not well-correlated. In this paper, we distinguish between exploration tasks in which seeking novel states aids in finding new reward, and those where it does not, such as goal-conditioned tasks and escaping local reward maxima. We propose a new exploration objective, maximizing the reward prediction error (RPE) of a value function trained to predict extrinsic reward. We then propose a deep reinforcement learning method, QXplore, which exploits the temporal difference…
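The RPE objective can be illustrated with a minimal tabular sketch. It assumes that the reward prediction error is measured as the one-step temporal difference (TD) error of a Q-function trained only on extrinsic reward, and that its absolute value serves as the exploration bonus. The tabular setting, the Q-learning update, the absolute-value form, and the names `TabularQ` and `rpe_intrinsic_reward` are hypothetical choices for illustration; this is not the authors' QXplore implementation, which is a deep RL method.

```python
from collections import defaultdict


class TabularQ:
    """Hypothetical tabular Q-function trained only on extrinsic reward."""

    def __init__(self, n_actions: int, gamma: float = 0.99, alpha: float = 0.5):
        self.n_actions = n_actions
        self.gamma = gamma  # discount factor (assumed value)
        self.alpha = alpha  # learning rate (assumed value)
        # Unseen states default to all-zero action values.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def td_error(self, s, a, r, s_next, done: bool) -> float:
        """One-step TD error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
        bootstrap = 0.0 if done else self.gamma * max(self.q[s_next])
        return r + bootstrap - self.q[s][a]

    def update(self, s, a, r, s_next, done: bool) -> float:
        """Q-learning step on extrinsic reward; returns the pre-update TD error."""
        delta = self.td_error(s, a, r, s_next, done)
        self.q[s][a] += self.alpha * delta
        return delta


def rpe_intrinsic_reward(q: TabularQ, s, a, r, s_next, done: bool) -> float:
    """Assumed exploration bonus: magnitude of the extrinsic-reward TD error."""
    return abs(q.td_error(s, a, r, s_next, done))


if __name__ == "__main__":
    q = TabularQ(n_actions=2, gamma=0.9, alpha=0.5)
    # An unexpected extrinsic reward produces a large prediction error...
    print(rpe_intrinsic_reward(q, "s0", 0, 1.0, "s1", done=True))  # 1.0
    q.update("s0", 0, 1.0, "s1", done=True)
    # ...which shrinks once the Q-function starts predicting that reward.
    print(rpe_intrinsic_reward(q, "s0", 0, 1.0, "s1", done=True))  # 0.5
```

The bonus here depends on how poorly the value function predicts reward rather than on how novel a state looks, which is the distinction the abstract draws against state-novelty methods: a novel state whose reward is already well predicted earns little bonus.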
Related papers
- Reward Prediction Error Prioritisation In Experience Replay: The RPE-PER Method (2025)
- Redeeming Intrinsic Rewards Via Constrained Optimization (2022)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Rewarding Episodic Visitation Discrepancy For Exploration In Reinforcement Learning (2022)
- Self-supervised Exploration Via Temporal Inconsistency In Reinforcement Learning (2022)
- Long-term Visitation Value For Deep Exploration In Sparse Reward Reinforcement Learning (2020)
- DEIR: Efficient And Robust Exploration Through Discriminative-model-based Episodic Intrinsic Rewards (2023)
- Curious Exploration And Return-based Memory Restoration For Deep Reinforcement Learning (2021)