Meta-learning To Explore Via Memory Density Feedback
2025 · Kevin McKee, Eric Alt, Andrew Grebenisan, et al.
Abstract
Exploration algorithms for reinforcement learning typically replace or augment the reward function with an additional "intrinsic" reward that trains the agent to seek previously unseen states of the environment. Here, we consider an exploration algorithm that exploits meta-learning, or learning to learn, such that the agent learns to maximize its exploration progress within a single episode, even between epochs of training. The agent learns a policy that aims to minimize the probability density of new observations with respect to all of its memories. In addition, it receives evaluations of the current observation density as feedback and retains that feedback in a recurrent network. By remembering trajectories of density, the agent learns to navigate a complex and growing landscape of familiarity in real time, allowing it to maximize its exploration progress even in completely novel states of the environment for which its policy has not been trained.
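The density objective described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the Gaussian kernel density estimate, the `bandwidth` value, the function names `memory_log_density` and `explore_episode`, and the toy `step_fn` environment. The paper's agent retains the density feedback in a recurrent network; this sketch simplifies that to passing the most recent density evaluation to a plain `policy` callable.

```python
import math


def memory_log_density(obs, memories, bandwidth=0.5):
    """Log of a Gaussian kernel density estimate of `obs` under `memories`.

    Assumption: an isotropic Gaussian kernel; the paper's estimator may differ.
    """
    if not memories:
        raise ValueError("memory buffer must contain at least one observation")
    dim = len(obs)
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)
    log_kernels = []
    for mem in memories:
        sq_dist = sum((o - m) ** 2 for o, m in zip(obs, mem))
        log_kernels.append(log_norm - 0.5 * sq_dist / bandwidth ** 2)
    # Log-mean-exp keeps the average stable when kernels underflow.
    peak = max(log_kernels)
    mean_exp = sum(math.exp(k - peak) for k in log_kernels) / len(log_kernels)
    return peak + math.log(mean_exp)


def explore_episode(policy, step_fn, first_obs, n_steps, bandwidth=0.5):
    """Run one episode; reward is the negative log-density of each new observation."""
    memories = [tuple(first_obs)]
    obs = memories[0]
    feedback = 0.0  # placeholder: no density evaluation exists before the first step
    rewards = []
    for _ in range(n_steps):
        action = policy(obs, feedback)        # policy sees the last density feedback
        obs = tuple(step_fn(obs, action))
        log_p = memory_log_density(obs, memories, bandwidth)
        rewards.append(-log_p)                # low density (novel) gives high reward
        feedback = log_p
        memories.append(obs)                  # memory grows within the episode
    return rewards


def step_fn(obs, action):
    """Hypothetical toy environment: the action is added to the 2-D position."""
    return tuple(o + a for o, a in zip(obs, action))


def stay_policy(obs, feedback):
    return (0.0, 0.0)


def move_policy(obs, feedback):
    return (1.0, 0.0)


stay_rewards = explore_episode(stay_policy, step_fn, (0.0, 0.0), n_steps=10)
move_rewards = explore_episode(move_policy, step_fn, (0.0, 0.0), n_steps=10)
```

In this toy setting a policy that keeps moving into unvisited positions collects more reward than one that stays put, because each revisited position has high density under the growing memory buffer.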
Related papers
- First-explore, Then Exploit: Meta-learning To Solve Hard Exploration-exploitation Trade-offs (2023)
- Learning Efficient And Effective Exploration Policies With Counterfactual Meta Policy (2019)
- Exploration In Approximate Hyper-state Space For Meta Reinforcement Learning (2020)
- Learning To Explore With Meta-policy Gradient (2018)
- Exploitation Is All You Need... For Exploration (2025)
- Never Give Up: Learning Directed Exploration Strategies (2020)
- Fast Active Learning For Pure Exploration In Reinforcement Learning (2020)
- Information Content Exploration (2023)