Exploitation Is All You Need... For Exploration
2025 · Micah Rentschler, Jesse Roberts
Abstract
Ensuring sufficient exploration is a central challenge when training meta-reinforcement learning (meta-RL) agents to solve novel environments. Conventional solutions to the exploration-exploitation dilemma inject explicit incentives such as randomization, uncertainty bonuses, or intrinsic rewards to encourage exploration. In this work, we hypothesize that an agent trained solely to maximize a greedy (exploitation-only) objective can nonetheless exhibit emergent exploratory behavior, provided three conditions are met: (1) Recurring Environmental Structure, where the environment features repeatable regularities that allow past experience to inform future choices; (2) Agent Memory, enabling the agent to retain and utilize historical interaction data; and (3) Long-Horizon Credit Assignment, where learning propagates returns over a time frame sufficient for the delayed benefits of exploration to inform current decisions. Through experiments in stochastic multi-armed bandits and temporally e
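The abstract's conditions can be seen in a toy setting that is far simpler than the meta-RL experiments described above. The sketch below is an illustration under stated assumptions, not the authors' method. It uses a hypothetical one-armed Bernoulli bandit in which one arm pays a known mean of 0.55 and the other arm's success probability is unknown, with a Beta(1, 1) belief. The planner maximizes expected reward only and has no exploration bonus. The unknown arm's fixed payoff plays the role of recurring structure, the posterior counts `(a, b)` act as memory, and `horizon`/`gamma` control how far credit is propagated. The names `first_action` and `known_mean` are hypothetical.

```python
from functools import lru_cache


def first_action(horizon, known_mean=0.55, gamma=1.0, prior=(1, 1)):
    """Return the Bayes-optimal first pull in a toy one-armed Bernoulli bandit.

    Hypothetical setup: one arm has a known expected payoff `known_mean`;
    the other has an unknown success probability with a Beta(a, b) belief.
    The objective is purely greedy: expected (discounted) reward, no bonus.
    """

    @lru_cache(maxsize=None)
    def value(t, a, b):
        # Optimal expected return with t pulls left and belief Beta(a, b).
        if t == 0:
            return 0.0
        return max(q_values(t, a, b))

    def q_values(t, a, b):
        p = a / (a + b)  # posterior mean of the unknown arm
        # Pulling the known arm reveals nothing, so the belief is unchanged.
        q_known = known_mean + gamma * value(t - 1, a, b)
        # Pulling the unknown arm updates the belief ("memory") either way.
        q_unknown = (p * (1.0 + gamma * value(t - 1, a + 1, b))
                     + (1.0 - p) * gamma * value(t - 1, a, b + 1))
        return q_known, q_unknown

    q_known, q_unknown = q_values(horizon, *prior)
    return "unknown" if q_unknown > q_known else "known"


print(first_action(1))              # one pull left: take the higher mean
print(first_action(50))             # long horizon: information pays off
print(first_action(50, gamma=0.0))  # myopic credit assignment: no exploration
```

With a single pull remaining, the planner takes the known arm because 0.55 exceeds the prior mean of 0.5. With a 50-pull horizon, the same reward-only objective first tries the arm it currently believes is worse, because what it learns can be exploited on later pulls. Setting `gamma=0.0` removes that long-horizon credit, and the exploratory choice disappears.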
Related papers
- First-explore, Then Exploit: Meta-learning To Solve Hard Exploration-exploitation Trade-offs (2023)
- Decoupling Exploration And Exploitation For Meta-reinforcement Learning Without Sacrifices (2020)
- Exploration And Incentives In Reinforcement Learning (2021)
- MULEX: Disentangling Exploitation From Exploration In Deep RL (2019)
- Exploration Conscious Reinforcement Learning Revisited (2018)
- Fast Active Learning For Pure Exploration In Reinforcement Learning (2020)
- Exploration In Feature Space For Reinforcement Learning (2017)
- The Exploration-exploitation Dilemma Revisited: An Entropy Perspective (2024)