Learning Adaptive Exploration Strategies In Dynamic Environments Through Informed Policy Regularization
2020 · Pierre-Alexandre Kamienny, Matteo Pirotta, Alessandro Lazaric, et al.
Abstract
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments, where the task may change over time. While RNN-based policies could in principle represent such strategies, in practice their training time is prohibitive and the learning process often converges to poor solutions. In this paper, we consider the case where the agent has access to a description of the task (e.g., a task id or task parameters) at training time, but not at test time. We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task. This dramatically reduces the sample complexity of training RNN-based policies, without losing their representational power. As a result, our method learns exploration strategies that efficiently balance gathering information about the unknown, changing task against maximizing the reward over time. We test the performance of our algorithm
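To illustrate the general idea of regularizing a history-based (RNN) policy toward an informed policy that sees the task description, the sketch below adds a KL-divergence penalty between the two policies' action distributions to a policy-gradient loss. This is a generic policy-distillation regularizer chosen for illustration and is not necessarily the regularizer used in the paper; the function names, the discrete action space, and the weight `beta` are all assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a discrete action space."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(
        p_i * math.log((p_i + eps) / (q_i + eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0
    )


def regularized_loss(
    pg_loss: float,
    informed_logits: Sequence[float],
    rnn_logits: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Hypothetical objective: policy-gradient loss plus a distillation penalty.

    informed_logits: action logits of an informed policy that sees the task
        description (e.g. task id or parameters), available only at training time.
    rnn_logits: action logits of the RNN policy, which sees only the history.
    beta: assumed regularization weight (not taken from the paper).
    """
    p_informed = softmax(informed_logits)
    p_rnn = softmax(rnn_logits)
    # Penalize the RNN policy for diverging from the informed policy's actions.
    return pg_loss + beta * kl_divergence(p_informed, p_rnn)


# Usage: identical policies incur no penalty; mismatched policies do.
print(regularized_loss(1.5, [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(regularized_loss(1.5, [3.0, 0.0, 0.0], [0.0, 0.0, 3.0]))
```

When the RNN policy already matches the informed policy, the penalty vanishes and the objective reduces to the policy-gradient term. In an autodiff framework one would typically treat the informed policy as a fixed target so that only the RNN policy is pulled toward it; that choice is likewise an assumption of this sketch.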
Related papers
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Reactive Exploration To Cope With Non-stationarity In Lifelong Reinforcement Learning (2022)
- Never Give Up: Learning Directed Exploration Strategies (2020)
- Nadpex: An On-policy Temporally Consistent Exploration Method For Deep Reinforcement Learning (2018)
- Learning Off-policy With Model-based Intrinsic Motivation For Active Online Exploration (2024)
- Dynamic Subgoal-based Exploration Via Bayesian Optimization (2019)
- EPO: Entropy-regularized Policy Optimization For LLM Agents Reinforcement Learning (2025)
- Learning Efficient And Effective Exploration Policies With Counterfactual Meta Policy (2019)