Generative Planning For Temporally Coordinated Exploration In Reinforcement Learning
2022 · Haichao Zhang, Wei Xu, Haonan Yu
Abstract
Standard model-free reinforcement learning algorithms optimize a policy that generates the action to be taken at the current time step in order to maximize expected future return. While flexible, this approach suffers from inefficient exploration because of its single-step nature. In this work, we present the Generative Planning Method (GPM), which generates actions not only for the current step but also for a number of future steps (hence the term generative planning). This brings several benefits. First, since GPM is trained by maximizing value, the plans it generates can be regarded as intentional action sequences for reaching high-value regions. GPM can therefore leverage its generated multi-step plans for temporally coordinated exploration towards high-value regions, which is potentially more effective than a sequence of actions generated by perturbing each action at the single-step level, whose consistent movement decays exponentially with the number of explorat
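The contrast drawn in the abstract between per-step perturbation and temporally coordinated exploration can be illustrated with a toy 1-D random walk. This is only a sketch under stated assumptions and is not GPM itself: there is no learned policy, value function, or plan generator here. The names `per_step_displacement`, `plan_displacement`, `plan_len`, and `mean_abs` are hypothetical and introduced purely for illustration. For independent fair signs, the chance that n consecutive steps all move in the same direction is 2^(1-n), which is one simple way to see why uncoordinated movement rarely travels far.

```python
import random


def per_step_displacement(horizon, rng):
    """Net 1-D displacement when every step's direction is sampled independently."""
    return sum(rng.choice((-1, 1)) for _ in range(horizon))


def plan_displacement(horizon, plan_len, rng):
    """Net 1-D displacement when a direction is sampled once per plan
    and then held for plan_len consecutive steps (a stand-in for a plan)."""
    position = 0
    steps_left = 0
    direction = 1
    for _ in range(horizon):
        if steps_left == 0:
            # Commit to a new "plan": one direction for the next plan_len steps.
            direction = rng.choice((-1, 1))
            steps_left = plan_len
        position += direction
        steps_left -= 1
    return position


def mean_abs(sampler, trials, rng):
    """Average absolute displacement over a number of independent trials."""
    return sum(abs(sampler(rng)) for _ in range(trials)) / trials


rng = random.Random(0)
horizon, plan_len, trials = 64, 8, 2000
single = mean_abs(lambda r: per_step_displacement(horizon, r), trials, rng)
planned = mean_abs(lambda r: plan_displacement(horizon, plan_len, r), trials, rng)
print(f"per-step: {single:.2f}, plan-committed: {planned:.2f}")
```

In this toy setting the plan-committed walker covers noticeably more ground on average than the per-step walker over the same horizon. The fixed `plan_len` is an assumption made for simplicity; nothing in the abstract text above states how plan length or plan switching is handled.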
Related papers
- Off-policy Reinforcement Learning With Model-based Exploration Augmentation (2025)
- Generative Adversarial Exploration For Reinforcement Learning (2022)
- Improved Exploration Through Latent Trajectory Optimization In Deep Deterministic Policy Gradient (2019)
- Phgpo: Pheromone-guided Policy Optimization For Long-horizon Tool Planning (2026)
- Centralized Cooperative Exploration Policy For Continuous Control Tasks (2023)
- Prioritized Guidance For Efficient Multi-agent Reinforcement Learning Exploration (2019)
- Efficient Model-based Reinforcement Learning Through Optimistic Policy Search And Planning (2020)
- Coordinated Exploration Via Intrinsic Rewards For Multi-agent Reinforcement Learning (2019)