Deep Exploration With PAC-Bayes
2024 · Bahareh Tasdighi, Manuel Haussmann, Nicklas Werge, et al.
Abstract
Reinforcement learning (RL) for continuous control under delayed rewards is an under-explored problem despite its significance in real-world applications. Many complex skills are based on intermediate ones as prerequisites. For instance, a humanoid locomotor must learn how to stand before it can learn to walk. To cope with delayed reward, an agent must perform deep exploration. However, existing deep exploration methods are designed for small discrete action spaces, and their generalization to state-of-the-art continuous control remains unproven. We address the deep exploration problem for the first time from a PAC-Bayesian perspective in the context of actor-critic learning. To do this, we quantify the error of the Bellman operator through a PAC-Bayes bound, where a bootstrapped ensemble of critic networks represents the posterior distribution, and their targets serve as a data-informed function-space prior. We derive an objective function from this bound and use it to train the criti
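To make the role of the bound concrete, here is a minimal sketch of how a generic PAC-Bayes bound of the McAllester/Maurer type trades empirical error against the divergence between posterior and prior. It is not the bound derived in the paper, which this excerpt does not reproduce. It assumes losses scaled to [0, 1] and i.i.d. samples, which RL transitions generally are not. The function names and the scalar Gaussian summaries of the critic ensemble and its targets are hypothetical, chosen for illustration only.

```python
import math


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for scalar Gaussians."""
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0)


def pac_bayes_bound(empirical_risk, kl, n, delta=0.05):
    """Generic McAllester/Maurer-style bound (assumption: loss in [0, 1], i.i.d. data).

    With probability at least 1 - delta over the sample:
        expected risk <= empirical risk + sqrt((KL + ln(2 * sqrt(n) / delta)) / (2 * n))
    """
    complexity = (kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    return empirical_risk + math.sqrt(complexity)


# Hypothetical numbers: the posterior summarises ensemble critic predictions,
# the prior summarises the corresponding targets.
kl = gaussian_kl(mu_q=1.2, var_q=0.3, mu_p=1.0, var_p=0.5)
bound = pac_bayes_bound(empirical_risk=0.10, kl=kl, n=1000)
```

In this sketch the bound tightens as the number of samples grows and loosens as the posterior moves away from the prior. This is the general mechanism by which a PAC-Bayes objective penalises an ensemble that drifts too far from its reference distribution.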
Related papers
- Efficient Exploration In Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm (2024)
- Behavior-Guided Actor-Critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021)
- Broad Critic Deep Actor Reinforcement Learning For Continuous Control (2024)
- PAC-Bayesian Reinforcement Learning Trains Generalizable Policies (2025)
- Guided Exploration In Reinforcement Learning Via Monte Carlo Critic Optimization (2022)
- Deep Intrinsically Motivated Exploration In Continuous Control (2022)
- Solving Continuous Control Via Q-Learning (2022)
- Deep Reinforcement Learning With Feedback-Based Exploration (2019)