Sampling Efficient Deep Reinforcement Learning Through Preference-guided Stochastic Exploration
2022 · Wenhui Huang, Cong Zhang, Jingda Wu, et al.
Abstract
A large body of practical work using the Deep Q-network (DQN) algorithm indicates that a stochastic policy, despite its simplicity, is the most frequently used exploration approach. However, most existing stochastic exploration approaches either explore new actions heuristically, regardless of their Q-values, or couple sampling to Q-values at the cost of inevitably introducing bias into the learning process. In this paper, we propose a novel preference-guided \(\epsilon\)-greedy exploration algorithm that efficiently learns an action distribution consistent with the landscape of Q-values for DQN without introducing additional bias. Specifically, we design a dual architecture consisting of two branches. One is a copy of DQN, which we call the Q-branch; the other, which we call the preference branch, learns the action preference that the DQN implicitly follows. We theoretically prove that the policy improvement theorem holds for the preference-guided \(\epsilon\)-greedy policy and experimentally
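The abstract does not give the branch outputs or update rules, so the following is only a minimal sketch of the action-selection idea, assuming the preference branch emits one logit per action that a softmax turns into a distribution. It contrasts standard \(\epsilon\)-greedy, which explores uniformly regardless of Q-values, with a preference-guided variant in which the exploratory action is drawn from that preference distribution while the exploiting action is still the argmax of the Q-branch. The function names, the softmax parameterization, and the toy numbers are illustrative assumptions rather than the authors' implementation, and the training of the preference branch is not shown.

```python
import math
import random
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over per-action preference logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the largest Q-value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def uniform_epsilon_greedy(q_values: Sequence[float], epsilon: float,
                           rng: random.Random) -> int:
    """Standard epsilon-greedy: the exploratory action ignores Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def preference_guided_epsilon_greedy(q_values: Sequence[float],
                                     preference_logits: Sequence[float],
                                     epsilon: float,
                                     rng: random.Random) -> int:
    """Exploit with the Q-branch; explore by sampling the preference branch.

    `preference_logits` is a hypothetical stand-in for the preference
    branch's output for the current state (an assumption of this sketch).
    """
    if rng.random() < epsilon:
        probs = softmax(preference_logits)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return greedy_action(q_values)


if __name__ == "__main__":
    rng = random.Random(0)
    q = [1.0, 2.0, 0.5]        # toy Q-branch output for one state
    prefs = [0.0, 2.0, -1.0]   # toy preference-branch logits (hypothetical)
    n = 10_000
    # With epsilon = 1 every step explores, exposing the exploration distribution.
    guided = [preference_guided_epsilon_greedy(q, prefs, 1.0, rng) for _ in range(n)]
    uniform = [uniform_epsilon_greedy(q, 1.0, rng) for _ in range(n)]
    for a in range(len(q)):
        print(a, guided.count(a) / n, uniform.count(a) / n)
```

With these toy logits, softmax gives roughly 0.114, 0.844, and 0.042, so guided exploration concentrates on action 1, whereas uniform exploration spends about a third of its draws on each action. The greedy branch is deliberately left unchanged, so the exploiting action is the same as in plain DQN and only the exploratory draw differs.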
Related papers
- On The Convergence And Sample Complexity Analysis Of Deep Q-networks With \(\epsilon\)-greedy Exploration (2023)
- DQN With Model-based Exploration: Efficient Learning On Environments With Sparse Rewards (2019)
- \(\beta\)-DQN: Improving Deep Q-learning By Evolving The Behavior (2025)
- Careful At Estimation And Bold At Exploration (2023)
- Decoupled Exploration And Exploitation Policies For Sample-efficient Reinforcement Learning (2021)
- Neighboring State-based Exploration For Reinforcement Learning (2022)
- Improving Exploration In Evolution Strategies For Deep Reinforcement Learning Via A Population Of Novelty-seeking Agents (2017)
- Information-directed Exploration For Deep Reinforcement Learning (2018)