Exploration By Random Distribution Distillation
2025 · Zhirui Fang, Kai Yang, Jian Tao, et al.
Abstract
Exploration remains a critical challenge in online reinforcement learning, as an agent must effectively explore unknown environments to achieve high returns. Current exploration algorithms are mainly count-based or curiosity-based methods, with prediction-error methods being a prominent example. In this paper, we propose a novel method called Random Distribution Distillation (RDD), which samples the output of a target network from a normal distribution. RDD facilitates more extensive exploration by explicitly treating the difference between the prediction network and the target network as an intrinsic reward. Furthermore, by introducing randomness into the output of the target network for a given state and modeling it as a sample from a normal distribution, intrinsic rewards are bounded by two key components: a pseudo-count term ensuring proper exploration decay and a discrepancy term accounting for predictor convergence. We de
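The mechanism described in the abstract can be illustrated with a toy, single-state sketch. This is not the paper's implementation: the tabular predictor, the mean-squared-error reward, the constant learning rate, and the names `simulate`, `sigma`, and `lr` are all assumptions made for illustration. The target output for the state is drawn from a normal distribution around a fixed random mean, the predictor is regressed toward each sample, and the prediction error is used as the intrinsic reward.

```python
import numpy as np


def simulate(num_visits, dim=64, sigma=0.1, lr=0.1, seed=0):
    """Toy single-state illustration (hypothetical, not the paper's code).

    Returns the intrinsic reward observed at each visit, the final
    predictor output, and the fixed mean of the target distribution.
    """
    rng = np.random.default_rng(seed)
    # Assumption: the target's output for this state is N(mean, sigma^2 I),
    # with `mean` fixed at initialisation and never trained.
    mean = rng.normal(size=dim)
    # Assumption: a tabular predictor (one output vector for the state).
    predictor = np.zeros(dim)

    rewards = np.empty(num_visits)
    for t in range(num_visits):
        # Each visit draws a fresh target sample from the normal distribution.
        target = mean + sigma * rng.normal(size=dim)
        error = predictor - target
        # Intrinsic reward: mean squared predictor/target difference.
        rewards[t] = np.mean(error ** 2)
        # One gradient step on 0.5 * ||predictor - target||^2.
        predictor -= lr * error
    return rewards, predictor, mean


if __name__ == "__main__":
    rewards, predictor, mean = simulate(500)
    print("first-visit reward:", rewards[0])
    print("last-visit reward:", rewards[-1])
```

In this toy setting the reward is large on the first visits and shrinks as the state is revisited, while the sampling noise keeps it from reaching exactly zero. This only loosely mirrors the decay and discrepancy behaviour the abstract attributes to RDD; the bounds themselves are not reproduced here.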
Related papers
- Exploration And Anti-exploration With Distributional Random Network Distillation (2024)
- Random Latent Exploration For Deep Reinforcement Learning (2024)
- Information-directed Exploration For Deep Reinforcement Learning (2018)
- Neighboring State-based Exploration For Reinforcement Learning (2022)
- Exploratory Diffusion Model For Unsupervised Reinforcement Learning (2025)
- Curious Explorer: A Provable Exploration Strategy In Policy Learning (2021)
- Exploring Restart Distributions (2018)
- Maximum Entropy Exploration Without The Rollouts (2026)