Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
2025 · Zhibo Hou, Zhiyu An, Wan Du
Abstract
When an environment contains an unlearnable source of randomness (a noisy-TV), an agent driven naively by intrinsic reward gets stuck at that source and fails to explore. Intrinsic rewards based on uncertainty estimation or distribution similarity do eventually escape noisy-TVs, but they suffer from poor sample efficiency and high computational cost. Inspired by recent findings in neuroscience that humans monitor their own improvement during exploration, we propose a novel method for intrinsically motivated exploration named Learning Progress Monitoring (LPM). During exploration, LPM rewards model improvement instead of prediction error or novelty, effectively rewarding the agent for observing learnable transitions rather than unlearnable ones. We introduce a dual-network design that uses an error model to predict the expected prediction error of the dynamics model in its previous iteration, and uses the difference between the model er
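The abstract is cut off before it states the exact reward, so what follows is only a minimal sketch of the learning-progress idea under explicit assumptions. It assumes the intrinsic reward is the error model's expected error minus the current dynamics model's error on a fresh transition. Everything in it is hypothetical: `LinearDynamicsModel`, `ErrorModel`, `run`, and the two-action toy environment (action 0 is learnable with `s' = 2s`; action 1 is a "noisy-TV" emitting uniform noise). The per-action exponential moving average is a simple stand-in for the paper's learned error network, not the authors' design.

```python
import random
from typing import Optional


class LinearDynamicsModel:
    """Hypothetical toy dynamics model: predicts s' = w[a] * s for each action a."""

    def __init__(self, n_actions: int, lr: float = 0.05) -> None:
        self.w = [0.0] * n_actions
        self.lr = lr

    def error(self, s: float, a: int, s_next: float) -> float:
        # Squared prediction error of the current model on one transition.
        return (self.w[a] * s - s_next) ** 2

    def update(self, s: float, a: int, s_next: float) -> None:
        # One SGD step on the squared error; d/dw (w*s - s')^2 = 2 * (w*s - s') * s.
        self.w[a] -= self.lr * 2.0 * (self.w[a] * s - s_next) * s


class ErrorModel:
    """Hypothetical stand-in for the learned error network: a per-action
    exponential moving average (EMA) of the dynamics model's past errors."""

    def __init__(self, n_actions: int, beta: float = 0.1) -> None:
        self.expected = [None] * n_actions
        self.beta = beta

    def predict(self, a: int) -> Optional[float]:
        return self.expected[a]

    def update(self, a: int, err: float) -> None:
        if self.expected[a] is None:
            self.expected[a] = err  # first visit: initialise with the observed error
        else:
            self.expected[a] += self.beta * (err - self.expected[a])


def run(steps: int = 400, seed: int = 0) -> dict:
    rng = random.Random(seed)
    dynamics = LinearDynamicsModel(n_actions=2)
    error_model = ErrorModel(n_actions=2)
    rewards = {0: [], 1: []}
    for t in range(steps):
        a = t % 2  # alternate actions: 0 = learnable transition, 1 = noisy-TV
        s = rng.uniform(1.0, 2.0)
        s_next = 2.0 * s if a == 0 else rng.uniform(-1.0, 1.0)
        err = dynamics.error(s, a, s_next)  # current model, before this step's update
        expected = error_model.predict(a)   # error expected from earlier iterations
        # Assumed learning-progress reward: expected past error minus current error.
        r = 0.0 if expected is None else expected - err
        rewards[a].append(r)
        error_model.update(a, err)
        dynamics.update(s, a, s_next)
    return rewards


if __name__ == "__main__":
    result = run()
    print("learnable total reward:", sum(result[0]))
    print("noisy-TV total reward:", sum(result[1]))
```

With an EMA error model, the summed reward for an action telescopes to (first expected error minus final expected error) / `beta`. The learnable action therefore earns a large positive total that decays to zero once its error is gone. The noisy-TV action's per-step rewards fluctuate around zero, because its error never actually falls.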
Related papers
- How To Stay Curious While Avoiding Noisy TVs Using Aleatoric Uncertainty Estimation (2021)
- Noisy Networks For Exploration (2017)
- Adaptive Symmetric Reward Noising For Reinforcement Learning (2019)
- Intrinsic Rewards For Exploration Without Harm From Observational Noise: A Simulation Study Based On The Free Energy Principle (2024)
- A Temporally Correlated Latent Exploration For Reinforcement Learning (2024)
- Self-supervised Exploration Via Temporal Inconsistency In Reinforcement Learning (2022)
- Information Content Exploration (2023)
- Learning Off-policy With Model-based Intrinsic Motivation For Active Online Exploration (2024)