Adaptive Symmetric Reward Noising For Reinforcement Learning
2019 · Refael Vivanti, Talya D. Sohlberg-Baris, Shlomo Cohen, et al.
Abstract
Recent reinforcement learning algorithms, though achieving impressive results in various fields, suffer from brittle training effects such as regression in results and high sensitivity to initialization and parameters. We claim that some of the brittleness stems from variance differences, i.e. when different environment areas (states and/or actions) have different reward variance. This causes two problems. First, the "Boring Areas Trap" in algorithms such as Q-learning, where moving between areas depends on the variance of the current area, and getting out of a boring area is hard due to its low variance. Second, the "Manipulative Consultant" problem, where value-estimation functions used in DQN and Actor-Critic algorithms influence the agent to prefer boring areas regardless of the mean reward, as they maximize estimation precision rather than reward. This sheds new light on how exploration contributes to training, as it helps with both challenges. Cognitive experiments in human
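The "Boring Areas Trap" described in the abstract can be illustrated with a minimal sketch of a two-armed bandit trained by epsilon-greedy Q-learning. The function name `boring_arm_fraction`, the arm means, the noise levels, the step size and the exploration rate are all illustrative assumptions, not values taken from the paper. Arm 0 is deterministic with the lower mean reward (the "boring" area); arm 1 has the higher mean but noisy rewards.

```python
import random


def boring_arm_fraction(noisy_std, steps=20_000, alpha=0.1, epsilon=0.1, seed=0):
    """Return the fraction of pulls spent on the low-variance arm 0.

    All numeric defaults are hypothetical choices made for illustration.
    """
    rng = random.Random(seed)
    means = (0.0, 0.5)         # arm 1 has the higher mean reward
    stds = (0.0, noisy_std)    # arm 0 is deterministic, arm 1 may be noisy
    q = [0.0, 0.0]             # tabular value estimate per arm
    boring_pulls = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)          # explore uniformly
        else:
            action = 0 if q[0] >= q[1] else 1  # greedy choice, ties go to arm 0
        reward = rng.gauss(means[action], stds[action])
        # Constant-step-size update of the estimate for the pulled arm only.
        q[action] += alpha * (reward - q[action])
        if action == 0:
            boring_pulls += 1
    return boring_pulls / steps


quiet = boring_arm_fraction(noisy_std=0.0)  # both arms deterministic
noisy = boring_arm_fraction(noisy_std=5.0)  # better arm has high reward variance
print(f"fraction of pulls on the boring arm: quiet={quiet:.2f}, noisy={noisy:.2f}")
```

In this sketch, a run of unlucky rewards can push the estimate for the noisy arm below that of the boring arm. The boring arm's estimate then never moves, so only the occasional exploratory pull can correct the noisy arm's estimate. This is one way to read the abstract's remark that exploration helps with the trap.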
Related papers
- Action Noise In Off-policy Deep Reinforcement Learning: Impact On Exploration And Performance (2022) · 0.00
- Disturbing Reinforcement Learning Agents With Corrupted Rewards (2021) · 0.00
- Reinforcement Learning With Perturbed Rewards (2018) · 13.74
- Noisy Networks For Exploration (2017) · 0.00
- Beyond Noisy-TVs: Noise-robust Exploration Via Learning Progress Monitoring (2025) · 0.00
- How To Stay Curious While Avoiding Noisy TVs Using Aleatoric Uncertainty Estimation (2021) · 0.00
- The Distributional Reward Critic Framework For Reinforcement Learning Under Perturbed Rewards (2024) · 0.00
- Self-supervised Exploration Via Temporal Inconsistency In Reinforcement Learning (2022) · 3.58