Multimodal Reward Shaping For Efficient Exploration In Reinforcement Learning
2021 · Mingqi Yuan, Mon-On Pun, Dong Wang, et al.
Abstract
Maintaining the long-term exploration capability of the agent remains one of the critical challenges in deep reinforcement learning. A representative solution is to leverage reward shaping to provide intrinsic rewards for the agent to encourage exploration. However, most existing methods suffer from vanishing intrinsic rewards, which cannot provide sustainable exploration incentives. Moreover, they rely heavily on complex models and additional memory to record learning procedures, resulting in high computational complexity and low robustness. To tackle this problem, entropy-based methods are proposed to evaluate the global exploration performance, encouraging the agent to visit the state space more equitably. However, the sample complexity of estimating the state visitation entropy is prohibitive when handling environments with high-dimensional observations. In this paper, we introduce a novel metric entitled Jain's fairness index (JFI) to replace the entropy regularizer, which solves
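The abstract contrasts Jain's fairness index with state-visitation entropy. The sketch below is a hedged illustration, not the authors' code. It computes both quantities on a table of discrete visit counts and uses the standard definition J(x) = (Σ x_i)^2 / (n · Σ x_i^2). That definition lies in [1/n, 1] and equals 1 only when every state is visited equally often. The discrete count setting, the zero-visit convention, and the `RunningJFI` helper are assumptions made for illustration. The paper's handling of high-dimensional observations and its actual reward formulation are not reproduced here.

```python
import math
from typing import Sequence


def jain_fairness_index(counts: Sequence[float]) -> float:
    """Jain's fairness index J = (sum x)^2 / (n * sum x^2), which lies in [1/n, 1]."""
    n = len(counts)
    if n == 0:
        raise ValueError("counts must be non-empty")
    total = sum(counts)
    sum_sq = sum(c * c for c in counts)
    if sum_sq == 0:
        # No visits yet: treat as perfectly even (a convention assumed here).
        return 1.0
    return total * total / (n * sum_sq)


def visitation_entropy(counts: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the empirical state-visitation distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


class RunningJFI:
    """Hypothetical helper: tracks JFI over n discrete states with O(1) updates."""

    def __init__(self, n_states: int) -> None:
        self.counts = [0] * n_states
        self.total = 0   # running sum of counts
        self.sum_sq = 0  # running sum of squared counts

    def visit(self, state: int) -> float:
        """Record one visit to `state` and return the updated index."""
        c = self.counts[state]
        # (c + 1)^2 - c^2 = 2c + 1, so the sum of squares updates in constant time.
        self.sum_sq += 2 * c + 1
        self.total += 1
        self.counts[state] = c + 1
        return self.value()

    def value(self) -> float:
        if self.sum_sq == 0:
            return 1.0
        n = len(self.counts)
        return self.total * self.total / (n * self.sum_sq)


if __name__ == "__main__":
    print(jain_fairness_index([5, 5, 5, 5]))   # uniform visits
    print(jain_fairness_index([20, 0, 0, 0]))  # all visits in one state
    print(visitation_entropy([5, 5, 5, 5]))
```

The first call returns 1.0 and the second returns 0.25 (= 1/n for n = 4). Like entropy, the index rewards even coverage. Unlike entropy, it needs only a running sum and a running sum of squares, which is why `RunningJFI` can update it in constant time per visit.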
Related papers
- Highly Efficient Self-adaptive Reward Shaping For Reinforcement Learning (2024)
- Rényi State Entropy For Exploration Acceleration In Reinforcement Learning (2022)
- Unpacking Reward Shaping: Understanding The Benefits Of Reward Engineering On Sample Complexity (2022)
- Information Content Exploration (2023)
- Maximum Entropy Exploration Without The Rollouts (2026)
- Never Explore Repeatedly In Multi-agent Reinforcement Learning (2023)
- Long-term Visitation Value For Deep Exploration In Sparse Reward Reinforcement Learning (2020)
- BAMDP Shaping: A Unified Framework For Intrinsic Motivation And Reward Shaping (2024)