Accelerating Reinforcement Learning With Value-conditional State Entropy Exploration
2023 · Dongyoung Kim, Jinwoo Shin, Pieter Abbeel, et al.
Abstract
A promising technique for exploration is to maximize the entropy of the visited state distribution, i.e., state entropy, by encouraging uniform coverage of the visited state space. While it has been effective in an unsupervised setup, it tends to struggle in a supervised setup with a task reward, where an agent prefers to visit high-value states to exploit the task reward. Such a preference can cause an imbalance between the distributions of high-value states and low-value states, which biases exploration towards low-value state regions because the state entropy increases when the distribution becomes more uniform. This issue is exacerbated when high-value states are narrowly distributed within the state space, making it difficult for the agent to complete the tasks. In this paper, we present a novel exploration technique that maximizes the value-conditional state entropy, which separately estimates the state entropies that are conditioned on the value estimates of each state, then ma
Related papers
- Maximum-entropy Exploration With Future State-action Visitation Measures (2026)
- Maximum Entropy Exploration Without The Rollouts (2026)
- Rényi State Entropy For Exploration Acceleration In Reinforcement Learning (2022)
- Fast Rates For Maximum Entropy Exploration (2023)
- VDSC: Enhancing Exploration Timing With Value Discrepancy And State Counts (2024)
- Task-agnostic Exploration Via Policy Gradient Of A Non-parametric State Entropy Estimate (2020)
- The Importance Of Non-markovianity In Maximum State Entropy Exploration (2022)
- Learning-driven Exploration For Reinforcement Learning (2019)