Maximum Entropy Exploration Without The Rollouts
2026 · Jacob Adamczyk, Adam Kamoski, Rahul V. Kulkarni
Abstract
Efficient exploration remains a central challenge in reinforcement learning, and it also serves as a useful pretraining objective for data collection, particularly when an external reward function is unavailable. A principled formulation of the exploration problem is to find policies that maximize the entropy of their induced steady-state visitation distribution, thereby encouraging uniform long-run coverage of the state space. Many existing exploration approaches estimate state visitation frequencies through repeated on-policy rollouts, which can be computationally expensive. In this work, we instead consider an intrinsic average-reward formulation in which the reward is derived from the visitation distribution itself, so that the optimal policy maximizes steady-state entropy. An entropy-regularized version of this objective admits a spectral characterization: the relevant stationary distributions can be computed from the dominant eigenvectors of a problem-dependent transition matrix.
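To make the quantities in the abstract concrete, the following minimal sketch computes the steady-state distribution of a small Markov chain as the dominant left eigenvector of its transition matrix, then evaluates the entropy of that distribution. It is not the authors' algorithm: it uses plain power iteration on a fixed, row-stochastic matrix, whereas the paper's "problem-dependent transition matrix" is not specified here. The two 3-state matrices and the function names `stationary_distribution` and `entropy` are hypothetical, chosen only for illustration.

```python
import math


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration for the dominant left eigenvector of a row-stochastic matrix.

    Assumes the chain is irreducible and aperiodic, so the iteration converges
    to the unique stationary distribution d satisfying d = d P.
    """
    n = len(P)
    d = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One left multiplication: new[j] = sum_i d[i] * P[i][j]
        new = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [x / total for x in new]  # renormalize against rounding drift
        if max(abs(a - b) for a, b in zip(new, d)) < tol:
            return new
        d = new
    return d


def entropy(d):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in d if p > 0)


# Hypothetical chain induced by a policy that lingers in state 0.
P_biased = [
    [0.9, 0.1, 0.0],
    [0.5, 0.4, 0.1],
    [0.0, 0.5, 0.5],
]

# Hypothetical doubly stochastic chain; its stationary distribution is uniform.
P_uniform = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]

d_biased = stationary_distribution(P_biased)
d_uniform = stationary_distribution(P_uniform)
H_biased = entropy(d_biased)
H_uniform = entropy(d_uniform)
```

In this toy setting the second chain covers the three states evenly, so its steady-state entropy reaches the maximum of log 3, while the first chain concentrates on state 0 and scores lower. No rollouts are simulated; both distributions come directly from the transition matrices.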
Related papers
- Fast Rates For Maximum Entropy Exploration (2023)
- Maximum-entropy Exploration With Future State-action Visitation Measures (2026)
- Off-policy Maximum Entropy RL With Future State And Action Visitation Measures (2024)
- Task-agnostic Exploration Via Policy Gradient Of A Non-parametric State Entropy Estimate (2020)
- Accelerating Reinforcement Learning With Value-conditional State Entropy Exploration (2023)
- Provably Efficient Maximum Entropy Exploration (2018)
- The Importance Of Non-markovianity In Maximum State Entropy Exploration (2022)
- Rényi State Entropy For Exploration Acceleration In Reinforcement Learning (2022)