Mutual-information Regularization In Markov Decision Processes And Actor-critic Learning
2019 · Felix Leibfried, Jordi Grau-Moya
Abstract
Cumulative entropy regularization introduces a regulatory signal to the reinforcement learning (RL) problem that encourages policies with high-entropy actions, which is equivalent to enforcing small deviations from a uniform reference marginal policy. This has been shown to improve exploration and robustness, and it tackles the value overestimation problem. It also leads to a significant performance increase in tabular and high-dimensional settings, as demonstrated via algorithms such as soft Q-learning (SQL) and soft actor-critic (SAC). Cumulative entropy regularization has been extended to optimize over the reference marginal policy instead of keeping it fixed, yielding a regularization that minimizes the mutual information between states and actions. While this was initially proposed for Markov Decision Processes (MDPs) in tabular settings, it was recently shown that a similar principle leads to significant improvements over vanilla SQL in RL for high-dimensional domains with d
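The two equivalences mentioned in the abstract can be checked numerically on a toy example. The sketch below is illustrative only and is not taken from the paper: the state distribution, the policy table, and the helper names (`entropy`, `kl`, `expected_kl`) are assumptions chosen for the demonstration, and the state distribution is an arbitrary fixed one rather than the distribution induced by a policy in an MDP. It shows that (a) the entropy of a policy equals log|A| minus its KL divergence from the uniform reference, and (b) the state-averaged KL to a reference marginal is minimized by the policy's own action marginal, where it equals the mutual information between states and actions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl(p, q):
    """KL divergence KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * (np.log(p[nz]) - np.log(q[nz])))

# Hypothetical toy setup: 3 states, 4 actions (not from the paper).
state_dist = np.array([0.5, 0.3, 0.2])          # assumed fixed p(s)
policy = np.array([[0.70, 0.10, 0.10, 0.10],    # pi(a | s), one row per state
                   [0.25, 0.25, 0.25, 0.25],
                   [0.05, 0.05, 0.10, 0.80]])
n_actions = policy.shape[1]
uniform = np.full(n_actions, 1.0 / n_actions)

# (a) Entropy equals log|A| minus the KL divergence from the uniform reference.
for row in policy:
    assert np.isclose(entropy(row), np.log(n_actions) - kl(row, uniform))

def expected_kl(reference):
    """State-averaged KL between the policy and a reference action marginal."""
    return sum(w * kl(row, reference) for w, row in zip(state_dist, policy))

# (b) The reference minimizing the expected KL is the policy's action marginal;
# the minimum value is the mutual information I(S; A) = H(A) - H(A | S).
marginal = state_dist @ policy
mutual_information = expected_kl(marginal)
cond_entropy = sum(w * entropy(row) for w, row in zip(state_dist, policy))

print("expected KL to uniform reference:", expected_kl(uniform))
print("mutual information I(S; A):      ", mutual_information)
```

The gap between the two printed values is exactly KL(marginal || uniform), which is why a fixed uniform reference penalizes the policy at least as strongly as the optimized reference does.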
Related papers
- Entropy Regularized Reinforcement Learning Using Large Deviation Theory (2021)
- Policy Optimization Reinforcement Learning With Entropy Regularization (2019)
- A Regularized Approach To Sparse Optimal Policy In Reinforcement Learning (2019)
- Marginalized State Distribution Entropy Regularization In Policy Optimization (2019)
- Mirror Descent Actor Critic Via Bounded Advantage Learning (2025)
- Divergence-regularized Multi-agent Actor-critic (2021)
- Diversity Actor-critic: Sample-aware Entropy Regularization For Sample-efficient Exploration (2020)
- Temporal Regularization In Markov Decision Process (2018)