\(f\)-policy Gradients: A General Framework For Goal Conditioned RL Using \(f\)-divergences
2023 · Siddhant Agarwal, Ishan Durugkar, Peter Stone, et al.
Abstract
Goal-Conditioned Reinforcement Learning (RL) problems often have access to sparse rewards where the agent receives a reward signal only when it has achieved the goal, making policy optimization a difficult problem. Several works augment this sparse reward with a learned dense reward function, but this can lead to sub-optimal policies if the reward is misaligned. Moreover, recent works have demonstrated that effective shaping rewards for a particular problem can depend on the underlying learning algorithm. This paper introduces a novel way to encourage exploration called \(f\)-Policy Gradients, or \(f\)-PG. \(f\)-PG minimizes the f-divergence between the agent's state visitation distribution and the goal, which we show can lead to an optimal policy. We derive gradients for various f-divergences to optimize this objective. Our learning paradigm provides dense learning signals for exploration in sparse reward settings. We further introduce an entropy-regularized policy optimization object
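To make the objective in the abstract concrete, here is a minimal sketch of an \(f\)-divergence between a discrete state visitation distribution and a goal distribution, using the standard definition \(D_f(p \,\|\, q) = \sum_s q(s)\, f\big(p(s)/q(s)\big)\). The four-state example, the smoothed goal distribution, the two visitation vectors, and all function names are assumptions made for illustration; they are not taken from the paper, and the direction of the divergence used there may differ.

```python
import math

def f_divergence(p, q, f):
    """D_f(p || q) = sum_s q(s) * f(p(s) / q(s)); assumes q(s) > 0 for all s."""
    return sum(qs * f(ps / qs) for ps, qs in zip(p, q))

# Generator functions: each is convex with f(1) = 0.
def f_kl(t):
    # Yields KL(p || q); the t = 0 case uses the limit t*log(t) -> 0.
    return t * math.log(t) if t > 0 else 0.0

def f_reverse_kl(t):
    # Yields KL(q || p); requires t > 0.
    return -math.log(t)

def f_chi2(t):
    # Yields the Pearson chi-squared divergence.
    return (t - 1.0) ** 2

# Hypothetical 4-state chain whose goal is the last state. The goal
# distribution is smoothed so that every state has non-zero mass.
goal = [0.01, 0.01, 0.01, 0.97]

# Hypothetical state visitation distributions of two policies.
far_policy = [0.70, 0.20, 0.07, 0.03]   # mostly stays near the start
near_policy = [0.05, 0.10, 0.15, 0.70]  # mostly reaches the goal

for name, f in [("KL", f_kl), ("reverse KL", f_reverse_kl), ("chi2", f_chi2)]:
    d_far = f_divergence(far_policy, goal, f)
    d_near = f_divergence(near_policy, goal, f)
    print(f"{name}: far={d_far:.3f} near={d_near:.3f}")
```

In this toy setting, the policy whose visitation mass sits closer to the goal has the smaller divergence under all three generators, and the divergence varies smoothly with the visitation distribution even though the underlying reward would be sparse. This is the sense in which such an objective can act as a dense learning signal.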
Related papers
- The \(f\)-divergence Reinforcement Learning Framework (2021)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Variational Policy Gradient Method For Reinforcement Learning With General Utilities (2020)
- F-divergence Constrained Policy Improvement (2017)
- Factored Policy Gradients: Leveraging Structure For Efficient Learning In MOMDPs (2021)
- Global Convergence Guarantees For Federated Policy Gradient Methods With Adversaries (2024)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Policy Gradient For Reinforcement Learning With General Utilities (2022)