Factored Policy Gradients: Leveraging Structure For Efficient Learning In MOMDPs
2021 · Thomas Spooner, Nelson Vadori, Sumitra Ganesh
Abstract
Policy gradient methods can solve complex tasks but often fail when the dimensionality of the action-space or objective multiplicity grows very large. This occurs, in part, because the variance of score-based gradient estimators scales quadratically. In this paper, we address this problem through a factor baseline which exploits independence structure encoded in a novel action-target influence network. Factored policy gradients (FPGs), which follow, provide a common framework for analysing key state-of-the-art algorithms, are shown to generalise traditional policy gradients, and yield a principled way of incorporating prior knowledge of a problem domain's generative processes. We provide an analysis of the proposed estimator and identify the conditions under which variance is reduced. The algorithmic aspects of FPGs are discussed, including optimal policy factorisation, as characterised by minimum biclique coverings, and the implications for the bias-variance trade-off of incorrectly sp
Related papers
- Variance Reduction For Policy Gradient With Action-dependent Factorized Baselines (2018)
- Optimal Estimation Of Off-policy Policy Gradient Via Double Fitted Iteration (2022)
- \(f\)-policy Gradients: A General Framework For Goal Conditioned RL Using \(f\)-divergences (2023)
- PC-PG: Policy Cover Directed Exploration For Provable Policy Gradient Learning (2020)
- FACMAC: Factored Multi-agent Centralised Policy Gradients (2020)
- Marginal Policy Gradients: A Unified Family Of Estimators For Bounded Action Spaces With Applications (2018)
- Scaling Internal-state Policy-gradient Methods For POMDPs (2025)
- Fourier Policy Gradients (2018)