FlowPG: Action-constrained Policy Gradient With Normalizing Flows
2024 · Janaka Chathuranga Brahmanage, Jiajing Ling, Akshat Kumar
Abstract
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation decision-making problems. A major challenge in ACRL is ensuring that the agent takes a valid action, one that satisfies the constraints, at every RL step. The commonly used approach of adding a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, first, we use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is also challenging. We develop multiple methods for such action sampling, based on Hamiltonian Monte Carlo and probabilistic sentential decision diagrams, for convex and non-convex constraints. Third, we integrate the learn
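The abstract's central mechanism is an invertible, differentiable map between a simple latent distribution and the action space. Below is a minimal NumPy sketch of one common way to build such a map: stacked RealNVP-style affine coupling layers. This architecture is an assumption for illustration, not necessarily the one FlowPG uses. The classes `AffineCoupling` and `Flow` are hypothetical, and their scale/shift networks are tiny fixed random tanh layers. The flow is therefore untrained, and its outputs are not restricted to any feasible set; in the paper's setting, the flow would first be fit to samples drawn from the feasible action space. The sketch shows the three properties the approach relies on: an exact inverse, a cheap log-determinant, and a change-of-variables density.

```python
import numpy as np


class AffineCoupling:
    """One RealNVP-style affine coupling layer (illustrative sketch).

    Splits x = (x1, x2): x1 passes through unchanged, and x2 is scaled and
    shifted by functions of x1. The scale/shift "networks" are fixed random
    tanh layers, a hypothetical stand-in for trained networks.
    """

    def __init__(self, dim: int, hidden: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.d1 = dim // 2
        d2 = dim - self.d1
        self.W = rng.normal(scale=0.5, size=(self.d1, hidden))
        self.Ws = rng.normal(scale=0.5, size=(hidden, d2))
        self.Wt = rng.normal(scale=0.5, size=(hidden, d2))

    def _scale_shift(self, x1):
        h = np.tanh(x1 @ self.W)
        log_s = np.tanh(h @ self.Ws)  # bounded log-scale keeps the map well conditioned
        t = h @ self.Wt
        return log_s, t

    def forward(self, z):
        """Map z -> a and return log|det da/dz| for each sample."""
        z1, z2 = z[..., :self.d1], z[..., self.d1:]
        log_s, t = self._scale_shift(z1)
        a2 = z2 * np.exp(log_s) + t
        return np.concatenate([z1, a2], axis=-1), log_s.sum(axis=-1)

    def inverse(self, a):
        """Map a -> z in closed form (no optimization required)."""
        a1, a2 = a[..., :self.d1], a[..., self.d1:]
        log_s, t = self._scale_shift(a1)
        z2 = (a2 - t) * np.exp(-log_s)
        return np.concatenate([a1, z2], axis=-1)


class Flow:
    """Stack of coupling layers; the coordinate order is reversed between
    layers so every coordinate gets transformed. Reversal has |det| = 1."""

    def __init__(self, dim: int, n_layers: int = 4, seed: int = 0):
        self.layers = [AffineCoupling(dim, seed=seed + k) for k in range(n_layers)]

    def forward(self, z):
        """Latent z -> action a, plus total log|det da/dz|."""
        x, total = z, 0.0
        for layer in self.layers:
            x, ld = layer.forward(x)
            total = total + ld
            x = x[..., ::-1]
        return x, total

    def inverse(self, a):
        """Action a -> latent z, undoing the layers in reverse order."""
        x = a
        for layer in reversed(self.layers):
            x = x[..., ::-1]
            x = layer.inverse(x)
        return x

    def log_prob(self, a):
        """Change of variables: log p(a) = log N(z; 0, I) - log|det da/dz|."""
        z = self.inverse(a)
        _, logdet = self.forward(z)
        d = z.shape[-1]
        log_pz = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * d * np.log(2 * np.pi)
        return log_pz - logdet


# Usage: map Gaussian latents to "actions" and invert them back.
rng = np.random.default_rng(1)
flow = Flow(dim=4, n_layers=2)
z = rng.standard_normal((5, 4))
a, logdet = flow.forward(z)
print(np.abs(flow.inverse(a) - z).max())  # round-trip error, near machine precision
```

Both directions are closed-form, so producing an action from a latent point requires no per-step optimization program, unlike the projection layer described above. To fit such a flow to a feasible action set, one would maximize `log_prob` over sampled feasible actions (in the paper, samples come from the HMC- or PSDD-based samplers).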
Related papers
- Policyflow: Policy Optimization With Continuous Normalizing Flow In Reinforcement Learning (2026)
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019)
- Evolving Diffusion And Flow Matching Policies For Online Reinforcement Learning (2025)
- Guided Flow Policy: Learning From High-value Actions In Offline Reinforcement Learning (2025)
- Learning Deterministic Policies With Policy Gradients In Constrained Markov Decision Processes (2025)
- Last-iterate Global Convergence Of Policy Gradients For Constrained Reinforcement Learning (2024)
- Reverse Flow Matching: A Unified Framework For Online Reinforcement Learning With Diffusion And Flow Policies (2026)
- Marginal Policy Gradients: A Unified Family Of Estimators For Bounded Action Spaces With Applications (2018)