Max-entropy Reinforcement Learning With Flow Matching And A Case Study On LQR
2025 · Yuyang Zhang, Yang Hu, Bo Dai, et al.
Abstract
Soft actor-critic (SAC) is a popular algorithm for max-entropy reinforcement learning. In practice, the energy-based policies in SAC are often approximated with simple policy classes for efficiency, at the cost of expressiveness and robustness. In this paper, we propose a variant of the SAC algorithm that parameterizes the policy with flow-based models, leveraging their rich expressiveness. In the algorithm, we evaluate the flow-based policy using the instantaneous change-of-variable technique and update the policy with an online variant of flow matching developed in this paper. This online variant, termed importance sampling flow matching (ISFM), enables policy updates using only samples from a user-specified sampling distribution rather than the unknown target distribution. We develop a theoretical analysis of ISFM, characterizing how different choices of sampling distribution affect the learning efficiency. Finally, we conduct a case study of our algorithm on the max-entropy li
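The instantaneous change-of-variable technique mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a hypothetical linear vector field `f(x) = A x`, a standard normal base distribution, and plain Euler integration, all chosen only so the result can be checked by hand. The function name `flow_log_density` is likewise illustrative. The log-density of a sample moving along the flow obeys `d(log p)/dt = -trace(df/dx)`, which for a linear field reduces to `-trace(A)`.

```python
import numpy as np

def flow_log_density(x0, A, T=1.0, steps=1000):
    """Push x0 ~ N(0, I) through dx/dt = A x and track its log-density.

    Assumption for illustration: a linear vector field, so the Jacobian
    df/dx is the constant matrix A and its trace is known exactly.
    """
    d = x0.shape[0]
    x = x0.copy()
    # Log-density of x0 under the standard normal base distribution.
    logp = -0.5 * x0 @ x0 - 0.5 * d * np.log(2 * np.pi)
    dt = T / steps
    trace_jac = np.trace(A)
    for _ in range(steps):
        x = x + dt * (A @ x)           # Euler step on the sample
        logp = logp - dt * trace_jac   # Euler step on d(log p)/dt = -tr(df/dx)
    return x, logp

A = np.diag([0.5, -0.2])       # hypothetical vector field parameters
x0 = np.array([1.0, -1.0])     # a sample from the base distribution
xT, logp_T = flow_log_density(x0, A, T=1.0, steps=2000)
```

For this linear case the tracked value should match the closed form `log p0(x0) - T * trace(A)`; with a learned, nonlinear vector field the trace would have to be computed or estimated at every integration step.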
Related papers
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019) 0.00
- S\(^2\)AC: Energy-based Reinforcement Learning With Stein Soft Actor Critic (2024) 2.41
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019) 0.00
- Off-policy Maximum Entropy Reinforcement Learning: Soft Actor-critic With Advantage Weighted Mixture Policy (SAC-AWMP) (2020) 0.00
- DSAC-C: Constrained Maximum Entropy For Robust Discrete Soft-actor Critic (2023) 0.00
- Improved Soft Actor-critic: Mixing Prioritized Off-policy Samples With On-policy Experience (2021) 0.00
- Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (2018) 0.00
- A Diffusion Model Framework For Maximum Entropy Reinforcement Learning (2025) 0.00