Policy Search Using Dynamic Mirror Descent MPC For Model Free Off Policy RL
2021 · Soumya Rani Samineni
Abstract
Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL algorithms with model-based (Mb)-RL approaches to get the best of both: the asymptotic performance of Mf-RL and the high sample efficiency of Mb-RL. Inspired by these works, we propose a hierarchical framework that integrates online learning for Mb trajectory optimization with off-policy methods for Mf-RL. In particular, two loops are proposed, where Dynamic Mirror Descent based Model Predictive Control (DMD-MPC) is used as the inner loop to obtain an optimal sequence of actions. These actions are in turn used to significantly accelerate the outer-loop Mf-RL. We show that our formulation is generic for a broad class of MPC-based policies and objectives, and includes some of the well-known Mb-Mf approaches. Based on the framework, we define two algorithms: one to increase the sample efficiency of off-policy RL, and one to guide end-to-end RL algorithms for online adaptation. Thus we finally introduce two novel algo
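The two-loop structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a toy 1-D environment, a Gaussian distribution over action sequences with fixed standard deviation, and an exponentiated-cost weighting, which is one commonly used special case of DMD-MPC. All names (`step`, `dmd_mpc_update`, `run_episode`) and all hyperparameter values are hypothetical, and the outer-loop off-policy learner is reduced to a replay buffer that such a learner could sample from.

```python
import math
import random


def step(x, a):
    """Toy 1-D dynamics (an assumption): the clipped action shifts the state."""
    a = max(-1.0, min(1.0, a))
    x_next = x + a
    return x_next, x_next ** 2  # cost is the squared distance to the origin


def rollout_cost(x, actions):
    """Total cost of applying an action sequence to the model from state x."""
    total = 0.0
    for a in actions:
        x, c = step(x, a)
        total += c
    return total


def dmd_mpc_update(x, mean, rng, n_samples=64, sigma=0.3, lam=1.0, gamma=0.8):
    """One mirror-descent style update of the mean action sequence.

    Samples sequences around the current mean, weights them by
    exp(-cost / lam), and moves the mean toward the weighted average
    with step size gamma.
    """
    samples = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(n_samples)]
    costs = [rollout_cost(x, s) for s in samples]
    c_min = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    z = sum(weights)
    new_mean = []
    for t in range(len(mean)):
        avg = sum(w * s[t] for w, s in zip(weights, samples)) / z
        new_mean.append((1.0 - gamma) * mean[t] + gamma * avg)
    return new_mean


def run_episode(x0=2.0, horizon=5, n_steps=30, seed=0):
    """Inner loop: DMD-MPC picks actions. Outer loop placeholder: transitions
    are stored in a replay buffer for a (not shown) off-policy learner."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    replay_buffer = []
    x = x0
    for _ in range(n_steps):
        mean = dmd_mpc_update(x, mean, rng)
        a = max(-1.0, min(1.0, mean[0]))  # execute the first planned action
        x_next, cost = step(x, a)
        replay_buffer.append((x, a, -cost, x_next))  # (s, a, r, s')
        x = x_next
        mean = mean[1:] + [0.0]  # shift the plan forward by one step
    return x, replay_buffer


if __name__ == "__main__":
    x_final, buffer = run_episode()
    print(f"final state: {x_final:.3f}, stored transitions: {len(buffer)}")
```

In this sketch the planner's actions only fill the buffer; how the outer-loop learner consumes them is left open.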