Model Predictive Control And Reinforcement Learning: A Unified Framework Based On Dynamic Programming
2024 · Dimitri P. Bertsekas
Abstract
In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents. Signific
Related papers
- Driving Reinforcement Learning With Models (2019)
- Policy Search Using Dynamic Mirror Descent MPC For Model Free Off Policy RL (2021)
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019)
- Online Reinforcement Learning Control By Direct Heuristic Dynamic Programming: From Time-driven To Event-driven (2020)
- Towards An Adaptable And Generalizable Optimization Engine In Decision And Control: A Meta Reinforcement Learning Approach (2024)
- Unified Algorithms For RL With Decision-estimation Coefficients: PAC, Reward-free, Preference-based Learning, And Beyond (2022)
- Control-optimized Deep Reinforcement Learning For Artificially Intelligent Autonomous Systems (2025)
- A Two-timescale Primal-dual Framework For Reinforcement Learning Via Online Dual Variable Guidance (2025)