Implicit Bias Of Policy Gradient In Linear Quadratic Control: Extrapolation To Unseen Initial States
2024 · Noam Razin, Yotam Alexander, Edo Cohen-Karlik, et al.
Abstract
In modern machine learning, models can often fit training data in numerous ways, some of which perform well on unseen (test) data, while others do not. Remarkably, in such cases gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This implicit bias was extensively studied in supervised learning, but is far less understood in optimal control (reinforcement learning). There, learning a controller applied to a system via gradient descent is known as policy gradient, and a question of prime importance is the extent to which a learned controller extrapolates to unseen initial states. This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states. Focusing on the fundamental Linear Quadratic Regulator (LQR) problem, we establish that the extent of extrapolation depends on the degree of exploration induced by the system when commencing from initial states included in training. Exper
Related papers
- Some Remarks On Gradient Dominance And LQR Policy Optimization (2025)
- Full Error Analysis Of Policy Gradient Learning Algorithms For Exploratory Linear Quadratic Mean-field Control Problem In Continuous Time With Common Noise (2024)
- Global Convergence Using Policy Gradient Methods For Model-free Markovian Jump Linear Quadratic Control (2021)
- On The Optimization Landscape Of Dynamic Output Feedback: A Case Study For Linear Quadratic Regulator (2022)
- Learning Robust Control For LQR Systems With Multiplicative Noise Via Policy Gradient (2019)
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024)
- Revisiting LQR Control From The Perspective Of Receding-horizon Policy Gradient (2023)
- Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{t}\) Regret (2021)