Learning The Linear Quadratic Regulator From Nonlinear Observations
2020 · Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, et al.
Abstract
We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learner has access to a class of decoder functions (e.g., neural networks) that is flexible enough to capture the mapping from observations to latent states. We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class. RichID is oracle-efficient and accesses the decoder class only through calls to a least-squares regression oracle. Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonli
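As a rough illustration of the latent part of this setting only (linear dynamics with quadratic costs), the sketch below computes an LQR feedback gain for a known latent system by iterating the discrete-time Riccati recursion. It is not RichID and does not model the nonlinear observations or the decoder class. The matrices `A`, `B`, `Q`, `R`, the helper name `lqr_gain`, and the iteration count are all assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; return (K, P).

    The resulting policy is u_t = -K x_t for latent dynamics
    x_{t+1} = A x_t + B u_t and stage cost x'Qx + u'Ru.
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        # K = (R + B'PB)^{-1} B'PA, computed via a linear solve.
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # Riccati update, written as P = Q + A'P(A - BK).
        P = Q + A.T @ P @ (A - B @ K)
    return K, P


# Hypothetical 2-D latent system (a discretized double integrator).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)           # state cost (assumed)
R = np.array([[1.0]])   # control cost (assumed)

K, P = lqr_gain(A, B, Q, R)

# The closed loop A - BK should be stable (spectral radius below 1).
spectral_radius = max(abs(np.linalg.eigvals(A - B @ K)))
print("gain K:", K)
print("closed-loop spectral radius:", spectral_radius)
```

In the setting the abstract describes, the latent state is not observed directly, so a gain like `K` could only be applied to a decoder's estimate of the state; learning that decoder is the part this sketch leaves out.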
Related papers
- Least-squares Temporal Difference Learning For The Linear Quadratic Regulator (2017)
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)
- Revisiting LQR Control From The Perspective Of Receding-horizon Policy Gradient (2023)
- Cost-driven Representation Learning For Linear Quadratic Gaussian Control: Part I (2022)
- Robust Reinforcement Learning: A Case Study In Linear Quadratic Regulation (2020)
- Finite-time Analysis Of Approximate Policy Iteration For The Linear Quadratic Regulator (2019)
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024)
- Fast Policy Learning For Linear Quadratic Control With Entropy Regularization (2023)