Continuous Control With Contexts, Provably
2019 · Simon S. Du, Ruosong Wang, Mengdi Wang, et al.
Abstract
A fundamental challenge in artificial intelligence is to build an agent that generalizes and adapts to unseen environments. A common strategy is to build a decoder that takes the context of the unseen new environment as input and generates a policy accordingly. The current paper studies how to build a decoder for the fundamental continuous control task, linear quadratic regulator (LQR), which can model a wide range of real-world physical environments. We present a simple algorithm for this problem, which uses upper confidence bound (UCB) to refine the estimate of the decoder and balance the exploration-exploitation trade-off. Theoretically, our algorithm enjoys a \(\widetilde{O}\left(\sqrt{T}\right)\) regret bound in the online setting where \(T\) is the number of environments the agent played. This also implies after playing \(\widetilde{O}\left(1/\epsilon^2\right)\) environments, the agent is able to transfer the learned knowledge to obtain an \(\epsilon\)-suboptimal policy for
Related papers
- Learning The Linear Quadratic Regulator From Nonlinear Observations (2020)
- Deep RL With Information Constrained Policies: Generalization In Continuous Control (2020)
- Learning Robust And Adaptive Real-world Continuous Control Using Simulation And Transfer Learning (2018)
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)
- A Tour Of Reinforcement Learning: The View From Continuous Control (2018)
- Attraction-repulsion Actor-critic For Continuous Control Reinforcement Learning (2019)
- Online Reinforcement Learning In Non-stationary Context-driven Environments (2023)
- Solving Continuous Control Via Q-learning (2022)