Efficient Performance Bounds For Primal-dual Reinforcement Learning From Demonstrations
2021 · Angeliki Kamoutsi, Goran Banjac, John Lygeros
Abstract
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations. We assume that the learner is not allowed to interact with the expert and has no access to a reinforcement signal of any kind. Existing inverse reinforcement learning methods come with strong theoretical guarantees but are computationally expensive, while state-of-the-art policy optimization algorithms achieve significant empirical success but are hampered by limited theoretical understanding. To bridge the gap between theory and practice, we introduce a novel bilinear saddle-point framework using Lagrangian duality. The proposed primal-dual viewpoint allows us to develop a model-free, provably efficient algorithm through the lens of stochastic convex optimization. The method enjoys the advantages of simplicity of implementation, low memory requirements, and computational and sample complexities independent of the number
Related papers
- Stochastic Primal-dual Methods And Sample Complexity Of Reinforcement Learning (2016)
- Deep Primal-dual Reinforcement Learning: Accelerating Actor-critic Using Bellman Duality (2017)
- Efficient Probabilistic Performance Bounds For Inverse Reinforcement Learning (2017)
- Primal-dual \(\pi\) Learning: Sample Complexity And Sublinear Run Time For Ergodic Markov Decision Problems (2017)
- A Two-timescale Primal-dual Framework For Reinforcement Learning Via Online Dual Variable Guidance (2025)
- A Dual Perspective Of Reinforcement Learning For Imposing Policy Constraints (2024)
- State Augmented Constrained Reinforcement Learning: Overcoming The Limitations Of Learning With Rewards (2021)
- Pretraining Deep Actor-critic Reinforcement Learning Algorithms With Expert Demonstrations (2018)