A Primal-dual Algorithm For Offline Constrained Reinforcement Learning With Linear MDPs
2024 · Kihyuk Hong, Ambuj Tewari
Abstract
We study offline reinforcement learning (RL) with linear MDPs in the infinite-horizon discounted setting, where the goal is to learn a policy that maximizes the expected discounted cumulative reward from a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumption or are computationally inefficient for finding an \(\epsilon\)-optimal policy with \(O(\epsilon^{-2})\) sample complexity. In this paper, we propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves a sample complexity of \(O(\epsilon^{-2})\) under a partial data coverage assumption. This improves upon a recent work that requires \(O(\epsilon^{-4})\) samples. Moreover, we extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.
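As a rough illustration of the constrained setting described in the abstract, one standard way to write a constrained discounted problem and its Lagrangian is sketched here. The symbols \(J_i\), \(r_i\), \(\tau_i\), \(\lambda_i\), \(\gamma\), and \(I\) are assumptions introduced for illustration and do not appear in the abstract; the paper's own formulation and normalization may differ.

\[
\max_{\pi} \; J_0(\pi) \quad \text{subject to} \quad J_i(\pi) \ge \tau_i, \qquad i = 1, \dots, I,
\]
\[
J_i(\pi) = \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r_i(s_t, a_t) \right],
\qquad
L(\pi, \lambda) = J_0(\pi) + \sum_{i=1}^{I} \lambda_i \bigl( J_i(\pi) - \tau_i \bigr), \quad \lambda_i \ge 0.
\]

Here \(r_0\) is the main reward, \(r_1, \dots, r_I\) are the additional reward signals, and \(\tau_i\) are assumed thresholds. In this generic sense, a primal-dual method alternates between updating \(\pi\) to increase \(L\) and updating \(\lambda\) to decrease it.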
Related papers
- Near-optimal Offline Reinforcement Learning Via Double Variance Reduction (2021)
- Nearly Minimax Optimal Offline Reinforcement Learning With Linear Function Approximation: Single-agent MDP And Markov Game (2022)
- A Near-optimal Primal-dual Method For Off-policy Learning In CMDP (2022)
- A Policy Gradient Primal-dual Algorithm For Constrained MDPs With Uniform PAC Guarantees (2024)
- Provably Efficient Primal-dual Reinforcement Learning For CMDPs With Non-stationary Objectives And Constraints (2022)
- Efficient Online Learning With Offline Datasets For Infinite Horizon MDPs: A Bayesian Approach (2023)
- Distributionally Robust Offline Reinforcement Learning With Linear Function Approximation (2022)
- Offline Reinforcement Learning With Realizability And Single-policy Concentrability (2022)