Rate-optimal Policy Optimization For Linear Markov Decision Processes
2023 · Uri Sherman, Alon Cohen, Tomer Koren, et al.
Abstract
We study regret minimization in online episodic linear Markov Decision Processes, and obtain rate-optimal \(\widetilde{O}(\sqrt{K})\) regret, where \(K\) denotes the number of episodes. Our work is the first to establish the optimal (w.r.t. \(K\)) rate of convergence in the stochastic setting with bandit feedback using a policy-optimization-based approach, and the first to establish the optimal (w.r.t. \(K\)) rate in the adversarial setup with full-information feedback, for which no algorithm with an optimal rate guarantee is currently known.
Related papers
- Towards Optimal Regret In Adversarial Linear MDPs With Bandit Feedback (2023)
- Near-optimal Regret Using Policy Optimization In Online MDPs With Aggregate Bandit Feedback (2025)
- Online Markov Decision Processes With Aggregate Bandit Feedback (2021)
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)
- Near-optimal Regret For Adversarial MDP With Delayed Bandit Feedback (2022)
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)
- A Theoretical Analysis Of Optimistic Proximal Policy Optimization In Linear Markov Decision Processes (2023)
- Improved Regret For Efficient Online Reinforcement Learning With Linear Function Approximation (2023)