Value Function Approximations Via Kernel Embeddings For No-regret Reinforcement Learning
2020 · Sayak Ray Chowdhury, Rafael Oliveira
Abstract
We consider the regret minimization problem in reinforcement learning (RL) in the episodic setting. In many real-world RL environments, the state and action spaces are continuous or very large. Existing approaches establish regret guarantees through either a low-dimensional representation of the stochastic transition model or an approximation of the \(Q\)-functions. However, function approximation schemes for state-value functions remain poorly understood. In this paper, we propose an online model-based RL algorithm, CME-RL, that learns representations of transition distributions as embeddings in a reproducing kernel Hilbert space while carefully balancing the exploitation-exploration tradeoff. We demonstrate the efficiency of our algorithm by proving a frequentist (worst-case) regret bound of order \(\tilde{O}\big(H\gamma_N\sqrt{N}\big)\) (here \(\tilde{O}(\cdot)\) hides only absolute constants and poly-logarithmic factors), where \(H\) is
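The idea of embedding transition distributions in a reproducing kernel Hilbert space can be illustrated with a minimal sketch. This is not the CME-RL algorithm from the paper: it omits exploration entirely and only shows the standard kernel-ridge form of a conditional mean embedding estimate, in which the expected next-state value at a state-action input \(z\) is approximated by \(\sum_i \alpha_i(z) V(s'_i)\) with \(\alpha(z) = (K + \lambda I)^{-1} k_Z(z)\). The function names, the Gaussian kernel, the regularizer `reg`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def cme_expected_value(Z, next_values, z_query, reg=1e-3, lengthscale=1.0):
    """Estimate E[V(s') | z] at each query point from n observed transitions.

    Z           : (n, d) observed state-action inputs (hypothetical encoding)
    next_values : (n,)   V evaluated at the observed next states
    z_query     : (m, d) state-action inputs to evaluate
    """
    n = Z.shape[0]
    K = rbf_kernel(Z, Z, lengthscale)               # (n, n) Gram matrix
    k_query = rbf_kernel(Z, z_query, lengthscale)   # (n, m) cross-kernel
    # Embedding weights alpha(z) = (K + reg * I)^{-1} k_Z(z), one column per query.
    weights = np.linalg.solve(K + reg * np.eye(n), k_query)
    return next_values @ weights                    # (m,)


# Toy usage: 1-D inputs, with V(s') = sin(z) standing in for observed next-state values.
Z_train = np.linspace(-2.0, 2.0, 50)[:, None]
v_next = np.sin(Z_train[:, 0])
z_test = np.array([[0.5], [-1.0]])
estimate = cme_expected_value(Z_train, v_next, z_test, reg=1e-3, lengthscale=0.5)
print(estimate)
```

The weights depend only on the observed inputs, so the same linear solve can be reused for any value function evaluated at the observed next states. That reuse is what makes an embedding of the transition distribution convenient for model-based planning.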
Related papers
- Continuous-time Value Function Approximation In Reproducing Kernel Hilbert Spaces (2018)
- First-order Regret In Reinforcement Learning With Linear Function Approximation: A Robust Estimation Approach (2021)
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)
- Improved Regret For Efficient Online Reinforcement Learning With Linear Function Approximation (2023)
- Prior-dependent Analysis Of Posterior Sampling Reinforcement Learning With Function Approximation (2024)
- \(\sqrt{n}\)-regret For Learning In Markov Decision Processes With Function Approximation And Low Bellman Rank (2019)
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)
- Reinforcement Learning In Feature Space: Matrix Bandit, Kernels, And Regret Bound (2019)