Value Function Approximations Via Kernel Embeddings For No-regret Reinforcement Learning

Abstract

We consider the regret minimization problem in reinforcement learning (RL) in the episodic setting. In many real-world RL environments, the state and action spaces are continuous or very large. Existing approaches establish regret guarantees through either a low-dimensional representation of the stochastic transition model or an approximation of the \(Q\)-functions. However, an understanding of function approximation schemes for state-value functions remains largely missing. In this paper, we propose an online model-based RL algorithm, CME-RL, that learns representations of transition distributions as embeddings in a reproducing kernel Hilbert space while carefully balancing the exploration-exploitation tradeoff. We demonstrate the efficiency of our algorithm by proving a frequentist (worst-case) regret bound of order \(\tilde{O}\big(H\gamma_N\sqrt{N}\big)\)\footnote{\(\tilde{O}(\cdot)\) hides only absolute constants and poly-logarithmic factors.}, where \(H\) is
