A Kernel-based Approach To Non-stationary Reinforcement Learning In Metric Spaces
2020 · Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, et al.
Abstract
In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. We further propose a practical implementation of KeRNS, analyze its regret, and validate it experimentally.
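To make the idea of a time-dependent kernel concrete, here is a minimal Python sketch of a kernel-weighted reward estimate in which each past sample is weighted by both its distance to the query point and its age. It is an illustration under stated assumptions, not the estimator defined in the paper: the function names, the Gaussian spatial kernel, the bandwidth, and the `reg` regularizer are all hypothetical choices. Its only purpose is to show how a sliding window and exponential discounting can both be read as particular temporal weightings.

```python
import math


def temporal_weight(age, kind="exponential", window=50, discount=0.95):
    """Weight of a sample observed `age` episodes ago (age >= 0).

    Hypothetical helper: "window" keeps only the last `window` episodes,
    "exponential" decays older samples geometrically.
    """
    if kind == "window":
        return 1.0 if age < window else 0.0
    if kind == "exponential":
        return discount ** age
    raise ValueError(f"unknown temporal kernel: {kind}")


def spatial_weight(distance, bandwidth=0.1):
    """Gaussian kernel on the metric distance (an assumed choice)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def kernel_reward_estimate(samples, query, now, metric, reg=1e-8, **temporal_kwargs):
    """Kernel-weighted average of past rewards around `query` at episode `now`.

    samples: list of (episode, state_action, reward) tuples.
    metric:  function returning the distance between two state-action points.
    reg:     assumed regularizer added to the denominator.
    """
    num, den = 0.0, reg
    for episode, point, reward in samples:
        w = spatial_weight(metric(point, query))
        w *= temporal_weight(now - episode, **temporal_kwargs)
        num += w * reward
        den += w
    # With no weight mass and no regularization, fall back to 0.0.
    return num / den if den > 0 else 0.0


# Toy non-stationary setting: the reward at a fixed point jumps from 0 to 1
# starting at episode 5.
samples = [(t, 0.0, 0.0 if t < 5 else 1.0) for t in range(10)]
metric = lambda x, y: abs(x - y)

windowed = kernel_reward_estimate(samples, 0.0, 10, metric, reg=0.0,
                                  kind="window", window=5)
discounted = kernel_reward_estimate(samples, 0.0, 10, metric, reg=0.0,
                                    kind="exponential", discount=0.95)
```

In this toy setting the windowed estimate uses only samples collected after the change, while the discounted estimate still mixes in older samples with smaller weight, so it sits between the old and new reward levels.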
Related papers
- Efficient Learning In Non-stationary Linear Markov Decision Processes (2020)
- A Kernel Perspective On Behavioural Metrics For Markov Decision Processes (2023)
- Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, And Separation Design (2022)
- Adaptive Discretization For Episodic Reinforcement Learning In Metric Spaces (2019)
- Kernel Metric Learning For In-sample Off-policy Evaluation Of Deterministic RL Policies (2024)
- Non-stationary Markov Decision Processes, A Worst-case Approach Using Model-based Reinforcement Learning, Extended Version (2019)
- Square-root Regret Bounds For Continuous-time Episodic Markov Decision Processes (2022)
- Model-based Reinforcement Learning With Multinomial Logistic Function Approximation (2022)