Fundamental Limits Of Reinforcement Learning In Environment With Endogeneous And Exogeneous Uncertainty
2021 · Rongpeng Li
Abstract
Online reinforcement learning (RL) has been widely applied in information processing scenarios, which usually exhibit considerable uncertainty due to the intrinsic randomness of channels and service demands. In this paper, we consider undiscounted RL in general Markov decision processes (MDPs) with both endogenous and exogenous uncertainty, where both the rewards and the state transition probabilities are unknown to the RL agent and evolve over time, provided that their respective variations do not exceed a certain dynamic budget (i.e., an upper bound). We first develop a variation-aware Bernstein-based upper confidence reinforcement learning algorithm (VB-UCRL), which we allow to restart according to a schedule that depends on the variations. We overcome the challenges posed by the exogenous uncertainty and establish a regret bound that saves at most \(\sqrt{S}\) or \(S^{\frac{1}{6}}T^{\frac{1}{12}}\) compared with the latest results in the literature, where \(S\) denotes the state size
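To make the two ingredients named in the abstract more concrete (a Bernstein-based confidence bound that accounts for variation, and a restart schedule), here is a minimal Python sketch. It is not the paper's VB-UCRL algorithm. The function names, the additive `variation_budget` term, and the fixed restart `window` are all hypothetical choices made for illustration, and the constants follow the general shape of an empirical Bernstein inequality rather than anything stated in the abstract.

```python
import math


def bernstein_radius(p_hat, n, delta, variation_budget=0.0):
    """Illustrative confidence radius for an estimated transition probability.

    p_hat: empirical frequency of a transition, in [0, 1]
    n: number of samples behind p_hat
    delta: failure probability of the bound
    variation_budget: hypothetical additive slack for environment drift
    """
    if n < 2:
        # Too few samples: fall back to the trivial bound on a probability.
        return 1.0
    log_term = math.log(2.0 / delta)
    # Unbiased sample variance of a 0/1 indicator with mean p_hat.
    variance = p_hat * (1.0 - p_hat) * n / (n - 1)
    # Variance-dependent term plus a lower-order 1/n correction.
    radius = math.sqrt(2.0 * variance * log_term / n)
    radius += 7.0 * log_term / (3.0 * (n - 1))
    # Assumption: drift is handled by simply widening the interval.
    return min(1.0, radius + variation_budget)


def restart_times(horizon, window):
    """Hypothetical schedule: forget old statistics every `window` steps."""
    return list(range(0, horizon, window))


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, round(bernstein_radius(0.5, n, 0.05), 4))
    print(restart_times(10, 4))
```

The radius shrinks as the sample count grows and is tighter for low-variance transitions, which is the usual motivation for Bernstein-type bounds over Hoeffding-type ones. In this sketch a larger variation budget would call for a shorter restart window, but how the paper actually couples the two is not specified in the abstract.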
Related papers
- Non-stationary Reinforcement Learning: The Blessing Of (more) Optimism (2019)
- Sample-efficient Robust Multi-agent Reinforcement Learning In The Face Of Environmental Uncertainty (2024)
- Smart Exploration In Reinforcement Learning Using Bounded Uncertainty Models (2025)
- Near-optimal Optimistic Reinforcement Learning Using Empirical Bernstein Inequalities (2019)
- Online Robust Reinforcement Learning With Model Uncertainty (2021)
- Reinforcement Learning For Non-stationary Markov Decision Processes: The Blessing Of (more) Optimism (2020)
- Variance-aware Regret Bounds For Undiscounted Reinforcement Learning In MDPs (2018)
- Online Bayesian Risk-averse Reinforcement Learning (2025)