Q-learning With Shift-aware Upper Confidence Bound In Non-stationary Reinforcement Learning
2025 · Ha Manh Bui, Felix Parker, Kimia Ghobadi, et al.
Abstract
We study Non-Stationary Reinforcement Learning (RL) under distribution shifts in both finite-horizon episodic and infinite-horizon discounted Markov Decision Processes (MDPs). In the finite-horizon case, the transition functions may change suddenly at a particular episode. In the infinite-horizon setting, such changes can occur at an arbitrary time step during the agent's interaction with the environment. Although the Q-learning Upper Confidence Bound algorithm (QUCB) can discover a proper policy during learning, distribution shifts can cause this policy to exploit sub-optimal rewards after a shift occurs. To address this issue, we propose Density-QUCB (DQUCB), a shift-aware Q-learning UCB algorithm that uses a transition density function to detect distribution shifts and then leverages its likelihood to improve the uncertainty estimation of Q-learning UCB, yielding a better balance between exploration and exploitation. Theoretically, we prove that our oracle DQUCB achieves
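To make the idea in the abstract concrete, the following is a minimal sketch of a tabular, finite-horizon Q-learning update with a Hoeffding-style UCB bonus, plus a likelihood-dependent inflation of that bonus. It is not the paper's DQUCB: the function names, the constants `c` and `iota`, the assumption that rewards lie in [0, 1], and in particular the rule `shift_aware_bonus` (which grows the bonus as the observed transition becomes less likely under some density model) are illustrative assumptions, since the abstract does not specify how the likelihood enters the bonus.

```python
import math


def learning_rate(H, t):
    """Step size alpha_t = (H + 1) / (H + t) for the t-th visit to (h, s, a)."""
    return (H + 1) / (H + t)


def hoeffding_bonus(H, t, c=1.0, iota=1.0):
    """Hoeffding-style exploration bonus c * sqrt(H^3 * iota / t).

    c and iota (a log factor) are placeholder values chosen for illustration.
    """
    return c * math.sqrt(H ** 3 * iota / t)


def shift_aware_bonus(base_bonus, likelihood):
    """HYPOTHETICAL scaling, not taken from the paper.

    likelihood is the probability a transition density model assigns to the
    observed next state. A likelihood of 1 leaves the bonus unchanged; a
    likelihood of 0 doubles it, encouraging exploration after a suspected shift.
    """
    likelihood = min(1.0, max(0.0, likelihood))
    return base_bonus * (2.0 - likelihood)


def make_tables(H, S, A):
    """Optimistic initialisation: Q = H everywhere, V at step H is 0."""
    Q = [[[float(H)] * A for _ in range(S)] for _ in range(H)]
    V = [[float(H)] * S for _ in range(H)] + [[0.0] * S]
    N = [[[0] * A for _ in range(S)] for _ in range(H)]
    return Q, V, N


def qucb_update(Q, V, N, h, s, a, r, s_next, H, likelihood=1.0):
    """One optimistic Q-learning update at step h (0-indexed) of an episode."""
    N[h][s][a] += 1
    t = N[h][s][a]
    alpha = learning_rate(H, t)
    bonus = shift_aware_bonus(hoeffding_bonus(H, t), likelihood)
    target = r + V[h + 1][s_next] + bonus
    Q[h][s][a] = (1 - alpha) * Q[h][s][a] + alpha * target
    # Values are clipped at H, the largest possible return with rewards in [0, 1].
    V[h][s] = min(float(H), max(Q[h][s]))
```

With `likelihood=1.0` the update reduces to the plain UCB-bonus update; passing a smaller likelihood only enlarges the bonus, so the sketch never becomes less optimistic than that baseline.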
Related papers
- Q-distribution Guided Q-learning For Offline Reinforcement Learning: Uncertainty Penalized Q-value Via Consistency Model (2024)
- Non-stationary Reinforcement Learning: The Blessing Of (more) Optimism (2019)
- Mitigating Distribution Shift In Model-based Offline RL Via Shifts-aware Reward Learning (2024)
- Reinforcement Learning For Non-stationary Markov Decision Processes: The Blessing Of (more) Optimism (2020)
- A Nearly Optimal And Low-switching Algorithm For Reinforcement Learning With General Function Approximation (2023)
- Q-learning With UCB Exploration Is Sample Efficient For Infinite-horizon MDP (2019)
- Tightening Exploration In Upper Confidence Reinforcement Learning (2020)
- Utilizing Maximum Mean Discrepancy Barycenter For Propagating The Uncertainty Of Value Functions In Reinforcement Learning (2024)