Minimum-delay Adaptation In Non-stationary Reinforcement Learning Via Online High-confidence Change-point Detection
2021 · Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva
Abstract
Non-stationary environments are challenging for reinforcement learning algorithms. If the state transition and/or reward functions change based on latent factors, the agent is effectively tasked with optimizing a behavior that maximizes performance over a possibly infinite random sequence of Markov Decision Processes (MDPs), each of which is drawn from some unknown distribution. We call each such MDP a context. Most related works make strong assumptions such as knowledge about the distribution over contexts, the existence of pre-training phases, or a priori knowledge about the number, sequence, or boundaries between contexts. We introduce an algorithm that efficiently learns policies in non-stationary environments. It analyzes a possibly infinite stream of data and computes, in real time, high-confidence change-point detection statistics that reflect whether novel, specialized policies need to be created and deployed to tackle novel contexts, or whether previously-optimized ones might be
Related papers
- A Behavior-aware Approach For Deep Reinforcement Learning In Non-stationary Environments Without Known Change Points (2024)
- Debiased Offline Representation Learning For Fast Online Adaptation In Non-stationary Dynamics (2024)
- Testing Stationarity And Change Point Detection In Reinforcement Learning (2022)
- Online Reinforcement Learning In Non-stationary Context-driven Environments (2023)
- Restarted Bayesian Online Change-point Detection For Non-stationary Markov Decision Processes (2023)
- Learning Adversarial Markov Decision Processes With Delayed Feedback (2020)
- Context-aware Safe Reinforcement Learning For Non-stationary Environments (2021)
- Act As You Learn: Adaptive Decision-making In Non-stationary Markov Decision Processes (2024)