Reinforcement Learning In Switching Non-stationary Markov Decision Processes: Algorithms And Convergence Analysis
2025 · Mohsen Amiri, Sindri Magnússon
Abstract
Reinforcement learning in non-stationary environments is challenging due to abrupt and unpredictable changes in dynamics, often causing traditional algorithms to fail to converge. However, in many real-world cases, non-stationarity has some structure that can be exploited to develop algorithms and facilitate theoretical analysis. We introduce one such structure, Switching Non-Stationary Markov Decision Processes (SNS-MDP), where environments switch over time based on an underlying Markov chain. Under a fixed policy, the value function of an SNS-MDP admits a closed-form solution determined by the Markov chain's statistical properties, and despite the inherent non-stationarity, Temporal Difference (TD) learning methods still converge to the correct value function. Furthermore, policy improvement can be performed, and it is shown that policy iteration converges to the optimal policy. Moreover, since Q-learning converges to the optimal Q-function, it likewise yields the corresponding optim
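To make the switching structure concrete, here is a minimal sketch of tabular TD(0) evaluating a fixed policy while the environment mode evolves according to its own Markov chain. Everything numeric in it is an illustrative assumption and is not taken from the paper: the two-mode chain `MODE_CHAIN`, the per-mode transition matrices `STATE_TRANS`, the rewards `REWARD`, the discount `GAMMA`, and the step-size schedule. It also assumes the learner observes the current mode alongside the state, which may differ from the setting the authors analyze. Under these assumptions the (mode, state) pair forms an ordinary Markov chain, so the TD(0) estimate can be compared against a value function computed by fixed-point iteration.

```python
import random

GAMMA = 0.5  # discount factor (assumed for illustration)

# Assumed mode-switching chain: MODE_CHAIN[m][m2] = P(next mode m2 | mode m).
MODE_CHAIN = [[0.9, 0.1], [0.2, 0.8]]

# Assumed state dynamics under a fixed policy, one matrix per mode:
# STATE_TRANS[m][s][s2] = P(next state s2 | state s, mode m).
STATE_TRANS = [
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.2, 0.8], [0.9, 0.1]],
]

# Assumed rewards, REWARD[m][s]; the rewarding state flips between modes.
REWARD = [[1.0, 0.0], [0.0, 1.0]]


def sample(probs, rng):
    """Draw an index from a discrete distribution given as a list."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1


def exact_values(sweeps=200):
    """Reference values v[m][s] via repeated Bellman backups on the joint chain."""
    v = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(sweeps):
        v = [[REWARD[m][s] + GAMMA * sum(
                  MODE_CHAIN[m][m2] * STATE_TRANS[m][s][s2] * v[m2][s2]
                  for m2 in range(2) for s2 in range(2))
              for s in range(2)] for m in range(2)]
    return v


def td0(steps=200_000, seed=0):
    """Tabular TD(0) along one trajectory of the switching environment."""
    rng = random.Random(seed)
    v = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    m, s = 0, 0
    for _ in range(steps):
        r = REWARD[m][s]
        s2 = sample(STATE_TRANS[m][s], rng)  # state moves under the current mode
        m2 = sample(MODE_CHAIN[m], rng)      # mode switches independently of the state
        visits[m][s] += 1
        alpha = visits[m][s] ** -0.7         # decaying step size per (mode, state)
        v[m][s] += alpha * (r + GAMMA * v[m2][s2] - v[m][s])
        m, s = m2, s2
    return v


if __name__ == "__main__":
    v_td, v_ref = td0(), exact_values()
    for m in range(2):
        for s in range(2):
            print(f"mode={m} state={s} td={v_td[m][s]:.3f} ref={v_ref[m][s]:.3f}")
```

Running the script prints the TD(0) estimate next to the reference value for each (mode, state) pair. The per-pair decaying step size is one common choice that satisfies the usual stochastic-approximation conditions; a constant step size would also work but leaves residual noise in the estimates.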
Related papers
- Act As You Learn: Adaptive Decision-making In Non-stationary Markov Decision Processes (2024)
- Markov Decision Processes Under External Temporal Processes (2023)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- Non-stationary Markov Decision Processes, A Worst-case Approach Using Model-based Reinforcement Learning, Extended Version (2019)
- Safe Reinforcement Learning For Constrained Markov Decision Processes With Stochastic Stopping Time (2024)
- Demystifying Reinforcement Learning In Time-varying Systems (2022)
- On The Convergence Of Policy Gradient Methods To Nash Equilibria In General Stochastic Games (2022)
- Policy Gradient For Continuing Tasks In Non-stationary Markov Decision Processes (2020)