Multi-timescale Ensemble Q-learning For Markov Decision Process Policy Optimization
2024 · Talha Bozkus, Urbashi Mitra
Abstract
Reinforcement learning (RL) is a classical tool for solving network control or policy optimization problems in unknown environments. The original Q-learning algorithm suffers from performance and complexity challenges across very large networks. Herein, a novel model-free ensemble reinforcement learning algorithm that adapts classical Q-learning is proposed to handle these challenges for networks that admit Markov decision process (MDP) models. Multiple Q-learning algorithms are run in parallel on multiple, distinct, synthetically created and structurally related Markovian environments; the outputs are fused using an adaptive weighting mechanism based on the Jensen-Shannon divergence (JSD) to obtain an approximately optimal policy with low complexity. The theoretical justification of the algorithm, including the convergence of key statistics and Q-functions, is provided. Numerical results across several network models show that the proposed algorithm can achieve up to 55% less average policy
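The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how Q-functions are turned into distributions or how the JSD maps to weights, so the softmax normalisation, the choice of a reference environment, and the names `fuse_q_tables`, `reference_index`, and `temperature` are all assumptions made here for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(x, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def jensen_shannon_divergence(p, q):
    """JSD (natural log) between two discrete distributions; lies in [0, ln 2]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 by convention
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def fuse_q_tables(q_tables, reference_index=0, temperature=1.0):
    """Hypothetical fusion rule: weight each Q-table by its closeness to a reference.

    Assumptions (not taken from the paper):
      * each Q-table is flattened and mapped to a distribution with a softmax;
      * weights are a softmax of the negative JSD to the reference table.
    """
    q_tables = [np.asarray(q, dtype=float) for q in q_tables]
    ref = softmax(q_tables[reference_index].ravel())
    divergences = np.array(
        [jensen_shannon_divergence(ref, softmax(q.ravel())) for q in q_tables]
    )
    weights = softmax(-divergences / temperature)
    # Weighted sum over the ensemble axis gives one fused Q-table.
    fused = np.tensordot(weights, np.stack(q_tables), axes=1)
    return fused, weights
```

Under these assumptions, a greedy policy would be read off the fused table with `fused.argmax(axis=1)`. Tables whose induced distribution is far from the reference receive smaller weights, and a lower `temperature` makes that down-weighting sharper.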
Related papers
- Scalable Spectral Representations For Multi-agent Reinforcement Learning In Network MDPs (2024)
- Online Target Q-learning With Reverse Experience Replay: Efficiently Finding The Optimal Policy For Linear MDPs (2021)
- Quantile-based Deep Reinforcement Learning Using Two-timescale Policy Gradient Algorithms (2023)
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019)
- Coverage Analysis Of Multi-environment Q-learning Algorithms For Wireless Network Optimization (2024)
- Intrinsically Motivated Hierarchical Policy Learning In Multi-objective Markov Decision Processes (2023)
- Towards Applicable Reinforcement Learning: Improving The Generalization And Sample Efficiency With Policy Ensemble (2022)
- Decentralised Q-learning For Multi-agent Markov Decision Processes With A Satisfiability Criterion (2023)