Asymptotic Analysis Of Sample-averaged Q-learning
2024 · Saunak Kumar Panda, Ruiqi Liu, Yisha Xiang
Abstract
Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for time-varying batch-averaged Q-learning, termed sample-averaged Q-learning (SA-QL), which extends traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters. Extensive numerical experiments across classic stochastic OpenAI Gym environments, including windy gr
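The abstract describes SA-QL as replacing the single sampled reward and next state in the Q-learning target with an average over a batch of samples whose size may vary over time. The sketch below illustrates that idea on a small tabular MDP. It is not the authors' implementation: the synchronous update over all state-action pairs, the generative-model sampling, the Gaussian reward noise, the polynomial step size `1 / t**0.7`, and the names `sample_averaged_q_learning`, `value_iteration`, and `batch_size` are all assumptions made for illustration.

```python
import numpy as np


def sample_averaged_q_learning(P, R, gamma=0.9, n_iters=2000,
                               batch_size=lambda t: 4,
                               reward_noise=0.1, seed=0):
    """Tabular Q-learning with a batch-averaged target (illustrative sketch).

    P: array of shape (S, A, S) with transition probabilities.
    R: array of shape (S, A) with mean rewards.
    batch_size: hypothetical schedule mapping iteration t to a batch size.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for t in range(1, n_iters + 1):
        B = batch_size(t)
        step = 1.0 / t ** 0.7  # assumed step-size schedule
        target = np.zeros_like(Q)
        for s in range(n_states):
            for a in range(n_actions):
                # Draw B next states and B noisy rewards for the pair (s, a).
                next_states = rng.choice(n_states, size=B, p=P[s, a])
                rewards = R[s, a] + reward_noise * rng.standard_normal(B)
                # Average the B single-sample Bellman targets.
                target[s, a] = np.mean(
                    rewards + gamma * Q[next_states].max(axis=1)
                )
        Q += step * (target - Q)
    return Q


def value_iteration(P, R, gamma=0.9, n_iters=500):
    """Reference solution Q* computed from the known model."""
    Q = np.zeros(R.shape)
    for _ in range(n_iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q
```

With `batch_size=lambda t: 1` the update reduces to ordinary single-sample Q-learning, so the batch-size schedule is the only place where the two differ in this sketch. Comparing the output against `value_iteration` on a small model is one way to check that the averaged iterates settle near the optimal Q-function.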
Related papers
- Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019)
- Sample Complexity Of Average-reward Q-learning: From Single-agent To Federated Reinforcement Learning (2026)
- Constant Stepsize Q-learning: Distributional Convergence, Bias And Extrapolation (2024)
- Finite Sample Analysis Of Two-timescale Stochastic Approximation With Applications To Reinforcement Learning (2017)
- Sample Complexity Of Asynchronous Q-learning: Sharper Analysis And Variance Reduction (2020)
- Aggressive Q-learning With Ensembles: Achieving Both High Sample Efficiency And High Asymptotic Performance (2021)
- Averaged-dqn: Variance Reduction And Stabilization For Deep Reinforcement Learning (2016)
- From Set Convergence To Pointwise Convergence: Finite-time Guarantees For Average-reward Q-learning With Adaptive Stepsizes (2025)