Enhancing Q-value Updates In Deep Q-learning Via Successor-state Prediction
2025 · Lipeng Zu, Hansong Zhou, Xiaonan Zhang
Abstract
Deep Q-Networks (DQNs) estimate future returns by learning from transitions sampled from a replay buffer. However, the target updates in DQN often rely on next states generated by actions from a past, potentially suboptimal policy. As a result, these states may not provide informative learning signals, which introduces high variance into the update process. This issue is exacerbated when the sampled transitions are poorly aligned with the agent's current policy. To address this limitation, we propose the Successor-state Aggregation Deep Q-Network (SADQ), which explicitly models environment dynamics using a stochastic transition model. SADQ integrates successor-state distributions into the Q-value estimation process, enabling more stable and policy-aligned value updates. Additionally, it explores a more efficient action selection strategy that uses the modeled transition structure. We provide theoretical guarantees that SADQ maintains unbiased value estimates while reducing training variance. Our exte
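The contrast the abstract draws, between a target built from one replayed next state and a target aggregated over a modeled successor-state distribution, can be illustrated with a small tabular sketch. This is not the authors' implementation: the tabular Q array, the fixed distribution `probs` standing in for a learned transition model, and the function names `sampled_target` and `aggregated_target` are all assumptions made for illustration.

```python
import numpy as np

def sampled_target(q, reward, next_state, gamma=0.99):
    """DQN-style target from the single next state stored in a replay buffer."""
    return reward + gamma * np.max(q[next_state])

def aggregated_target(q, reward, next_state_probs, gamma=0.99):
    """Target averaging max_a' Q(s', a') over a modeled distribution p(s' | s, a)."""
    state_values = np.max(q, axis=1)  # max_a' Q(s', a') for every candidate s'
    return reward + gamma * float(next_state_probs @ state_values)

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
q = rng.normal(size=(n_states, n_actions))   # hypothetical tabular Q-values
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # hypothetical p(s' | s, a)
reward = 1.0

# Draw many next states, as a replay buffer would hold for one (s, a) pair.
samples = rng.choice(n_states, size=10_000, p=probs)
sampled = np.array([sampled_target(q, reward, s) for s in samples])
aggregated = aggregated_target(q, reward, probs)

print("mean of sampled targets:    ", sampled.mean())
print("aggregated target:          ", aggregated)
print("variance of sampled targets:", sampled.var())
```

Under the assumption that the modeled distribution matches the true dynamics, the aggregated target equals the expectation of the sampled targets, so it carries none of the sampling variance that the single-sample target does. How SADQ learns the transition model and uses it for action selection is not specified in the abstract and is not modeled here.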
Related papers
- Estimating Q(s,s') With Deep Deterministic Dynamics Gradients (2020)
- An Adaptive Synchronization Approach For Weights Of Deep Reinforcement Learning (2020)
- A Theoretical Analysis Of Deep Q-learning (2019)
- An Information-theoretic Optimality Principle For Deep Reinforcement Learning (2017)
- Seizing Serendipity: Exploiting The Value Of Past Success In Off-policy Actor-critic (2023)
- Towards Adapting Reinforcement Learning Agents To New Tasks: Insights From Q-values (2024)
- Deep Q-learning: Theoretical Insights From An Asymptotic Analysis (2020)
- Mitigating Suboptimality Of Deterministic Policy Gradients In Complex Q-functions (2024)