Estimating Q(s,s') With Deep Deterministic Dynamics Gradients
2020 · Ashley D. Edwards, Himanshu Sahni, Rosanne Liu, et al.
Abstract
In this paper, we introduce a novel form of value function, \(Q(s, s')\), that expresses the utility of transitioning from a state \(s\) to a neighboring state \(s'\) and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at http://sites.google.com/view/qss-paper.
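To make the definition of \(Q(s, s')\) concrete, the following is a minimal tabular sketch, not the authors' implementation. It assumes a backup of the form \(Q(s, s') \leftarrow r + \gamma \max_{s''} Q(s', s'')\), which matches the abstract's description of transitioning to \(s'\) and acting optimally thereafter. The five-state chain, the `neighbors` and `reward` functions, and all constants are hypothetical, and an exact argmax over neighboring states stands in for the learned forward dynamics model. Transitions are collected by a uniformly random behaviour policy, and no action labels are stored.

```python
import random

N_STATES = 5   # hypothetical chain 0..4; state 4 is the terminal goal
GAMMA = 0.9    # discount factor (assumed)
ALPHA = 0.5    # learning rate (assumed)


def neighbors(s):
    """States reachable from s in one step (assumed deterministic chain)."""
    return [max(s - 1, 0), min(s + 1, N_STATES - 1)]


def reward(s, s_next):
    """Reward of 1 for entering the goal state, 0 otherwise (assumed)."""
    return 1.0 if s_next == N_STATES - 1 and s != N_STATES - 1 else 0.0


def train_qss(num_steps=5000, seed=0):
    """Learn a table Q[(s, s')] from random transitions, without actions."""
    rng = random.Random(seed)
    q = {(s, n): 0.0 for s in range(N_STATES) for n in neighbors(s)}
    for _ in range(num_steps):
        s = rng.randrange(N_STATES - 1)       # sample a non-terminal state
        s_next = rng.choice(neighbors(s))     # random behaviour policy
        r = reward(s, s_next)
        if s_next == N_STATES - 1:
            target = r                        # no bootstrapping past the goal
        else:
            # Back up the best value over states reachable from s_next.
            target = r + GAMMA * max(q[(s_next, n)] for n in neighbors(s_next))
        q[(s, s_next)] += ALPHA * (target - q[(s, s_next)])
    return q


def greedy_next_state(q, s):
    """Pick the neighboring state that maximizes Q(s, s')."""
    return max(neighbors(s), key=lambda n: q[(s, n)])


q = train_qss()
plan = [greedy_next_state(q, s) for s in range(N_STATES - 1)]
print(plan)
```

Because the table is indexed by state pairs, two different actions that cause the same transition would share a single entry, which is one way to read the abstract's point about redundant action spaces. Recovering an executable action from the chosen next state needs a separate mechanism that this sketch omits.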
Related papers
- Enhancing Q-value Updates In Deep Q-learning Via Successor-state Prediction (2025)
- Mitigating Suboptimality Of Deterministic Policy Gradients In Complex Q-functions (2024)
- Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms (2019)
- Disentangling Dynamics And Returns: Value Function Decomposition With Future Prediction (2019)
- Smoothed Action Value Functions For Learning Gaussian Policies (2018)
- Policy Gradient Using Weak Derivatives For Reinforcement Learning (2020)
- Direct Soft-policy Sampling Via Langevin Dynamics (2026)
- Universal Approximation Theorem Of Deep Q-networks (2025)