Efficient Off-policy Learning For High-dimensional Action Spaces
2024 · Fabian Otto, Philipp Becker, Ngo Anh Vien, et al.
Abstract
Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation, which can be problematic in high-dimensional action spaces due to the curse of dimensionality. This reliance results in data inefficiency, as maintaining a state-action-value function in such spaces is challenging. We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning. This approach, which we refer to as Vlearn, effectively circumvents the limitations of existing methods by eliminating the necessity for an explicit state-action-value function. To this end, we leverage a weighted importance sampling loss for learning deep value functions from off-policy data. While this is common for linear methods, it has not been combined with deep value function networks. This transfer to deep methods is not straightforward and requires novel design choices such as robust policy updates, twin value func…
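The abstract describes training a state-value critic from off-policy data with a weighted importance sampling loss. The Python sketch that follows illustrates that general idea under stated assumptions; it is not the paper's implementation. The helper name `wis_value_loss`, the one-step TD target bootstrapped from a frozen target network, the truncation of importance ratios at `clip`, and the normalization of the weights across the minibatch are all illustrative choices. Vlearn's actual objective, its twin value networks, and its robust policy updates may differ.

```python
import math
from typing import Sequence


def wis_value_loss(
    values: Sequence[float],         # V(s_i) from the value network being trained
    next_values: Sequence[float],    # V(s'_i) from a frozen target network (assumption)
    rewards: Sequence[float],
    dones: Sequence[bool],
    logp_target: Sequence[float],    # log pi(a_i | s_i) under the current policy
    logp_behavior: Sequence[float],  # log mu(a_i | s_i) stored in the replay buffer
    gamma: float = 0.99,
    clip: float = 1.0,
) -> float:
    """Hypothetical helper: weighted (self-normalized) importance-sampling TD loss
    for a state-value critic trained on off-policy transitions.

    Assumptions, not taken from the paper: one-step TD targets, importance
    ratios truncated at `clip`, and weights normalized over the minibatch.
    """
    # Per-sample ratios pi(a|s) / mu(a|s), computed in log space, then truncated.
    ratios = [min(math.exp(lt - lb), clip) for lt, lb in zip(logp_target, logp_behavior)]
    total = sum(ratios)
    if total == 0.0:
        return 0.0  # every ratio underflowed; no usable signal in this batch
    loss = 0.0
    for ratio, v, v_next, r, done in zip(ratios, values, next_values, rewards, dones):
        # Bootstrap from V(s') unless the episode terminated at this step.
        target = r + (0.0 if done else gamma * v_next)
        # Weighted IS: each weight is ratio / total, so the weights sum to one.
        loss += (ratio / total) * (target - v) ** 2
    return loss


# Usage: two transitions; the second action is half as likely under pi as under mu.
example_loss = wis_value_loss(
    values=[1.0, 2.0],
    next_values=[1.5, 0.0],
    rewards=[0.5, 1.0],
    dones=[False, True],
    logp_target=[math.log(0.5), math.log(0.2)],
    logp_behavior=[math.log(0.5), math.log(0.4)],
    gamma=0.9,
)
print(round(example_loss, 6))  # 0.815
```

Normalizing the weights so they sum to one trades a small bias for lower variance than ordinary importance sampling, and truncating the ratios bounds the influence of any single transition. In a deep-learning framework, the same weights would presumably multiply per-sample squared TD errors, with the ratios and the bootstrapped target treated as constants during the critic's gradient step.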
Related papers
- Handling Cost And Constraints With Off-policy Deep Reinforcement Learning (2023)
- Learning In Complex Action Spaces Without Policy Gradients (2024)
- Value-consistent Representation Learning For Data-efficient Reinforcement Learning (2022)
- Learning Value Functions In Deep Policy Gradients Using Residual Variance (2020)
- Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms (2019)
- An Information-theoretic Optimality Principle For Deep Reinforcement Learning (2017)
- Low-dimensional State And Action Representation Learning With MDP Homomorphism Metrics (2021)
- No Prior Mask: Eliminate Redundant Action For Deep Reinforcement Learning (2023)