Learning In Complex Action Spaces Without Policy Gradients
2024 · Arash Tavakoli, Sina Ghiassian, Nemanja Rakićević
Abstract
While conventional wisdom holds that policy gradient methods are better suited to complex action spaces than action-value methods, foundational work has shown that the two paradigms are equivalent in small, finite action spaces (O'Donoghue et al., 2017; Schulman et al., 2017a). This raises the question of why their computational applicability and performance diverge as the complexity of the action space increases. We hypothesize that the apparent superiority of policy gradients in such settings stems not from intrinsic qualities of the paradigm but from universal principles that can also be applied to action-value methods, enabling similar functions. We identify three such principles and provide a framework for incorporating them into action-value methods. To support our hypothesis, we instantiate this framework in what we term QMLE, for Q-learning with maximum likelihood estimation. Our results show that QMLE can be applied to complex action spaces at a computational cost comparable t
Related papers
- Marginal Policy Gradients: A Unified Family Of Estimators For Bounded Action Spaces With Applications (2018)
- Mitigating Suboptimality Of Deterministic Policy Gradients In Complex Q-functions (2024)
- Mixed Q-functionals: Advancing Value-based Methods In Cooperative MARL With Continuous Action Domains (2024)
- Growing Action Spaces (2019)
- Efficient Off-policy Learning For High-dimensional Action Spaces (2024)
- All-action Policy Gradient Methods: A Numerical Integration Approach (2019)
- Identifying Policy Gradient Subspaces (2024)
- Variance Reduction For Policy Gradient With Action-dependent Factorized Baselines (2018)