Solving Continuous Control Via Q-learning
2022 · Tim Seyde, Peter Werner, Wilko Schwarting, et al.
Abstract
While there has been substantial success for solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, compute requirements and wider hyperparameter search spaces. We show that a simple modification of deep Q-learning largely alleviates these issues. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods when learning from features or pixels. We extend classical bandit examples from cooperative MARL to provide intuition for how decoupled critics leverage state information to coordinate joint optimization, and demonstrate surprisingly strong performance across a var
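To make the combination of bang-bang discretization and value decomposition concrete, here is a minimal NumPy sketch. It assumes that each action dimension has its own utility over two choices (lower or upper bound) and that the joint value is the mean of the selected per-dimension utilities; the mean aggregation, the function names, and the toy numbers are illustrative assumptions, not details stated in the abstract.

```python
import numpy as np


def greedy_bang_bang(utilities, low, high):
    """Pick the greedy bang-bang action independently per dimension.

    utilities: (M, 2) array; column 0 scores the lower bound and
    column 1 scores the upper bound of each action dimension.
    """
    choice = np.argmax(utilities, axis=1)
    action = np.where(choice == 1, high, low)
    return choice, action


def joint_value(utilities, choice):
    """Assumed decomposition: mean of the selected per-dimension utilities."""
    return utilities[np.arange(len(choice)), choice].mean()


def td_target(reward, discount, next_utilities):
    """One-step target; the max is taken per dimension, then averaged."""
    return reward + discount * next_utilities.max(axis=1).mean()


# Toy example with M = 3 action dimensions bounded in [-1, 1].
utilities = np.array([[0.2, 0.5],
                      [1.0, -1.0],
                      [0.0, 0.3]])
low, high = -np.ones(3), np.ones(3)

choice, action = greedy_bang_bang(utilities, low, high)
value = joint_value(utilities, choice)
target = td_target(reward=1.0, discount=0.99, next_utilities=utilities)
```

Because the maximization is done per dimension, the cost of action selection grows linearly with the number of action dimensions rather than with the 2^M joint bang-bang actions.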
Related papers
- Attraction-repulsion Actor-critic For Continuous Control Reinforcement Learning (2019) · 0.00
- Mitigating Off-policy Bias In Actor-critic Methods With One-step Q-learning: A Novel Correction Approach (2022) · 0.00
- How To Discretize Continuous State-action Spaces In Q-learning: A Symbolic Control Approach (2024) · 3.58
- SA-MATD3: Self-attention-based Multi-agent Continuous Control Method In Cooperative Environments (2021) · 11.76
- Multi-agent Actor-critic For Mixed Cooperative-competitive Environments (2017) · 0.00
- Deep Exploration With PAC-Bayes (2024) · 0.00
- Convergence Of Actor-critic Learning For Mean Field Games And Mean Field Control In Continuous Spaces (2025) · 0.00
- Langevin Soft Actor-critic: Efficient Exploration Through Uncertainty-driven Critic Learning (2025) · 0.00