Parallel \(Q\)-Learning: Scaling Off-policy Reinforcement Learning Under Massively Parallel Simulation
2023 · Zechu Li, Tao Chen, Zhang-Wei Hong, et al.
Abstract
Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale, resulting in a longer wall-clock training time. This paper presents a Parallel \(Q\)-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Different from prior works on distributed off-policy learning, such as Apex, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single workstation. In experiments, we demonstrate that \(Q\)-learning can be scaled to \textit\{tens of thou
Related papers
- Accelerated Methods For Deep Reinforcement Learning (2018) · 0.00
- Preventing Learning Stagnation In PPO By Scaling To 1 Million Parallel Environments (2026) · 0.00
- Quantile-based Deep Reinforcement Learning Using Two-timescale Policy Gradient Algorithms (2023) · 0.00
- Efficient Off-policy Reinforcement Learning Via Brain-inspired Computing (2022) · 8.35
- Multi-timescale Ensemble Q-learning For Markov Decision Process Policy Optimization (2024) · 6.34
- Online Target Q-learning With Reverse Experience Replay: Efficiently Finding The Optimal Policy For Linear MDPs (2021) · 0.00
- Quantum-train-based Distributed Multi-agent Reinforcement Learning (2024) · 7.16
- Projected Off-policy Q-learning (POP-QL) For Stabilizing Offline Reinforcement Learning (2023) · 0.00