Dropout Q-functions For Doubly Efficient Reinforcement Learning
2021 · Takuya Hiraoka, Takahisa Imagawa, Taisei Hashimoto, et al.
Abstract
Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC) (Haarnoja et al., 2018a). To make REDQ more computationally efficient, we propose DroQ, a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout and layer normalization. Despite the simplicity of its implementation, our experimental results indicate that DroQ is doubly (sample and computationally) efficient: it achieved sample efficiency comparable to that of REDQ, much better computational efficiency than REDQ, and computational efficiency comparable to that of SAC.
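The abstract describes the DroQ critic only in words: a small ensemble of Q-functions, each with dropout and layer normalization. The dependency-free Python below is a minimal sketch of that idea under stated assumptions. Each hidden layer is assumed to apply Linear, then Dropout, then LayerNorm, then ReLU (the abstract does not give the ordering). The bootstrap target is assumed to be a SAC-style soft target that takes the minimum over a two-member ensemble. The names `DropoutQFunction`, `layer_norm`, `dropout` and `td_target`, the layer sizes, the dropout rate, and the values of γ and α are all hypothetical illustrations, not the authors' code. Target networks, gradient updates and the policy are omitted.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


class DropoutQFunction:
    """Hypothetical dropout Q-function.

    Assumed hidden-layer order: Linear -> Dropout -> LayerNorm -> ReLU.
    """

    def __init__(self, in_dim, hidden=(8, 8), dropout_rate=0.01, seed=0):
        self.rng = random.Random(seed)
        self.rate = dropout_rate  # illustrative value, not taken from the abstract
        dims = [in_dim, *hidden, 1]
        # Each layer is (weights, biases); weights[j] is the row for output unit j.
        self.layers = [
            ([[self.rng.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)]
              for _ in range(d_out)], [0.0] * d_out)
            for d_in, d_out in zip(dims[:-1], dims[1:])
        ]

    @staticmethod
    def linear(x, weights, biases):
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

    def __call__(self, state, action, training=True):
        h = list(state) + list(action)  # Q(s, a) sees the concatenated input
        for weights, biases in self.layers[:-1]:
            h = self.linear(h, weights, biases)
            h = dropout(h, self.rate, training, self.rng)
            h = layer_norm(h)
            h = [max(0.0, v) for v in h]  # ReLU
        weights, biases = self.layers[-1]
        return self.linear(h, weights, biases)[0]  # scalar Q-value


def td_target(q_functions, reward, next_state, next_action, next_log_prob,
              gamma=0.99, alpha=0.2, done=False):
    """Assumed SAC-style soft target: min over a small ensemble minus an entropy term.

    Each q is called with its default arguments, so dropout is active here.
    """
    q_min = min(q(next_state, next_action) for q in q_functions)
    bootstrap = 0.0 if done else gamma * (q_min - alpha * next_log_prob)
    return reward + bootstrap


if __name__ == "__main__":
    # Two-member ensemble; input is a 2-dim state plus a 1-dim action.
    ensemble = [DropoutQFunction(in_dim=3, seed=i) for i in range(2)]
    y = td_target(ensemble, reward=1.0, next_state=[0.1, -0.2],
                  next_action=[0.5], next_log_prob=-1.0)
    print(f"target = {y:.4f}")
```

In this sketch, dropout draws a fresh mask on every forward pass. Each of the two members therefore returns a slightly different estimate on each call, and the minimum is taken over those noisy estimates. That is one way to picture how a small dropout ensemble might be used where REDQ uses a large one. The actual mechanism and hyperparameters are those reported in the paper, not the ones assumed here.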
Related papers
- Aggressive Q-learning With Ensembles: Achieving Both High Sample Efficiency And High Asymptotic Performance (2021)
- CrossQ: Batch Normalization In Deep Reinforcement Learning For Greater Sample Efficiency And Simplicity (2019)
- On The Reduction Of Variance And Overestimation Of Deep Q-learning (2019)
- Finite-time Analysis Of Simultaneous Double Q-learning (2024)
- Sample Dropout: A Simple Yet Effective Variance Reduction Technique In Deep Policy Optimization (2023)
- Simultaneous Double Q-learning With Conservative Advantage Learning For Actor-critic Methods (2022)
- Effective Exploration For Deep Reinforcement Learning Via Bootstrapped Q-ensembles Under Tsallis Entropy Regularization (2018)
- Sampling Efficient Deep Reinforcement Learning Through Preference-guided Stochastic Exploration (2022)