MEPG: A Minimalist Ensemble Policy Gradient Framework For Deep Reinforcement Learning
2021 · Qiang He, Huangyuan Su, Chen Gong, et al.
Abstract
During the training of a reinforcement learning (RL) agent, the distribution of training data is non-stationary because the agent's behavior changes over time. The agent therefore risks overspecializing to a particular distribution, which hurts its overall performance. Ensemble RL can mitigate this issue by learning a robust policy, but it consumes substantial computational resources because of the additional value and policy functions it introduces. In this paper, to avoid this resource consumption issue, we design a novel and simple ensemble deep RL framework that integrates multiple models into a single model. Specifically, we propose the Minimalist Ensemble Policy Gradient framework (MEPG), which introduces a minimalist ensemble-consistent Bellman update by utilizing a modified dropout operator. MEPG retains the ensemble property by keeping the dropout consistency of both sides of the Bellman equa
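The abstract is cut off before it spells out the update, so the following is only one possible reading of "dropout consistency", not the authors' implementation. It assumes a toy one-hidden-layer Q-network and applies the same sampled dropout mask to both the prediction Q(s, a) and the bootstrapped target r + γ Q(s', a'). All names here (`sample_mask`, `q_value`, `consistent_td_error`) are hypothetical, and the standard inverted-dropout scaling stands in for the paper's "modified dropout operator", whose details are not given in this excerpt.

```python
import random


def sample_mask(width, keep_prob, rng):
    """Draw a dropout mask; kept units are scaled by 1/keep_prob (inverted dropout)."""
    scale = 1.0 / keep_prob
    return [scale if rng.random() < keep_prob else 0.0 for _ in range(width)]


def q_value(params, state_action, mask):
    """Toy Q-network: one ReLU hidden layer, with the mask applied to the hidden units."""
    w1, w2 = params
    hidden = [max(0.0, sum(w * x for w, x in zip(row, state_action))) for row in w1]
    return sum(w * h * m for w, h, m in zip(w2, hidden, mask))


def consistent_td_error(params, target_params, sa, reward, next_sa, gamma, mask):
    """TD error in which the SAME mask is used on both sides of the Bellman equation.

    Assumption: sharing the mask means each sampled sub-network is regressed
    onto its own bootstrapped target rather than onto a different sub-network.
    """
    target = reward + gamma * q_value(target_params, next_sa, mask)
    return q_value(params, sa, mask) - target


# Illustrative setup with arbitrary sizes and random weights.
rng = random.Random(0)
width, dim = 8, 3
w1 = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(width)]
w2 = [rng.uniform(-1.0, 1.0) for _ in range(width)]
params = (w1, w2)

mask = sample_mask(width, keep_prob=0.9, rng=rng)
delta = consistent_td_error(
    params, params, [0.1, 0.2, 0.3], 1.0, [0.2, 0.1, 0.0], 0.99, mask
)
```

In a training loop under this reading, a fresh mask would be drawn per update and `delta` would drive the critic loss; how the actor is updated is not covered by this excerpt and is left out of the sketch.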
Related papers
- Mixed Policy Gradient: Off-policy Reinforcement Learning Driven Jointly By Data And Model (2021) · 0.00
- Merging Deterministic Policy Gradient Estimations With Varied Bias-variance Tradeoff For Effective Deep Reinforcement Learning (2019) · 0.00
- Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective (2024) · 0.00
- PC-PG: Policy Cover Directed Exploration For Provable Policy Gradient Learning (2020) · 0.00
- MDPGT: Momentum-based Decentralized Policy Gradient Tracking (2021) · 0.00
- SEERL: Sample Efficient Ensemble Reinforcement Learning (2020) · 2.26
- Probabilistic Mixture-of-experts For Efficient Deep Reinforcement Learning (2021) · 0.00
- Evolution-guided Policy Gradient In Reinforcement Learning (2018) · 0.00