Off-policy Deep Reinforcement Learning With Analogous Disentangled Exploration
2020 · Anji Liu, Yitao Liang, Guy Van den Broeck
Abstract
Off-policy reinforcement learning (RL) is concerned with learning a rewarding policy by executing another policy that gathers samples of experience. The former (the target policy) is rewarding but inexpressive (in most cases, deterministic), whereas doing well at the latter task requires an expressive policy (the behavior policy) that offers guided and effective exploration. In contrast to most methods, which trade off optimality against expressiveness, disentangled frameworks explicitly decouple the two objectives and handle each with a separate policy. Although this makes it possible to design and optimize the two policies freely with respect to their own objectives, naively disentangling them can lead to inefficient learning or stability issues. To mitigate this problem, our proposed method, Analogous Disentangled Actor-Critic (ADAC), designs analogous pairs of actors and critics. Specifically, ADAC leverages a key property about Stein variational grad
Related papers
- Efficient Reinforcement Learning Via Decoupling Exploration And Utilization (2023)
- MULEX: Disentangling Exploitation From Exploration In Deep RL (2019)
- Learning Off-policy With Model-based Intrinsic Motivation For Active Online Exploration (2024)
- Decoupled Exploration And Exploitation Policies For Sample-efficient Reinforcement Learning (2021)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Adversarial Policies: Attacking Deep Reinforcement Learning (2019)
- "So, Tell Me About Your Policy...": Distillation Of Interpretable Policies From Deep Reinforcement Learning Agents (2025)
- Off-policy Reinforcement Learning With Model-based Exploration Augmentation (2025)