Learnable Behavior Control: Breaking Atari Human World Records Via Sample-efficient Behavior Selection
2023 · Jiajun Fan, Yuzheng Zhuang, Yuecheng Liu, et al.
Abstract
The exploration problem is one of the main challenges in deep reinforcement learning (RL). Recent promising works tried to handle the problem with population-based methods, which collect samples with diverse behaviors derived from a population of different exploratory policies. Adaptive policy selection has been adopted for behavior control. However, the behavior selection space is largely limited by the predefined policy population, which further limits behavior diversity. In this paper, we propose a general framework called Learnable Behavioral Control (LBC) to address the limitation, which a) enables a significantly enlarged behavior selection space via formulating a hybrid behavior mapping from all policies; b) constructs a unified learnable process for behavior selection. We introduce LBC into distributed off-policy actor-critic methods and achieve behavior control via optimizing the selection of the behavior mappings with bandit-based meta-controllers. Our agents have achieved 10
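As a rough illustration of the two ideas named in the abstract, the sketch below blends the action distributions of several policies into one behavior distribution and lets a bandit choose which blend to use. It is not the authors' implementation: the names `mix_policies` and `UCBController`, the use of UCB1, the fixed list of candidate weight vectors, and the toy reward are all assumptions made for this example.

```python
import math
import random


def mix_policies(policy_probs, weights):
    """Blend per-policy action distributions into one behavior distribution.

    policy_probs: list of action-probability lists, one per policy.
    weights: non-negative mixing weights, one per policy (hypothetical
    stand-in for a "behavior mapping" over the whole population).
    """
    total = sum(weights)
    n_actions = len(policy_probs[0])
    return [
        sum(w * p[a] for w, p in zip(weights, policy_probs)) / total
        for a in range(n_actions)
    ]


class UCBController:
    """UCB1 bandit over a fixed set of candidate weight vectors (arms)."""

    def __init__(self, arms, c=1.0):
        self.arms = arms
        self.c = c
        self.counts = [0] * len(arms)
        self.means = [0.0] * len(arms)

    def select(self):
        # Try every arm once before using the confidence bound.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        t = sum(self.counts)
        scores = [
            m + self.c * math.sqrt(2.0 * math.log(t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(self.arms)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy policies over three actions (made-up numbers).
    policies = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
    controller = UCBController(arms=[[1, 0], [1, 1], [0, 1]])
    for _ in range(200):
        arm = controller.select()
        behavior = mix_policies(policies, controller.arms[arm])
        action = rng.choices(range(3), weights=behavior)[0]
        # Toy reward: action 2 pays off; a real agent would use episode return.
        controller.update(arm, 1.0 if action == 2 else 0.0)
    print(controller.counts)
```

In this toy setup the selection space is the set of weight vectors rather than the individual policies, which is the sense in which blending enlarges the space a meta-controller can choose from.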
Related papers
- Behavior-guided Actor-critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021)
- Towards Human-like RL: Taming Non-naturalistic Behavior In Deep RL Via Adaptive Behavioral Costs In 3D Games (2023)
- Data Efficient Training For Reinforcement Learning With Adaptive Behavior Policy Sharing (2020)
- Deep Exploration With Pac-bayes (2024)
- Deep Reinforcement Learning Behavioral Mode Switching Using Optimal Control Based On A Latent Space Objective (2024)
- Model-based Reinforcement Learning For Atari (2019)
- A Human Mixed Strategy Approach To Deep Reinforcement Learning (2018)
- Adapting Behaviour For Learning Progress (2019)