DQN-TAMER: Human-in-the-loop Reinforcement Learning With Intractable Feedback
2018 · Riku Arakawa, Sosuke Kobayashi, Yuya Unno, et al.
Abstract
Exploration has been one of the greatest challenges in reinforcement learning (RL), which is a large obstacle in the application of RL to robotics. Even with state-of-the-art RL algorithms, building a well-learned agent often requires too many trials, mainly due to the difficulty of matching its actions with rewards in the distant future. A remedy for this is to train an agent with real-time feedback from a human observer who immediately gives rewards for some actions. This study tackles a series of challenges for introducing such a human-in-the-loop RL scheme. The first contribution of this work is our experiments with a precisely modeled human observer: binary, delay, stochasticity, unsustainability, and natural reaction. We also propose an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards. We find that DQN-TAMER agents outperform their baselines in Maze and Taxi simulated environments. Furthermore, we demonstrate a real-world human-in-the-loo
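The observer properties named in the abstract (binary, delayed, stochastic, unsustained feedback) can be illustrated with a small simulator. This is a minimal sketch and not the authors' implementation: the class name `SimulatedObserver` and the parameters `delay`, `p_feedback`, `p_correct`, and `stop_after` are all hypothetical, chosen only for illustration. The "natural reaction" property is not modeled here.

```python
import random
from collections import deque


class SimulatedObserver:
    """Hypothetical feedback simulator; every name and default is an assumption."""

    def __init__(self, delay=2, p_feedback=0.5, p_correct=0.9,
                 stop_after=1000, seed=0):
        self.delay = delay            # steps between an action and its feedback
        self.p_feedback = p_feedback  # chance the observer reacts at all
        self.p_correct = p_correct    # chance the reaction has the right sign
        self.stop_after = stop_after  # step after which feedback ceases
        self.rng = random.Random(seed)
        self.pending = deque()        # judgments still waiting out the delay
        self.step = 0

    def observe(self, action_was_good):
        """Record one step and return feedback in {-1, 0, +1}.

        The returned value refers to the action taken `delay` steps ago.
        """
        self.step += 1
        self.pending.append(action_was_good)
        if len(self.pending) <= self.delay:
            return 0                  # delay: nothing is old enough to judge
        judged = self.pending.popleft()
        if self.step > self.stop_after:
            return 0                  # unsustainability: observer has stopped
        if self.rng.random() >= self.p_feedback:
            return 0                  # stochasticity: feedback is skipped
        signal = 1 if judged else -1  # binary: only good or bad
        if self.rng.random() >= self.p_correct:
            signal = -signal          # stochasticity: occasional wrong sign
        return signal
```

An agent loop would call `observe` once per environment step and credit any nonzero return value to the action taken `delay` steps earlier, not to the current one.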
Related papers
- Deep TAMER: Interactive Agent Shaping In High-dimensional State Spaces (2017)
- Facial Feedback For Reinforcement Learning: A Case Study And Offline Analysis Using The TAMER Framework (2020)
- Explore, Exploit Or Listen: Combining Human Feedback And Policy Model To Speed Up Deep Reinforcement Learning In 3D Worlds (2017)
- Improving Multimodal Interactive Agents With Reinforcement Learning From Human Feedback (2022)
- A Survey On Enhancing Reinforcement Learning In Complex Environments: Insights From Human And LLM Feedback (2024)
- Mapping Out The Space Of Human Feedback For Reinforcement Learning: A Conceptual Framework (2024)
- Reward Learning From Human Preferences And Demonstrations In Atari (2018)
- Temporal Difference Models: Model-free Deep RL For Model-based Control (2018)