Intrinsic Rewards From Self-organizing Feature Maps For Exploration In Reinforcement Learning
2023 · Marius Lindegaard, Hjalmar Jacob Vinje, Odin Aleksander Severinsen
Abstract
We introduce an exploration bonus for deep reinforcement learning methods, calculated using self-organising feature maps. Our method uses adaptive resonance theory (ART), which provides online, unsupervised clustering, to quantify the novelty of a state. This novelty heuristic is added as an intrinsic reward to the extrinsic reward signal, and the agent is then optimized to maximize the sum of the two rewards. We find that this method was able to play the game Ordeal at a human level after a number of training epochs comparable to ICM (arXiv:1705.05464). Agents augmented with RND (arXiv:1810.12894) were unable to achieve the same level of performance in our space of hyperparameters.
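As a rough illustration of the idea in the abstract, the sketch below clusters states online with a simplified fuzzy-ART-style rule and turns cluster membership into a novelty bonus. It is not the authors' implementation: the class name `FuzzyARTNovelty`, the count-based bonus `1 / sqrt(visits)`, the mixing weight `beta`, and all default hyperparameters are assumptions made for this example, and states are assumed to be feature vectors scaled to [0, 1].

```python
import math


class FuzzyARTNovelty:
    """Simplified fuzzy-ART-style online clusterer (illustrative sketch only)."""

    def __init__(self, vigilance=0.8, learning_rate=1.0, choice_alpha=1e-3):
        self.vigilance = vigilance          # minimum match needed to join a category
        self.learning_rate = learning_rate  # 1.0 corresponds to "fast learning"
        self.choice_alpha = choice_alpha    # small constant in the choice function
        self.weights = []                   # one weight vector per category
        self.counts = []                    # number of states assigned per category

    @staticmethod
    def _complement_code(state):
        # Complement coding: [x, 1 - x]; assumes each feature lies in [0, 1].
        return list(state) + [1.0 - s for s in state]

    @staticmethod
    def _overlap(x, w):
        # Fuzzy AND (element-wise minimum) followed by the L1 norm.
        return sum(min(a, b) for a, b in zip(x, w))

    def bonus(self, state):
        """Assign the state to a category, update it, and return a novelty bonus."""
        x = self._complement_code(state)
        norm_x = sum(x)
        # Try categories in order of decreasing choice-function value.
        order = sorted(
            range(len(self.weights)),
            key=lambda j: -self._overlap(x, self.weights[j])
            / (self.choice_alpha + sum(self.weights[j])),
        )
        for j in order:
            w = self.weights[j]
            if self._overlap(x, w) / norm_x >= self.vigilance:
                # Resonance: move the category weights toward the input.
                lr = self.learning_rate
                self.weights[j] = [
                    lr * min(a, b) + (1.0 - lr) * b for a, b in zip(x, w)
                ]
                self.counts[j] += 1
                # Assumed bonus: decays as the category is visited more often.
                return 1.0 / math.sqrt(self.counts[j])
        # No category passed the vigilance test: create a new one.
        self.weights.append(x)
        self.counts.append(1)
        return 1.0


def total_reward(extrinsic, intrinsic, beta=0.1):
    # Reward the agent is trained on; beta is an assumed mixing weight.
    return extrinsic + beta * intrinsic


novelty = FuzzyARTNovelty(vigilance=0.9)
first = novelty.bonus([0.1, 0.2])    # unseen state, new category
repeat = novelty.bonus([0.1, 0.2])   # same state again, smaller bonus
distant = novelty.bonus([0.9, 0.8])  # far from the first category, new category
```

In this sketch a higher `vigilance` produces more, finer-grained categories, so more states count as novel. The bonus would be computed per environment step and combined with the environment reward through `total_reward` before the policy update.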
Related papers
- Never Explore Repeatedly In Multi-agent Reinforcement Learning (2023) 0.00
- Coordinated Exploration Via Intrinsic Rewards For Multi-agent Reinforcement Learning (2019) 0.00
- Information Content Exploration (2023) 0.00
- Self-supervised Exploration Via Temporal Inconsistency In Reinforcement Learning (2022) 3.58
- Individual Contributions As Intrinsic Exploration Scaffolds For Multi-agent Reinforcement Learning (2024) 2.80
- Redeeming Intrinsic Rewards Via Constrained Optimization (2022) 0.00
- Intrinsic Rewards For Exploration Without Harm From Observational Noise: A Simulation Study Based On The Free Energy Principle (2024) 0.00
- Rewarding Episodic Visitation Discrepancy For Exploration In Reinforcement Learning (2022) 0.00