What Can Learned Intrinsic Rewards Capture?
2019 · Zeyu Zheng, Junhyuk Oh, Matteo Hessel, et al.
Abstract
The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful intrinsic reward functions across multiple lifetimes of experience. Through several proof-of-concept experiments, we show that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should strive to do.
Related papers
- On Learning Intrinsic Rewards For Policy Gradient Methods (2018)
- Adapting Behaviour Via Intrinsic Reward: A Survey And Empirical Study (2019)
- Black Box Meta-learning Intrinsic Rewards (2024)
- Rlexplore: Accelerating Research In Intrinsically-motivated Reinforcement Learning (2024)
- Redeeming Intrinsic Rewards Via Constrained Optimization (2022)
- Coordinated Exploration Via Intrinsic Rewards For Multi-agent Reinforcement Learning (2019)
- The Impact Of Intrinsic Rewards On Exploration In Reinforcement Learning (2025)
- Information Content Exploration (2023)