Learning Purposeful Behaviour In The Absence Of Rewards
2016 · Marlos C. Machado, Michael Bowling
Abstract
Artificial intelligence is commonly defined as the ability to achieve goals in the world. In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress. However, some domains have no such reward signal, or have a reward signal so sparse as to appear absent. Without reward feedback, agent behaviour is typically random, often dithering aimlessly and lacking intentionality. In this paper we present an algorithm capable of learning purposeful behaviour in the absence of rewards. The algorithm proceeds by constructing temporally extended actions (options), through the identification of purposes that are "just out of reach" of the agent's current behaviour. These purposes establish intrinsic goals for the agent to learn, ultimately resulting in a suite of behaviours that encourage the agent to visit different parts of the state space. Moreover, the approach is particularly suited for set
Related papers
- Towards Measuring Goal-directedness In AI Systems (2024)
- What Can Learned Intrinsic Rewards Capture? (2019)
- Goals And The Structure Of Experience (2025)
- Pitfalls Of Learning A Reward Function Online (2020)
- Giving Up Control: Neurons As Reinforcement Learning Agents (2020)
- Evaluating Agents Without Rewards (2020)
- Autotelic Agents With Intrinsically Motivated Goal-conditioned Reinforcement Learning: A Short Survey (2020)
- Provably Feedback-efficient Reinforcement Learning Via Active Reward Learning (2023)