Understanding The Origin Of Information-seeking Exploration In Probabilistic Objectives For Control
2021 · Beren Millidge, Anil Seth, Christopher Buckley
Abstract
The exploration-exploitation trade-off is central to the description of adaptive behaviour in fields ranging from machine learning, to biology, to economics. While many approaches have been taken, one approach to solving this trade-off has been to equip or propose that agents possess an intrinsic 'exploratory drive' which is often implemented in terms of maximizing the agent's information gain about the world -- an approach which has been widely studied in machine learning and cognitive science. In this paper we mathematically investigate the nature and meaning of such approaches and demonstrate that this combination of utility maximizing and information-seeking behaviour arises from the minimization of an entirely different class of objectives we call divergence objectives. We propose a dichotomy in the objective functions underlying adaptive behaviour between *evidence* objectives, which correspond to well-known reward or utility maximizing objectives in the literature, and *divergen
Related papers
- The Exploration-exploitation Dilemma Revisited: An Entropy Perspective (2024)
- Information Is Power: Intrinsic Control Via Information Capture (2021)
- Action And Perception As Divergence Minimization (2020)
- Behind The Myth Of Exploration In Policy Gradients (2024)
- Exploitation Is All You Need... For Exploration (2025)
- Exploration Conscious Reinforcement Learning Revisited (2018)
- Learning Controllable Dynamics Through Informative Exploration (2025)
- Intrinsic Rewards For Exploration Without Harm From Observational Noise: A Simulation Study Based On The Free Energy Principle (2024)