Satisficing Exploration For Deep Reinforcement Learning
2024 · Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, et al.
Abstract
A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor, and an agent may seldom find itself in a position to complete the requisite exploration for identifying an optimal policy. Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently satisfying, or satisficing, solutions obtained through lossy compression. Notably, such agents may employ fundamentally different exploratory decisions to learn satisficing behaviors more efficiently than optimal ones, which are more data intensive. While supported by a rigorous corroborating theory, the underlying algorithm relies on model-based planning, drastically limiting the compatibility of these ideas