On Value Functions And The Agent-environment Boundary
2019 · Nan Jiang
Abstract
When function approximation is deployed in reinforcement learning (RL), the same problem may be formulated in different ways, often by treating a pre-processing step as a part of the environment or as part of the agent. As a consequence, fundamental concepts in RL, such as (optimal) value functions, are not uniquely defined as they depend on where we draw this agent-environment boundary, causing problems in theoretical analyses that provide optimality guarantees. We address this issue via a simple and novel boundary-invariant analysis of Fitted Q-Iteration, a representative RL algorithm, where the assumptions and the guarantees are invariant to the choice of boundary. We also discuss closely related issues on state resetting and Monte-Carlo Tree Search, deterministic vs stochastic systems, imitation learning, and the verifiability of theoretical assumptions from data.
Related papers
- The Value Equivalence Principle For Model-based Reinforcement Learning (2020)
- Between Rate-distortion Theory & Value Equivalence In Model-based Reinforcement Learning (2022)
- The Value-improvement Path: Towards Better Representations For Reinforcement Learning (2020)
- Offline Reinforcement Learning: Fundamental Barriers For Value Function Approximation (2021)
- Simple Agent, Complex Environment: Efficient Reinforcement Learning With Agent States (2021)
- Deciding What To Model: Value-equivalent Sampling For Reinforcement Learning (2022)
- Leveraging Prior Knowledge In Reinforcement Learning Via Double-sided Bounds On The Value Function (2023)
- Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms (2019)