Rethinking Model-based, Policy-based, And Value-based Reinforcement Learning Via The Lens Of Representation Complexity

Abstract

Reinforcement Learning (RL) encompasses diverse paradigms, including model-based RL, policy-based RL, and value-based RL, each tailored to approximate the model, optimal policy, and optimal value function, respectively. This work investigates the potential hierarchy of representation complexity -- the complexity of functions to be represented -- among these RL paradigms. We first demonstrate that, for a broad class of Markov decision processes (MDPs), the model can be represented by constant-depth circuits with polynomial size or Multi-Layer Perceptrons (MLPs) with constant layers and polynomial hidden dimension. However, the representation of the optimal policy and optimal value proves to be \(\mathsf{NP}\)-complete and unattainable by constant-layer MLPs with polynomial size. This demonstrates a significant representation complexity gap between model-based RL and model-free RL, which includes policy-based RL and value-based RL. To further explore the representation complexity hiera
