Abstract

Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the partition function \(\mathcal{Z}\), and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and \(Q\)-functions can be derived from this partition function and interpreted via average energies, the \(\mathcal{Z}\)-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches.
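To make the idea of a trajectory partition function with its own Bellman-style recursion concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the three-state deterministic MDP, the choice of a trajectory's energy as its negative cumulative reward, the inverse temperature `beta`, the finite horizon, and the function names. The paper's own definition of \(\mathcal{Z}\) may differ. The sketch checks that a backward recursion of the assumed form \(\mathcal{Z}_t(s) = \sum_a e^{\beta r(s,a)} \mathcal{Z}_{t+1}(s')\) reproduces the brute-force sum over all trajectories.

```python
import itertools
import math

# Hypothetical toy MDP (not from the paper): deterministic transitions.
NEXT_STATE = {0: {0: 1, 1: 2}, 1: {0: 0, 1: 2}, 2: {0: 2, 1: 0}}
REWARD = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.5, 1: 2.0}, 2: {0: 0.0, 1: 1.5}}
ACTIONS = (0, 1)


def z_bruteforce(state, horizon, beta):
    """Sum exp(beta * return) over every action sequence of length `horizon`."""
    total = 0.0
    for actions in itertools.product(ACTIONS, repeat=horizon):
        s, ret = state, 0.0
        for a in actions:
            ret += REWARD[s][a]
            s = NEXT_STATE[s][a]
        # Assumed Boltzmann weight: energy = -(cumulative reward).
        total += math.exp(beta * ret)
    return total


def z_recursive(horizon, beta):
    """Backward recursion Z_t(s) = sum_a exp(beta * r(s, a)) * Z_{t+1}(s')."""
    # With zero steps left, the empty trajectory contributes exp(0) = 1.
    z = {s: 1.0 for s in NEXT_STATE}
    for _ in range(horizon):
        z = {
            s: sum(
                math.exp(beta * REWARD[s][a]) * z[NEXT_STATE[s][a]]
                for a in ACTIONS
            )
            for s in NEXT_STATE
        }
    return z


if __name__ == "__main__":
    z = z_recursive(horizon=4, beta=0.7)
    for s in sorted(NEXT_STATE):
        print(s, z[s], z_bruteforce(s, horizon=4, beta=0.7))
```

In this assumed setup the recursion is a plain sum over actions, so each backward step is linear in the previous table of \(\mathcal{Z}\) values, in contrast to the max that appears in the standard Bellman optimality equation for value functions.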

Stats

  • arxiv key: rahme2019a
