Generative Temporal Difference Learning For Infinite-horizon Prediction
2020 · Michael Janner, Igor Mordatch, Sergey Levine
Abstract
We introduce the \(\gamma\)-model, a predictive model of environment dynamics with an infinite probabilistic horizon. Replacing standard single-step models with \(\gamma\)-models leads to generalizations of the procedures central to model-based control, including the model rollout and model-based value estimation. The \(\gamma\)-model, trained with a generative reinterpretation of temporal difference learning, is a natural continuous analogue of the successor representation and a hybrid between model-free and model-based mechanisms. Like a value function, it contains information about the long-term future; like a standard predictive model, it is independent of task reward. We instantiate the \(\gamma\)-model as both a generative adversarial network and normalizing flow, discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors, and empirically investigate its utility for prediction and control.
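To make the successor-representation connection concrete, the following is a minimal tabular sketch, not the authors' implementation (the abstract describes continuous models built as GANs and normalizing flows). It assumes a small Markov chain under a fixed policy with a hypothetical transition matrix `P`, and iterates the discrete bootstrapped target \(M \leftarrow (1-\gamma) P + \gamma P M\), whose fixed point is the discounted occupancy \((1-\gamma) \sum_{t \ge 0} \gamma^t P^{t+1}\). All names and values here are illustrative assumptions.

```python
# Tabular sketch of a gamma-model-style discounted occupancy.
# Assumption: a 3-state Markov chain with a hypothetical transition matrix P
# (policy already folded into the dynamics). Pure Python, no dependencies.


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [
        [sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for row in A
    ]


def td_fixed_point(P, gamma, n_iters=500):
    """Iterate the bootstrapped target M <- (1 - gamma) * P + gamma * P @ M.

    Row s of M is a distribution over future states, starting from s.
    """
    n = len(P)
    M = [[1.0 / n] * n for _ in range(n)]  # arbitrary initial distributions
    for _ in range(n_iters):
        PM = matmul(P, M)  # expectation of M over the next state
        M = [
            [(1 - gamma) * P[s][e] + gamma * PM[s][e] for e in range(n)]
            for s in range(n)
        ]
    return M


def discounted_series(P, gamma, horizon=500):
    """Truncated series (1 - gamma) * sum_{t=0}^{horizon-1} gamma^t * P^(t+1)."""
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    Pt = [row[:] for row in P]  # holds P^(t+1), starting at t = 0
    weight = 1 - gamma
    for _ in range(horizon):
        for s in range(n):
            for e in range(n):
                M[s][e] += weight * Pt[s][e]
        Pt = matmul(Pt, P)
        weight *= gamma
    return M


# Hypothetical chain: each state mostly stays put, sometimes moves on.
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.1, 0.0, 0.9],
]
GAMMA = 0.9

M_td = td_fixed_point(P, GAMMA)
M_series = discounted_series(P, GAMMA)
```

In this sketch the bootstrapped iteration and the explicit multi-step sum agree up to truncation error, which is the tabular counterpart of learning an infinite-horizon predictive distribution from single-step transitions. No reward appears anywhere in the computation.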
Related papers
- Temporal Difference Flows (2025)
- Prediction And Control With Temporal Segment Models (2017)
- Robust And Adaptive Temporal-difference Learning Using An Ensemble Of Gaussian Processes (2021)
- Loss Dynamics Of Temporal Difference Reinforcement Learning (2023)
- Gamma-nets: Generalizing Value Estimation Over Timescale (2019)
- Temporal Difference Models: Model-free Deep RL For Model-based Control (2018)
- Generative Models In Decision Making: A Survey (2025)
- Task-agnostic Online Reinforcement Learning With An Infinite Mixture Of Gaussian Processes (2020)