Temporal Difference Models: Model-free Deep RL For Model-based Control
2018 · Vitchyr Pong, Shixiang Gu, Murtaza Dalal, et al.
Abstract
Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample complexity is often impractically large for solving challenging real-world problems, even with off-policy algorithms such as Q-learning. A limiting factor in classic model-free RL is that the learning signal consists only of scalar rewards, ignoring much of the rich information contained in state transition tuples. Model-based RL uses this information by training a predictive model, but often does not achieve the same asymptotic performance as model-free RL due to model bias. We introduce temporal difference models (TDMs), a family of goal-conditioned value functions that can be trained with model-free learning and used for model-based control. TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn very efficiently, while still attaining asymptotic performance that exceeds that of direct model-based
Related papers
- MM-KTD: Multiple Model Kalman Temporal Differences For Reinforcement Learning (2020)
- Discerning Temporal Difference Learning (2023)
- Learning Sparse Representations In Reinforcement Learning (2019)
- TD Or Not TD: Analyzing The Role Of Temporal Differencing In Deep Reinforcement Learning (2018)
- Prediction And Control In Continual Reinforcement Learning (2023)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Simplifying Deep Temporal Difference Learning (2024)
- Time-aware Q-networks: Resolving Temporal Irregularity For Deep Reinforcement Learning (2021)