Transformers Can Learn Temporal Difference Methods For In-context Reinforcement Learning
2024 · Jiuqi Wang, Ethan Blaser, Hadi Daneshmand, et al.
Abstract
Traditionally, reinforcement learning (RL) agents learn to solve new tasks by updating their neural network parameters through interactions with the task environment. However, recent works demonstrate that some RL agents, after certain pretraining procedures, can learn to solve unseen new tasks without parameter updates, a phenomenon known as in-context reinforcement learning (ICRL). The empirical success of ICRL is widely attributed to the hypothesis that the forward pass of the pretrained agent neural network implements an RL algorithm. In this paper, we support this hypothesis by showing, both empirically and theoretically, that when a transformer is trained for policy evaluation tasks, it can discover and learn to implement temporal difference learning in its forward pass.
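For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of textbook semi-gradient TD(0) for policy evaluation with a linear value function. It is not the paper's transformer construction. The function name `td0_linear`, the two-state reward process, and the values of `gamma` and `alpha` are illustrative assumptions, not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td0_linear(transitions, features, gamma, alpha):
    """Semi-gradient TD(0) with a linear value estimate v(s) = w . phi(s).

    transitions: iterable of (state, reward, next_state) tuples
    features:    dict mapping each state to its feature vector (a list)
    """
    n_features = len(next(iter(features.values())))
    w = [0.0] * n_features
    for s, r, s_next in transitions:
        phi, phi_next = features[s], features[s_next]
        # TD error: bootstrapped one-step target minus current estimate
        td_error = r + gamma * dot(w, phi_next) - dot(w, phi)
        # Move the weights along the features of the current state
        w = [wi + alpha * td_error * pi for wi, pi in zip(w, phi)]
    return w


# Hypothetical two-state reward process under a fixed policy:
# state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # one-hot features (tabular case)
trajectory = [(0, 1.0, 1), (1, 0.0, 0)] * 1000
w = td0_linear(trajectory, features, gamma=0.5, alpha=0.1)
# With gamma = 0.5 the true values solve v0 = 1 + 0.5*v1 and v1 = 0.5*v0,
# so v0 = 4/3 and v1 = 2/3; w should approach [4/3, 2/3].
```

The paper's claim concerns a transformer whose forward pass carries out updates of this kind over the context, without any change to the network's own parameters.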
Related papers
- From Memories To Maps: Mechanisms Of In-context Reinforcement Learning In Transformers (2025)
- Transformers As Game Players: Provable In-context Game-playing Capabilities Of Pre-trained Models (2024)
- Emergence Of In-context Reinforcement Learning From Noise Distillation (2023)
- Transformer Based Reinforcement Learning For Games (2019)
- Updet: Universal Multi-agent Reinforcement Learning Via Policy Decoupling With Transformers (2021)
- Self-confirming Transformer For Belief-conditioned Adaptation In Offline Multi-agent Reinforcement Learning (2023)
- Discerning Temporal Difference Learning (2023)
- Learning Sparse Representations In Reinforcement Learning (2019)