Do Transformer World Models Give Better Policy Gradients?
2024 · Michel Ma, Tianwei Ni, Clement Gehring, et al.
Abstract
A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long horizons: could they be the solution to this problem? Surprisingly, we show that commonly-used transformer world models produce circuitous gradient paths, which can be detrimental to long-range policy gradients. To tackle this challenge, we propose a class of world models called Actions World Models (AWMs), designed to provide more direct routes for gradient propagation. We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent. We demonstrate that AWMs can generate optimiza
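The approach named in the abstract's opening sentence, unrolling a world model and backpropagating through the rollout to obtain a policy gradient, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: a scalar linear "world model" `s_next = a * s + b * action`, a one-parameter linear policy `action = theta * s`, a reward of `-s**2`, and the hypothetical function names `rollout` and `policy_gradient`. The backward pass is written out by hand so that the chain of state-to-state Jacobians the gradient must traverse is visible.

```python
def rollout(theta, s0, horizon, a=0.9, b=0.5):
    """Unroll a toy scalar world model under the policy action = theta * s.

    Returns the total reward (sum of -s_t**2 for t = 1..horizon) and all states.
    """
    states = [s0]
    for _ in range(horizon):
        s = states[-1]
        action = theta * s                 # policy
        states.append(a * s + b * action)  # assumed world model step
    total_reward = -sum(s * s for s in states[1:])
    return total_reward, states


def policy_gradient(theta, s0, horizon, a=0.9, b=0.5):
    """Backpropagate through the unrolled model to get d(total_reward)/d(theta)."""
    total_reward, states = rollout(theta, s0, horizon, a, b)
    grad_theta = 0.0
    grad_from_future = 0.0  # gradient reaching s_{t+1} from later steps
    for t in reversed(range(horizon)):
        # Total gradient w.r.t. s_{t+1}: its own reward term plus later steps.
        grad_next = grad_from_future - 2.0 * states[t + 1]
        # Direct dependence of s_{t+1} on theta is b * s_t.
        grad_theta += grad_next * b * states[t]
        # Pass the gradient back through the state-to-state Jacobian.
        grad_from_future = grad_next * (a + b * theta)
    return total_reward, grad_theta


if __name__ == "__main__":
    ret, grad = policy_gradient(theta=-0.3, s0=1.0, horizon=20)
    print("return:", ret, "gradient:", grad)
```

In this sketch the gradient for an early action reaches a late reward only through repeated multiplication by the factor `a + b * theta`, one per step. That long multiplicative path is the kind of structure the abstract identifies as making long-horizon optimization difficult.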
Related papers
- Recurrent World Models Facilitate Policy Evolution (2018)
- PWM: Policy Learning With Multi-task World Models (2024)
- Decentralized Transformers With Centralized Aggregation Are Sample-efficient Multi-agent World Models (2024)
- STORM: Efficient Stochastic Transformer Based World Models For Reinforcement Learning (2023)
- Low-variance Policy Gradient Estimation With World Models (2020)
- Transformers Are Sample-efficient World Models (2022)
- World Models Via Policy-guided Trajectory Diffusion (2023)
- Transformer Based Reinforcement Learning For Games (2019)