PWM: Policy Learning With Multi-task World Models
2024 · Ignat Georgiev, Varun Giridhar, Nicklas Hansen, et al.
Abstract
Reinforcement Learning (RL) has made significant strides in complex tasks but struggles in multi-task settings with different embodiments. World model methods offer scalability by learning a simulation of the environment but often rely on inefficient gradient-free optimization methods for policy extraction. In contrast, gradient-based methods exhibit lower variance but fail to handle discontinuities. Our work reveals that well-regularized world models can generate smoother optimization landscapes than the actual dynamics, facilitating more effective first-order optimization. We introduce Policy learning with multi-task World Models (PWM), a novel model-based RL algorithm for continuous control. Initially, the world model is pre-trained on offline data, and then policies are extracted from it using first-order optimization in less than 10 minutes per task. PWM effectively solves tasks with up to 152 action dimensions and outperforms methods that use ground-truth dynamics. Additionally,
Related papers
- Imagine-2-drive: Leveraging High-fidelity World Models Via Multi-modal Diffusion Policies (2024)
- World Models As Reference Trajectories For Rapid Motor Adaptation (2025)
- A Decentralized Policy Gradient Approach To Multi-task Reinforcement Learning (2020)
- Low-variance Policy Gradient Estimation With World Models (2020)
- Enhancing Policy Learning With World-action Model (2026)
- Diwa: Diffusion Policy Adaptation With World Models (2025)
- Do Transformer World Models Give Better Policy Gradients? (2024)
- Policy-driven World Model Adaptation For Robust Offline Model-based Reinforcement Learning (2025)