TD-MPC2: Scalable, Robust World Models For Continuous Control
2023 · Nicklas Hansen, Hao Su, Xiaolong Wang
Abstract
TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://tdmpc2.com
Related papers
- Model Predictive Control With Self-supervised Representation Learning (2023)
- Model Predictive Control And Reinforcement Learning: A Unified Framework Based On Dynamic Programming (2024)
- PWM: Policy Learning With Multi-task World Models (2024)
- M\(^3\)PC: Test-time Model Predictive Control For Pretrained Masked Trajectory Model (2024)
- Driving Reinforcement Learning With Models (2019)
- DeepSafeMPC: Deep Learning-based Model Predictive Control For Safe Multi-agent Reinforcement Learning (2024)
- DeepMDP: Learning Continuous Latent Space Models For Representation Learning (2019)
- Towards An Adaptable And Generalizable Optimization Engine In Decision And Control: A Meta Reinforcement Learning Approach (2024)