HarmoDT: Harmony Multi-task Decision Transformer For Offline Reinforcement Learning
2024 · Shengchao Hu, Ziqing Fan, Li Shen, et al.
Abstract
The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction. Recent advancements approach this through sequence modeling, leveraging the Transformer architecture's scalability and the benefits of parameter sharing to exploit task similarities. However, variations in task content and complexity pose significant challenges in policy formulation, necessitating judicious parameter sharing and management of conflicting gradients for optimal policy performance. In this work, we introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task. We approach this as a bi-level optimization problem, employing a meta-learning framework that leverages gradient-based techniques. The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspac
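As a rough illustration of the idea that each task works in its own subspace of shared parameters, the sketch below applies a per-task binary mask to a shared parameter vector and restricts a gradient step to the masked entries. Everything here is an assumption made for illustration: the function names `apply_mask` and `masked_sgd_step`, the hand-written masks, and the plain SGD update are hypothetical and are not taken from the paper, which learns the masks through its bi-level procedure.

```python
# Illustrative sketch only: the names, masks, and update rule are assumptions,
# not the HarmoDT implementation.

def apply_mask(params, mask):
    """Keep the shared parameters selected by a task's binary mask; zero the rest."""
    return [p * m for p, m in zip(params, mask)]


def masked_sgd_step(params, grads, mask, lr=0.5):
    """Take one SGD step that only changes parameters inside the task's mask."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]


# One shared parameter vector used by every task.
shared = [1.0, -2.0, 0.5, 3.0]

# Hypothetical hand-written masks; in the paper these would be learned per task.
task_masks = {
    "task_a": [1, 1, 0, 1],
    "task_b": [1, 0, 1, 1],
}

# Made-up gradient of task_a's loss with respect to the shared parameters.
grads_a = [1.0, 1.0, 1.0, 1.0]

params_a = apply_mask(shared, task_masks["task_a"])
updated = masked_sgd_step(shared, grads_a, task_masks["task_a"])

print(params_a)
print(updated)
```

In this toy setup the third parameter is outside `task_a`'s mask, so `task_a`'s gradient leaves it unchanged, which is one simple way gradients from different tasks can be kept from interfering on the same weights.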
Related papers
- A Decentralized Policy Gradient Approach To Multi-task Reinforcement Learning (2020)
- Offline Pre-trained Multi-agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks (2021)
- Decision Mamba: A Multi-grained State Space Model With Self-evolution Regularization For Offline RL (2024)
- Solving Continual Offline Reinforcement Learning With Decision Transformer (2024)
- Provable Multi-task Reinforcement Learning: A Representation Learning Framework With Low Rank Rewards (2026)
- Generalized Decision Transformer For Offline Hindsight Information Matching (2021)
- When Should We Prefer Decision Transformers For Offline Reinforcement Learning? (2023)
- Q-learning Decision Transformer: Leveraging Dynamic Programming For Conditional Sequence Modelling In Offline RL (2022)