Learning From Good Trajectories In Offline Multi-agent Reinforcement Learning
2022 · Qi Tian, Kun Kuang, Furui Liu, et al.
Abstract
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, which is an important step toward the deployment of multi-agent systems in real-world applications. In practice, however, the individual behavior policies that generate multi-agent joint trajectories usually differ in quality; for example, one agent may follow a random policy while the other agents follow medium policies. In a cooperative game with a global reward, an agent trained by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns the credit to each agent through a differentiable key-value memory mechanism in an offline m
Related papers
- Offline Multi-agent Reinforcement Learning Via In-sample Sequential Policy Optimization (2024) 0.00
- Value-guidance Meanflow For Offline Multi-agent Reinforcement Learning (2026) 0.00
- Using Offline Data To Speed Up Reinforcement Learning In Procedurally Generated Environments (2023) 6.77
- Comadice: Offline Cooperative Multi-agent Reinforcement Learning With Stationary Distribution Shift Regularization (2024) 0.00
- Hierarchical Deep Multiagent Reinforcement Learning With Temporal Abstraction (2018) 0.00
- Learning To Share In Multi-agent Reinforcement Learning (2021) 0.00
- Dealing With Non-stationarity In Decentralized Cooperative Multi-agent Deep Reinforcement Learning Via Multi-timescale Learning (2023) 0.00
- Harnessing Mixed Offline Reinforcement Learning Datasets Via Trajectory Weighting (2023) 0.00