Self-confirming Transformer For Belief-conditioned Adaptation In Offline Multi-agent Reinforcement Learning
2023 · Tao Li, Juan Guevara, Xinhong Xie, et al.
Abstract
Offline reinforcement learning (RL) suffers from the distribution shift between the offline dataset and the online environment. In multi-agent RL (MARL), this distribution shift may arise from nonstationary opponents in online testing, whose behaviors differ from those recorded in the offline dataset. Hence, the key to the broader deployment of offline MARL is online adaptation to nonstationary opponents. Recent advances in foundation models, e.g., large language models, have demonstrated the generalization ability of the transformer, an emerging neural network architecture, in sequence modeling, of which offline RL is a special case. One naturally wonders whether offline-trained transformer-based RL policies can adapt to nonstationary opponents online. We propose a novel auto-regressive training to equip transformer agents with online adaptability based on the idea of self-augmented pre-conditioning. The transformer agent first learns offline to predict the o
Related papers
- Belief-based Offline Reinforcement Learning For Delay-robust Policy Optimization (2025)
- Offline Pre-trained Multi-agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks (2021)
- Double Check My Desired Return: Transformer With Target Alignment For Offline Reinforcement Learning (2025)
- Offline Meta-reinforcement Learning With Online Self-supervision (2021)
- Decision Mamba: A Multi-grained State Space Model With Self-evolution Regularization For Offline RL (2024)
- Solving Continual Offline Reinforcement Learning With Decision Transformer (2024)
- When Should We Prefer Decision Transformers For Offline Reinforcement Learning? (2023)
- Comadice: Offline Cooperative Multi-agent Reinforcement Learning With Stationary Distribution Shift Regularization (2024)