Offline-to-online Multi-agent Reinforcement Learning With Offline Value Function Memory And Sequential Exploration
2024 · Hai Zhong, Xun Wang, Zhuoran Li, et al.
Abstract
Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance. However, most existing research has focused on single-agent settings, with limited exploration of the multi-agent extension, i.e., Offline-to-Online Multi-Agent Reinforcement Learning (O2O MARL). In O2O MARL, two critical challenges become more prominent as the number of agents increases: (i) the risk of unlearning pre-trained Q-values due to distributional shifts during the transition from the offline to the online phase, and (ii) the difficulty of efficient exploration in the large joint state-action space. To tackle these challenges, we propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE). First, we introduce the Offline Value Function Memory (OVM) mechanism to compute target Q-values, preserving knowledge gained during offline training, ensurin