Pessimistic Value Iteration For Multi-task Data Sharing In Offline Reinforcement Learning
2024 Β· Chenjia Bai, Lingxiao Wang, Jianye Hao, et al.
Abstract
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Given ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolv
Authors
(none)
Tags
Stats
Related papers
- Diverse Randomized Value Functions: A Provably Pessimistic Approach For Offline Reinforcement Learning (2024)3.58
- Is Pessimism Provably Efficient For Offline RL? (2020)0.00
- Pessimistic Bootstrapping For Uncertainty-driven Offline Reinforcement Learning (2022)0.00
- Bridging Distributionally Robust Learning And Offline RL: An Approach To Mitigate Distribution Shift And Partial Data Coverage (2023)0.00
- Why So Pessimistic? Estimating Uncertainties For Offline RL Through Ensembles, And Why Their Independence Matters (2022)6.77
- POPO: Pessimistic Offline Policy Optimization (2020)5.24
- Bridging Offline Reinforcement Learning And Imitation Learning: A Tale Of Pessimism (2021)0.00
- Double Pessimism Is Provably Efficient For Distributionally Robust Offline Reinforcement Learning: Generic Algorithm And Robust Partial Coverage (2023)0.00