HIPODE: Enhancing Offline Reinforcement Learning With High-quality Synthetic Data From A Policy-decoupled Approach

2023


Abstract

Offline reinforcement learning (ORL) has gained attention as a means of training reinforcement learning models using pre-collected static data. To address the issue of limited data and improve downstream ORL performance, recent work has attempted to expand the dataset's coverage through data augmentation. However, most of these methods are policy-dependent: the generated data is only guaranteed to support the current downstream ORL policy, which limits its use with other downstream policies. Moreover, the quality of the synthetic data is often poorly controlled, which limits how far it can improve the downstream policy. To tackle these issues, we propose HIgh-quality POlicy-DEcoupled (HIPODE), a novel data augmentation method for ORL. On the one hand, HIPODE generates high-quality synthetic data by selecting states near the dataset distribution with potentially high value among candidate states using the ne

