Model-based Offline Reinforcement Learning With Adversarial Data Augmentation
2025 · Hongye Cao, Fan Feng, Jing Huo, et al.
Abstract
Model-based offline Reinforcement Learning (RL) constructs environment models from offline datasets to perform conservative policy optimization. Existing approaches focus on learning state transitions through ensemble models and apply conservative estimation during rollouts to mitigate extrapolation errors. However, the static data makes it challenging to develop a robust policy, and offline agents cannot access the environment to gather new data. To address these challenges, we introduce Model-based Offline Reinforcement learning with AdversariaL data augmentation (MORAL). In MORAL, we replace the fixed-horizon rollout with adversarial data augmentation, which executes alternating sampling with ensemble models to enrich the training data. Specifically, this adversarial process dynamically selects ensemble models against the policy for biased sampling, mitigating the optimistic estimation of fixed models and thus robustly expanding the training data for policy optimization. Moreover, a differential factor is
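The abstract does not spell out how the adversary picks an ensemble member, so the following is only a minimal sketch of the general idea of alternating policy and adversary moves, not the authors' algorithm. It assumes the adversary chooses, at every step, the ensemble member whose predicted transition has the lowest one-step value estimate for the policy. The names `adversarial_rollout`, `value_fn`, `gamma`, and `max_steps`, and the toy one-dimensional models, are all hypothetical.

```python
import numpy as np

def adversarial_rollout(models, policy, value_fn, start_state, max_steps, gamma=0.99):
    """Alternate between a policy move and an adversary move.

    Assumption (not from the paper): the adversary selects the ensemble
    member that minimizes reward + gamma * value_fn(next_state).
    """
    state = np.asarray(start_state, dtype=float)
    transitions = []
    for _ in range(max_steps):
        # Policy move: pick an action in the current state.
        action = policy(state)
        # Every ensemble member proposes a (next_state, reward) pair.
        candidates = [model(state, action) for model in models]
        # Adversary move: pick the most pessimistic proposal for the policy.
        scores = [reward + gamma * value_fn(nxt) for nxt, reward in candidates]
        k = int(np.argmin(scores))
        next_state, reward = candidates[k]
        transitions.append((state, action, reward, next_state, k))
        state = next_state
    return transitions

# Toy ensemble: three 1-D models that differ only by a constant bias b.
def make_model(b):
    def model(s, a):
        nxt = s + a + b
        return nxt, float(-np.abs(nxt).sum())  # reward: stay close to 0
    return model

models = [make_model(b) for b in (-0.5, 0.0, 0.5)]
policy = lambda s: -0.5 * s                    # move halfway toward 0
value_fn = lambda s: float(-np.abs(s).sum())   # crude value estimate

data = adversarial_rollout(models, policy, value_fn, np.array([1.0]), max_steps=3)
for state, action, reward, next_state, k in data:
    print(state, action, reward, next_state, k)
```

In this toy setup the adversary keeps picking the member that pushes the state away from the origin, so the collected transitions are biased toward the cases the policy handles worst. The `max_steps` cap only keeps the sketch finite; how MORAL actually terminates or weights these samples is not stated in the truncated abstract.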
Related papers
- MOReL: Model-based Offline Reinforcement Learning (2020)
- DOMAIN: Mildly Conservative Model-based Offline Reinforcement Learning (2023)
- Overcoming Model Bias For Robust Offline Deep Reinforcement Learning (2020)
- Deployment-efficient Reinforcement Learning Via Model-based Offline Optimization (2020)
- Policy-driven World Model Adaptation For Robust Offline Model-based Reinforcement Learning (2025)
- Robust Model-based Reinforcement Learning With An Adversarial Auxiliary Model (2024)
- Active Advantage-aligned Online Reinforcement Learning With Offline Data (2025)
- Towards Robust Policy: Enhancing Offline Reinforcement Learning With Adversarial Attacks And Defenses (2024)