Improving Context-based Meta-reinforcement Learning With Self-supervised Trajectory Contrastive Learning
2021 · Bernie Wang, Simon Xu, Kurt Keutzer, et al.
Abstract
Meta-reinforcement learning typically requires orders of magnitude more samples than single-task reinforcement learning methods. This is because meta-training must handle more diverse distributions and train extra components such as context encoders. To address this, we propose a novel self-supervised learning task, which we call Trajectory Contrastive Learning (TCL), to improve meta-training. TCL adopts contrastive learning and trains a context encoder to predict whether two transition windows are sampled from the same trajectory. TCL leverages the natural hierarchical structure of context-based meta-RL and makes minimal assumptions, so it is generally applicable to context-based meta-RL algorithms. It accelerates the training of context encoders and improves meta-training overall. Experiments show that TCL performs better than or comparably to a strong meta-RL baseline in most environments on both the meta-RL MuJoCo (5 of 6) and Meta-World (44 of 50) benchmarks.
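The contrastive task described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss, encoder, or window size, so the InfoNCE-style objective, the mean-pooling linear encoder, and all names here (`sample_window_pair`, `encode`, `info_nce_loss`, `temperature`) are assumptions made for illustration. Two windows cut from the same trajectory form a positive pair; windows from other trajectories in the batch serve as negatives.

```python
import numpy as np


def sample_window_pair(trajectory, window, rng):
    """Cut two windows of consecutive transitions from one trajectory."""
    starts = rng.integers(0, len(trajectory) - window + 1, size=2)
    return [trajectory[s:s + window] for s in starts]


def encode(windows, weights):
    """Hypothetical context encoder: mean-pool each window, apply a linear
    map, and L2-normalise the result."""
    pooled = windows.mean(axis=1)                 # (batch, transition_dim)
    z = pooled @ weights                          # (batch, latent_dim)
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)


def info_nce_loss(queries, keys, temperature=0.1):
    """InfoNCE-style loss (an assumption): row i of `queries` should match
    row i of `keys`; every other row of `keys` acts as a negative."""
    logits = queries @ keys.T / temperature       # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


rng = np.random.default_rng(0)
# Toy data: four trajectories of 50 transitions, 6 features per transition.
trajectories = [rng.normal(loc=task, size=(50, 6)) for task in range(4)]
pairs = [sample_window_pair(t, window=8, rng=rng) for t in trajectories]
queries = np.stack([p[0] for p in pairs])         # (4, 8, 6)
keys = np.stack([p[1] for p in pairs])            # (4, 8, 6)
weights = rng.normal(size=(6, 3))                 # untrained encoder weights
loss = info_nce_loss(encode(queries, weights), encode(keys, weights))
```

In a real training loop the encoder weights would be updated by gradient descent on this loss alongside the meta-RL objective; the sketch only evaluates the loss once on random data.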
Related papers
- Provably Improved Context-based Offline Meta-RL With Attention And Contrastive Learning (2021)
- On The Effectiveness Of Fine-tuning Versus Meta-reinforcement Learning (2022)
- TACO: Temporal Latent Action-driven Contrastive Loss For Visual Reinforcement Learning (2023)
- Robust Task Representations For Offline Meta-reinforcement Learning Via Contrastive Learning (2022)
- CCLF: A Contrastive-curiosity-driven Learning Framework For Sample-efficient Reinforcement Learning (2022)
- Meta-Q-Learning (2019)
- Efficient Off-policy Meta-reinforcement Learning Via Probabilistic Context Variables (2019)
- Contrastive Learning As Goal-conditioned Reinforcement Learning (2022)