Dichotomy Of Control: Separating What You Can Control From What You Cannot
2022 · Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, et al.
Abstract
Future- or return-conditioned supervised learning is an emerging paradigm for offline reinforcement learning (RL), where the future outcome (i.e., return) associated with an observed action sequence is used as input to a policy trained to imitate those same actions. While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves. Such situations can lead to a learned policy that is inconsistent with its conditioning inputs; i.e., using the policy to act in the environment, when conditioning on a specific desired return, leads to a distribution of real returns that is wildly different than desired. In this work, we propose the dichotomy of control (DoC), a future-conditioned supervised learning framework that separates mechanisms within a policy's control (actions) from t
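The failure mode described in the abstract can be illustrated with a toy example. The sketch below is not the DoC method and is not taken from the paper's code; it assumes a hypothetical one-step environment with two actions ("safe" and "gamble"), made-up reward values, and a tabular count-based stand-in for a return-conditioned policy. It shows how conditioning on a high return selects the action whose high return came from environment randomness, so the return actually achieved is far from the one requested.

```python
import random
from collections import Counter, defaultdict

# Hypothetical one-step environment (all numbers are illustrative assumptions).
# "safe" always returns 1. "gamble" returns 10 with probability 0.1, else -5,
# so its expected return is 0.1 * 10 + 0.9 * (-5) = -3.5.
def step(action, rng):
    if action == "safe":
        return 1.0
    return 10.0 if rng.random() < 0.1 else -5.0

def collect_dataset(n, rng):
    """Offline data gathered by a uniformly random behavior policy."""
    data = []
    for _ in range(n):
        action = rng.choice(["safe", "gamble"])
        data.append((action, step(action, rng)))
    return data

def fit_return_conditioned_policy(dataset):
    """Tabular stand-in for return-conditioned imitation: estimate P(action | return)."""
    counts = defaultdict(Counter)
    for action, ret in dataset:
        counts[ret][action] += 1
    return {
        ret: {a: c / sum(ctr.values()) for a, c in ctr.items()}
        for ret, ctr in counts.items()
    }

def rollout(policy, target_return, n, rng):
    """Act in the environment while conditioning on target_return; return the mean outcome."""
    probs = policy[target_return]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    total = 0.0
    for _ in range(n):
        action = rng.choices(actions, weights)[0]
        total += step(action, rng)
    return total / n

rng = random.Random(0)
dataset = collect_dataset(10_000, rng)
policy = fit_return_conditioned_policy(dataset)

# Only "gamble" ever produced a return of 10, so conditioning on 10 always gambles.
print("P(action | return=10):", policy[10.0])

# The realized average return is close to the gamble's expected value, not 10.
avg_return = rollout(policy, 10.0, 10_000, rng)
print("average realized return when asking for 10:", avg_return)
```

In this toy setting the policy is inconsistent with its conditioning input: asking for a return of 10 yields an average return near -3.5, while the "safe" action would have reliably returned 1.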
Related papers
- When Does Return-conditioned Supervised Learning Work For Offline Reinforcement Learning? (2022)
- Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning (2024)
- Online Reinforcement Learning In Non-stationary Context-driven Environments (2023)
- When Should We Prefer Decision Transformers For Offline Reinforcement Learning? (2023)
- Return-aligned Decision Transformer (2024)
- Contrastive Diffuser: Planning Towards High Return States Via Contrastive Learning (2024)
- Upside-down Reinforcement Learning For More Interpretable Optimal Control (2024)
- Double Check My Desired Return: Transformer With Target Alignment For Offline Reinforcement Learning (2025)