Distributional Successor Features Enable Zero-shot Policy Optimization
2024 · Chuning Zhu, Xinqi Wang, Tyler Han, et al.
Abstract
Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new rewards to linear regression. Yet, zero-shot policy optimization for new tasks with successor features can be challenging. This work proposes a novel class of models, i.e., Distributional Successor Features for Zero-Shot Policy Optimization (DiSPOs), that learn a distribution of successor features of a stationary dataset's behavior policy, along with a policy that acts to realize different successor features achievable within the dataset. By directly modeling long-term outcomes in the da
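The abstract's remark that successor features reduce policy evaluation under new rewards to linear regression can be illustrated with a small NumPy sketch. This is not the DiSPOs method itself, only the standard successor-feature identity it builds on. It assumes rewards are exactly linear in some state features. The feature matrix `phi`, the weights `w_true`, the discount `gamma`, and the single fixed trajectory are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 visited states, each with a 4-dimensional feature phi(s).
phi = rng.normal(size=(200, 4))

# Assumption: the new task's reward is exactly linear in the features.
w_true = np.array([1.0, -0.5, 0.0, 2.0])
rewards = phi @ w_true

# Step 1: recover the reward weights by ordinary least squares.
w_hat, *_ = np.linalg.lstsq(phi, rewards, rcond=None)

# Step 2: successor features of the start state along one fixed trajectory,
# psi = sum_t gamma^t * phi(s_t). A learned model would predict this quantity;
# here it is computed directly from the (hypothetical) trajectory.
gamma = 0.9
trajectory = phi[:50]
discounts = gamma ** np.arange(len(trajectory))
psi = discounts @ trajectory  # shape (4,)

# Step 3: the value under the new reward is a single dot product.
value = psi @ w_hat

# Sanity check: the same value obtained by discounting the rewards directly.
direct_return = discounts @ rewards[:50]
```

Because `psi` does not depend on the reward, switching to a different task in this sketch only requires refitting `w_hat`; the long-horizon part of the computation is reused unchanged.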
Related papers
- Successor Feature Sets: Generalizing Successor Representations Across Policies (2021)
- Proto Successor Measure: Representing The Behavior Space Of An RL Agent (2024)
- Successor Features Combine Elements Of Model-free And Model-based Reinforcement Learning (2019)
- Non-adversarial Inverse Reinforcement Learning Via Successor Feature Matching (2024)
- A Neurally Plausible Model Learns Successor Representations In Partially Observable Environments (2019)
- Successor Features For Transfer In Alternating Markov Games (2025)
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026)
- Successor Feature Neural Episodic Control (2021)