Off-policy Distributional Q(\(\lambda\)): Distributional RL Without Importance Sampling
2024 · Yunhao Tang, Mark Rowland, Rémi Munos, et al.
Abstract
We introduce off-policy distributional Q(\(\lambda\)), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q(\(\lambda\)) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. Such unique properties distinguish distributional Q(\(\lambda\)) from other existing alternatives such as distributional Retrace. We characterize the algorithmic properties of distributional Q(\(\lambda\)) and validate theoretical insights with tabular experiments. We show how distributional Q(\(\lambda\))-C51, a combination of Q(\(\lambda\)) with the C51 agent, exhibits promising results on deep RL benchmarks.
Related papers
- Distributional Reinforcement Learning With Quantile Regression (2017)
- Fully Parameterized Quantile Function For Distributional Reinforcement Learning (2019)
- Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020)
- IDQL: Implicit Q-learning As An Actor-critic Method With Diffusion Policies (2023)
- Q-prop: Sample-efficient Policy Gradient With An Off-policy Critic (2016)
- Continuous Control Reinforcement Learning: Distributed Distributional Drq Algorithms (2024)
- A Distributional Analysis Of Sampling-based Reinforcement Learning Algorithms (2020)
- Minimax Weight And Q-function Learning For Off-policy Evaluation (2019)