Functional Critics Are Essential For Actor-critic: From Off-policy Stability To Efficient Exploration
2025 · Qinxun Bai, Yuxuan Han, Wei Xu, et al.
Abstract
The actor-critic (AC) framework has achieved strong empirical success in off-policy reinforcement learning but suffers from the "moving target" problem, where the evaluated policy changes continually. Functional critics, or policy-conditioned value functions, address this by explicitly including a representation of the policy as input. While conceptually appealing, previous efforts have struggled to remain competitive against standard AC. In this work, we revisit functional critics within the actor-critic framework and identify two critical aspects that render them a necessity rather than a luxury. First, we demonstrate their power in stabilizing the complex interplay between the "deadly triad" and the "moving target". We provide a convergent off-policy AC algorithm under linear functional approximation that dismantles several longstanding barriers between theory and practice: it utilizes target-based TD learning, accommodates dynamic behavior policies, and operates without the restric
Related papers
- Decision-aware Actor-critic With Function Approximation And Theoretical Guarantees (2023)
- Non-asymptotic Analysis For Single-loop (natural) Actor-critic With Compatible Function Approximation (2024)
- Improving Actor-critic Training With Steerable Action-value Approximation Errors (2024)
- Doubly Robust Off-policy Actor-critic Algorithms For Reinforcement Learning (2019)
- Boosting Exploration In Actor-critic Algorithms By Incentivizing Plausible Novel States (2022)
- ADER: Adapting Between Exploration And Robustness For Actor-critic Methods (2021)
- Analysis Of A Target-based Actor-critic Algorithm With Linear Function Approximation (2021)
- Wasserstein Flow Meets Replicator Dynamics: A Mean-field Analysis Of Representation Learning In Actor-critic (2021)