Unbiased Asymmetric Reinforcement Learning Under Partial Observability
2021 · Andrea Baisero, Christopher Amato
Abstract
In partially observable reinforcement learning, offline training gives access to latent information which is not available during online training and/or execution, such as the system state. Asymmetric actor-critic methods exploit such information by training a history-based policy via a state-based critic. However, many asymmetric methods lack theoretical foundation, and are only evaluated on limited domains. We examine the theory of asymmetric actor-critic methods which use state-based critics, and expose fundamental issues which undermine the validity of a common variant, and limit its ability to address partial observability. We propose an unbiased asymmetric actor-critic variant which is able to exploit state information while remaining theoretically sound, maintaining the validity of the policy gradient theorem, and introducing no bias and relatively low variance into the training process. An empirical evaluation performed on domains which exhibit significant partial observability
Related papers
- Informed Asymmetric Actor-critic: Leveraging Privileged Signals Beyond Full-state Access (2025)
- Multi-agent Off-policy Actor-critic Reinforcement Learning For Partially Observable Environments (2024)
- On Overfitting And Asymptotic Bias In Batch Reinforcement Learning With Partial Observability (2017)
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022)
- Provable Partially Observable Reinforcement Learning With Privileged Information (2024)
- Benchmarking Partial Observability In Reinforcement Learning With A Suite Of Memory-improvable Domains (2025)
- Belief States For Cooperative Multi-agent Reinforcement Learning Under Partial Observability (2025)
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022)