Beyond Expected Return: Accounting For Policy Reproducibility When Evaluating Reinforcement Learning Algorithms
2023 · Manon Flageat, Bryan Lim, Antoine Cully
Abstract
Many Reinforcement Learning (RL) applications have noise or stochasticity present in the environment. Beyond their impact on learning, these uncertainties cause the exact same policy to perform differently, i.e. to yield a different return, from one roll-out to another. Common evaluation procedures in RL summarise the resulting return distributions using solely the expected return, which does not account for the spread of the distribution. Our work defines this spread as the policy reproducibility: the ability of a policy to obtain similar performance when rolled out many times, a crucial property in some real-world applications. We highlight that existing procedures that only use the expected return are limited on two fronts: first, an infinite number of return distributions with a wide range of performance-reproducibility trade-offs can have the same expected return, limiting its effectiveness when used for comparing policies; second, the expected return metric does not leave an
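The first limitation named in the abstract can be illustrated with a minimal sketch. The two return lists below are made-up numbers, not results from the paper, and the population standard deviation is used here only as one assumed measure of spread; the abstract does not say which statistic the authors use.

```python
import statistics


def summarise(returns):
    """Return (mean, population standard deviation) of a list of episode returns."""
    return statistics.fmean(returns), statistics.pstdev(returns)


# Hypothetical returns from rolling out two fixed policies five times each.
policy_a = [98.0, 99.0, 100.0, 101.0, 102.0]  # similar return on every roll-out
policy_b = [0.0, 50.0, 100.0, 150.0, 200.0]   # return varies widely between roll-outs

mean_a, spread_a = summarise(policy_a)
mean_b, spread_b = summarise(policy_b)

# Both policies have the same expected return, so that metric alone cannot
# separate them; the spread is what distinguishes them.
print(f"policy A: mean={mean_a:.1f}, spread={spread_a:.2f}")
print(f"policy B: mean={mean_b:.1f}, spread={spread_b:.2f}")
```

Ranking these two policies by mean return alone would call them equal, even though the second is far less reproducible in the sense the abstract describes.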