Evaluation-aware Reinforcement Learning
2025 · Shripad Vilasrao Deshmukh, Will Schwarzer, Scott Niekum
Abstract
Policy evaluation is a core component of many reinforcement learning (RL) algorithms and a critical tool for ensuring safe deployment of RL policies. However, existing policy evaluation methods often suffer from high variance or bias. To address these issues, we introduce Evaluation-Aware Reinforcement Learning (EvA-RL), a general policy learning framework that considers evaluation accuracy at training time, as opposed to standard post-hoc policy evaluation methods. Specifically, EvA-RL directly optimizes policies for efficient and accurate evaluation, in addition to being performant. We provide an instantiation of EvA-RL and demonstrate through a combination of theoretical analysis and empirical results that EvA-RL effectively trades off between evaluation accuracy and expected return. Finally, we show that the evaluation-aware policy and the evaluation mechanism itself can be co-learned to mitigate this tradeoff, providing the evaluation benefits without significantly sacrificing policy performance.
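The tradeoff between evaluation accuracy and expected return can be illustrated with a toy selection problem. The sketch below is not the paper's instantiation of EvA-RL; it is a generic, assumed scalarization in which each candidate policy is scored by its expected return minus a penalty on how noisy a Monte Carlo estimate of that return would be. The candidate policies, their numbers, the weight `beta`, and all function names are hypothetical.

```python
import math

# Hypothetical candidate policies: (name, expected return, per-episode return variance).
CANDIDATES = [
    ("risky", 1.00, 4.00),
    ("balanced", 0.90, 1.00),
    ("steady", 0.70, 0.04),
]


def mc_standard_error(variance, n_episodes):
    """Standard error of a Monte Carlo return estimate from n i.i.d. episodes."""
    return math.sqrt(variance / n_episodes)


def penalized_objective(expected_return, variance, beta, n_episodes):
    """Assumed scalarization: expected return minus beta times evaluation error."""
    return expected_return - beta * mc_standard_error(variance, n_episodes)


def select_policy(beta, n_episodes=16):
    """Return the name of the candidate that maximizes the penalized objective."""
    best = max(
        CANDIDATES,
        key=lambda c: penalized_objective(c[1], c[2], beta, n_episodes),
    )
    return best[0]


if __name__ == "__main__":
    for beta in (0.0, 0.5, 2.0):
        print(beta, select_policy(beta))
```

With these made-up numbers and 16 evaluation episodes, `beta = 0.0` selects `risky` (highest return, hardest to evaluate), `beta = 0.5` selects `balanced`, and `beta = 2.0` selects `steady`: a larger weight on evaluation accuracy gives up some expected return in exchange for a policy whose value can be estimated more reliably.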
Related papers
- Efficient Policy Evaluation With Safety Constraint For Reinforcement Learning (2024)
- Doubly Optimal Policy Evaluation For Reinforcement Learning (2024)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- General Policy Evaluation And Improvement By Learning To Identify Few But Crucial States (2022)
- ERL-Re\(^2\): Efficient Evolutionary Reinforcement Learning With Shared State Representation And Individual Policy Representation (2022)
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)
- Robust On-policy Sampling For Data-efficient Policy Evaluation In Reinforcement Learning (2021)
- Gap-increasing Policy Evaluation For Efficient And Noise-tolerant Reinforcement Learning (2019)