Challenging Common Assumptions In Convex Reinforcement Learning
2022 · Mirco Mutti, Riccardo de Santi, Piersilvio de Bartolomeis, et al.
Abstract
The classic Reinforcement Learning (RL) formulation concerns the maximization of a scalar reward function. More recently, convex RL has been introduced to extend the RL formulation to all the objectives that are convex functions of the state distribution induced by a policy. Notably, convex RL covers several relevant applications that do not fall into the scalar formulation, including imitation learning, risk-averse RL, and pure exploration. In classic RL, it is common to optimize an infinite trials objective, which accounts for the state distribution instead of the empirical state visitation frequencies, even though the actual number of trajectories is always finite in practice. This is theoretically sound since the infinite trials and finite trials objectives can be proved to coincide and thus lead to the same optimal policy. In this paper, we show that this hidden assumption does not hold in the convex RL setting. In particular, we show that erroneously optimizing the infinite trial
Related papers
- Reinforcement Learning With Convex Constraints (2019) · 0.00
- Global Reinforcement Learning: Beyond Linear And Convex Rewards Via Submodular Semi-gradient Methods (2024) · 0.00
- Breaking The Bias Barrier In Concave Multi-objective Reinforcement Learning (2026) · 0.00
- Reward Is Enough For Convex MDPs (2021) · 0.00
- Convex Programs And Lyapunov Functions For Reinforcement Learning: A Unified Perspective On The Analysis Of Value-based Methods (2022) · 2.26
- Model-agnostic Solutions For Deep Reinforcement Learning In Non-ergodic Contexts (2026) · 0.00
- Optimism As Risk-seeking In Multi-agent Reinforcement Learning (2025) · 0.00
- Examining Average And Discounted Reward Optimality Criteria In Reinforcement Learning (2021) · 0.00