Global Reinforcement Learning: Beyond Linear And Convex Rewards Via Submodular Semi-gradient Methods
2024 · Riccardo de Santi, Manish Prajapat, Andreas Krause
Abstract
In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many real-world applications such as experiment design, exploration, imitation learning, and risk-averse RL, to name a few. This is because additive objectives disregard interactions between states that are crucial for certain tasks. To tackle this problem, we introduce Global RL (GRL), where rewards are globally defined over trajectories instead of locally over states. Global rewards can capture negative interactions among states (e.g., in exploration) via submodularity, positive interactions (e.g., synergetic effects) via supermodularity, and mixed interactions via combinations of the two. By exploiting ideas from submodular optimization, we propose a novel algorithmic scheme that converts any GRL problem to a sequence of classic RL problems and solves it efficiently with curvature-dependent app
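The contrast the abstract draws between additive and global rewards can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy states "a", "b", "c", the unit per-state rewards, and the helper names additive_reward, coverage_reward, and marginal_gain. The coverage reward is a standard example of a submodular set function; it is used only to show how a trajectory-level reward captures the diminishing value of revisiting a state, which an additive objective cannot.

```python
from typing import Callable, Dict, Hashable, Sequence

State = Hashable
Reward = Callable[[Sequence[State], Dict[State, float]], float]


def additive_reward(trajectory: Sequence[State], r: Dict[State, float]) -> float:
    """Classic additive objective: sum per-state rewards over every visit."""
    return sum(r[s] for s in trajectory)


def coverage_reward(trajectory: Sequence[State], r: Dict[State, float]) -> float:
    """Toy global objective (assumed): each distinct state is rewarded once."""
    return sum(r[s] for s in set(trajectory))


def marginal_gain(f: Reward, trajectory: Sequence[State], state: State,
                  r: Dict[State, float]) -> float:
    """Change in f when `state` is appended to `trajectory`."""
    return f(list(trajectory) + [state], r) - f(trajectory, r)


# Hypothetical three-state example with unit rewards.
r = {"a": 1.0, "b": 1.0, "c": 1.0}
revisit = ["a", "a", "a"]   # stays in one state
explore = ["a", "b", "c"]   # visits three distinct states

# The additive objective cannot tell the two trajectories apart.
print(additive_reward(revisit, r), additive_reward(explore, r))

# The coverage objective prefers the exploring trajectory.
print(coverage_reward(revisit, r), coverage_reward(explore, r))

# Diminishing returns: a first visit to "a" is worth more than a repeat visit.
print(marginal_gain(coverage_reward, [], "a", r),
      marginal_gain(coverage_reward, ["a"], "a", r))
```

In this toy setting the gain from adding a state depends on what the trajectory already contains, which is the kind of negative interaction between states that the abstract attributes to submodular rewards.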
Related papers
- Reinforcement Learning With Convex Constraints (2019)
- Challenging Common Assumptions In Convex Reinforcement Learning (2022)
- Variational Policy Gradient Method For Reinforcement Learning With General Utilities (2020)
- On The Global Optimality Of Policy Gradient Methods In General Utility Reinforcement Learning (2024)
- Breaking The Bias Barrier In Concave Multi-objective Reinforcement Learning (2026)
- Policy Gradient For Reinforcement Learning With General Utilities (2022)
- Computational Benefits Of Intermediate Rewards For Goal-reaching Policy Learning (2021)
- Exploration-exploitation Trade-off In Reinforcement Learning On Online Markov Decision Processes With Global Concave Rewards (2019)