Burning RED: Unlocking Subtask-driven Reinforcement Learning And Risk-awareness In Average-reward Markov Decision Processes
2024 · Juan Sebastian Rojas, Chi-Guhn Lee
Abstract
Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty. However, average-reward MDPs have remained largely unexplored in reinforcement learning (RL) settings, with the majority of RL-based efforts having been allocated to discounted MDPs. In this work, we study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning: a novel RL framework that can be used to effectively and efficiently solve various learning objectives, or subtasks, simultaneously in the average-reward setting. We introduce a family of RED learning algorithms for prediction and control, including proven-convergent algorithms for the tabular case. We then showcase the power of these algorithms by demonstrating how they can be used to learn a policy that optimizes, for the first time, the well-known conditional value-at-risk (CVaR) risk measure in a fully-online ma
Related papers
- Sharper Model-free Reinforcement Learning For Average-reward Markov Decision Processes (2023) 0.00
- Robust Risk-sensitive Reinforcement Learning With Conditional Value-at-risk (2024) 5.84
- RUDDER: Return Decomposition For Delayed Rewards (2018) 0.00
- Optimizing The Long-term Average Reward For Continuing MDPs: A Technical Report (2021) 0.00
- Variance-aware Regret Bounds For Undiscounted Reinforcement Learning In MDPs (2018) 0.00
- Planning And Learning In Average Risk-aware MDPs (2025) 0.00
- Learning And Planning In Average-reward Markov Decision Processes (2020) 0.00
- Achieving Fairness In Multi-agent Markov Decision Processes Using Reinforcement Learning (2023) 0.00