Efficient Algorithms For Mitigating Uncertainty And Risk In Reinforcement Learning
2025 · Xihong Su
Abstract
This dissertation makes three main contributions. First, we identify a new connection between policy gradient and dynamic programming in multi-model Markov decision processes (MMDPs) and propose the Coordinate Ascent Dynamic Programming (CADP) algorithm, which computes a Markov policy that maximizes the discounted return averaged over the uncertain models. CADP adjusts model weights iteratively to guarantee monotone policy improvements that converge to a local maximum. Second, we establish necessary and sufficient conditions for the exponential entropic risk measure (ERM) Bellman operator to be a contraction, and we prove the existence of stationary deterministic optimal policies for the ERM and entropic value-at-risk (EVaR) objectives under the total reward criterion (ERM-TRC and EVaR-TRC). We also propose exponential value iteration, policy iteration, and linear programming algorithms for computing optimal stationary policies for ERM-TRC and EVaR-TRC. Third, we propose model-free Q-learning algorithms for computing policies with the risk-averse ERM-TRC and EVaR-TRC objectives. The challenge is that the ERM Bellman operator used in Q-learning may not be a contraction. Instead, we
Related papers
- Planning And Learning In Average Risk-aware MDPs (2025)
- Lyapunov Robust Constrained-MDPs: Soft-constrained Robustly Stable Policy Optimization Under Model Uncertainty (2021)
- Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees (2025)
- Robust Bayesian Dynamic Programming For On-policy Risk-sensitive Reinforcement Learning (2025)
- Solving Robust MDPs Through No-regret Dynamics (2023)
- A Policy Gradient Primal-dual Algorithm For Constrained MDPs With Uniform PAC Guarantees (2024)
- Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, And Separation Design (2022)
- Bayesian Risk-sensitive Policy Optimization For MDPs With General Loss Functions (2025)