Provably Efficient Primal-dual Reinforcement Learning For CMDPs With Non-stationary Objectives And Constraints
2022 · Yuhao Ding, Javad Lavaei
Abstract
We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, a setting that plays a central role in ensuring the safety of RL in time-varying environments. In this problem, the reward/utility functions and the state transition functions are both allowed to vary arbitrarily over time, as long as their cumulative variations do not exceed certain known variation budgets. Designing safe RL algorithms in time-varying environments is particularly challenging because it requires integrating constraint-violation reduction, safe exploration, and adaptation to non-stationarity. To this end, we identify two alternative conditions on the time-varying constraints under which we can guarantee safety in the long run. We also propose the Periodically Restarted Optimistic Primal-Dual Proximal Policy ...
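As a rough illustration of the variation-budget condition mentioned in the abstract, the sketch below sums the largest per-entry change between consecutive episodes and compares it with a budget. The sup-norm choice, the flat-list representation of a reward table, and the names `cumulative_variation` and `within_budget` are assumptions made for this example; the abstract does not specify how the variation is measured.

```python
def cumulative_variation(tables):
    """Sum, over consecutive episodes, of the largest absolute entry change.

    `tables` is a list with one flat list of floats per episode
    (an assumed, simplified stand-in for a reward or utility function).
    """
    total = 0.0
    for prev, curr in zip(tables, tables[1:]):
        # Largest change in any single entry between two episodes.
        total += max(abs(c - p) for p, c in zip(prev, curr))
    return total


def within_budget(tables, budget):
    """True if the cumulative variation does not exceed the given budget."""
    return cumulative_variation(tables) <= budget


# Three episodes, two state-action entries each (made-up numbers).
rewards = [[0.0, 1.0], [0.5, 1.0], [0.5, 0.0]]
print(cumulative_variation(rewards))
print(within_budget(rewards, budget=2.0))
```

A sequence that never changes has zero variation, so it satisfies any non-negative budget; the same check could be applied to transition probabilities under a different norm.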
Related papers
- A Policy Gradient Primal-dual Algorithm For Constrained MDPs With Uniform PAC Guarantees (2024)
- Achieving Zero Constraint Violation For Constrained Reinforcement Learning Via Primal-dual Approach (2021)
- A Near-optimal Primal-dual Method For Off-policy Learning In CMDP (2022)
- Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees (2025)
- DOPE: Doubly Optimistic And Pessimistic Exploration For Safe Reinforcement Learning (2021)
- A Primal-dual Algorithm For Offline Constrained Reinforcement Learning With Linear MDPs (2024)
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)
- A Two-timescale Primal-dual Framework For Reinforcement Learning Via Online Dual Variable Guidance (2025)