LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning With Subproblem Exploration
2025 · Ruiyu Qiu, Rui Wang, Guanghui Yang, et al.
Abstract
Lexicographic multi-objective problems, which consist of multiple conflicting subtasks with explicit priorities, are common in real-world applications. Despite the advantages of Reinforcement Learning (RL) on single tasks, extending conventional RL methods to prioritized multiple objectives remains challenging. In particular, traditional Safe RL and Multi-Objective RL (MORL) methods have difficulty enforcing priority orderings efficiently. Lexicographic Multi-Objective RL (LMORL) methods have therefore been developed to address these challenges. However, existing LMORL methods either rely on heuristic threshold tuning with prior knowledge or are restricted to discrete domains. To overcome these limitations, we propose Lexicographically Projected Policy Gradient RL (LPPG-RL), a novel LMORL framework that leverages sequential gradient projections to identify feasible policy update directions, thereby making LPPG-RL broadly compatible with all policy gradient algorithms in continuous
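The abstract does not spell out the projection rule, so the following is only a minimal sketch of the general idea of sequential gradient projection under stated assumptions. It is a generic conflict-removing projection, not the authors' algorithm. The function names (`dot`, `remove_conflict`, `project_lower_priority`), the cyclic-pass loop, and the tolerance are all hypothetical choices made for illustration. Gradients are plain Python lists, and a lower-priority gradient is adjusted so that, to first order, it does not decrease any higher-priority objective.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_conflict(d, g):
    """If d has a negative inner product with g, drop d's component along g.

    This is the Euclidean projection of d onto the half-space {x : x . g >= 0}.
    """
    c = dot(d, g)
    gg = dot(g, g)
    if c >= 0.0 or gg == 0.0:
        return list(d)  # no conflict (or g is zero): leave d unchanged
    return [di - (c / gg) * gi for di, gi in zip(d, g)]


def project_lower_priority(g_low, higher, max_passes=100, tol=1e-9):
    """Adjust a lower-priority gradient against higher-priority gradients.

    `higher` is ordered from highest priority downward (an assumption of this
    sketch). Projections are applied one after another; passes repeat because
    a later projection can reintroduce a conflict with an earlier gradient.
    """
    d = list(g_low)
    for _ in range(max_passes):
        for g in higher:
            d = remove_conflict(d, g)
        if all(dot(d, g) >= -tol for g in higher):
            break
    return d


# Toy 2-D example: the low-priority gradient points partly against g_high.
g_high = [1.0, 0.0]
adjusted = project_lower_priority([-1.0, 1.0], [g_high])
print(adjusted)   # [0.0, 1.0]

# A non-conflicting gradient passes through unchanged.
unchanged = project_lower_priority([1.0, 1.0], [g_high])
print(unchanged)  # [1.0, 1.0]
```

In the toy example, the component of the low-priority gradient that would reduce the high-priority objective is removed, while the orthogonal component is kept. How LPPG-RL itself defines feasibility and combines the projected directions is not stated in the text above.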
Related papers
- Sample-efficient Multi-objective Learning Via Generalized Policy Improvement Prioritization (2023)
- A Generalized Algorithm For Multi-objective Reinforcement Learning And Policy Adaptation (2019)
- Interpretability By Design For Efficient Multi-objective Reinforcement Learning (2025)
- GHPO: Adaptive Guidance For Stable And Efficient LLM Reinforcement Learning (2025)
- Turn-PPO: Turn-level Advantage Estimation With PPO For Improved Multi-turn RL In Agentic LLMs (2025)
- Residual Policy Gradient: A Reward View Of KL-regularized Objective (2025)
- Think Outside The Policy: In-context Steered Policy Optimization (2025)
- Natural Policy Gradient And Actor Critic Methods For Constrained Multi-task Reinforcement Learning (2024)