Sample-efficient Multi-objective Learning Via Generalized Policy Improvement Prioritization
2023 · Lucas N. Alegre, Ana L. C. Bazzan, Diederik M. Roijers, et al.
Abstract
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences over (possibly conflicting) reward functions. Such algorithms often learn a set of policies (each optimized for a particular agent preference) that can later be used to solve problems with novel preferences. We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes that improve sample-efficient learning. They implement active-learning strategies by which the agent can (i) identify the most promising preferences/objectives to train on at each moment, to more rapidly solve a given MORL problem; and (ii) identify which previous experiences are most relevant when learning a policy for a particular agent preference, via a novel Dyna-style MORL method. We prove our algorithm is guaranteed to always converge to an optimal solution in a finite number of steps, or an \(\epsilon\)
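As a rough illustration of the GPI step the abstract builds on, the sketch below picks an action for a preference vector by scalarizing the multi-objective Q-values of every policy in a learned set and acting greedily over the best of them. It assumes linear preferences (a weight vector `w`) and tabular Q-values for a single state. The function name `gpi_action`, the array layout, and the toy numbers are hypothetical and not taken from the paper. The paper's prioritization schemes and Dyna-style method are not shown here.

```python
import numpy as np


def gpi_action(q_values: np.ndarray, w: np.ndarray) -> int:
    """Pick an action by Generalized Policy Improvement for preference w.

    q_values: shape (n_policies, n_actions, n_objectives), the vector-valued
              Q-values of each stored policy in the current state
              (hypothetical layout, assumed for this sketch).
    w:        shape (n_objectives,), linear preference weights (assumed).
    """
    # Scalarize each policy's Q-vector under w -> (n_policies, n_actions).
    scalarized = q_values @ w
    # For each action, keep the value promised by the best stored policy.
    best_per_action = scalarized.max(axis=0)
    # Act greedily with respect to that upper envelope.
    return int(best_per_action.argmax())


# Toy example: two stored policies, two actions, two objectives.
q = np.array([
    [[1.0, 0.0], [0.0, 0.2]],  # policy 0, specialized toward objective 0
    [[0.1, 0.1], [0.0, 1.0]],  # policy 1, specialized toward objective 1
])

action_for_obj1 = gpi_action(q, np.array([0.2, 0.8]))
action_for_obj0 = gpi_action(q, np.array([0.9, 0.1]))
print(action_for_obj1, action_for_obj0)
```

With these toy values, a preference weighted toward the second objective selects the action favored by policy 1, and a preference weighted toward the first objective selects the action favored by policy 0, even though neither policy was trained for exactly those weights.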