Multi-objective Reinforcement Learning With Nonlinear Preferences: Provable Approximation For Maximizing Expected Scalarized Return
2023 · Nianli Peng, Muhang Tian, Brandon Fain
Abstract
We study multi-objective reinforcement learning with nonlinear preferences over trajectories. That is, we maximize the expected value of a nonlinear function of the accumulated rewards (expected scalarized return, or ESR) in a multi-objective Markov Decision Process (MOMDP). We derive an extended form of Bellman optimality for nonlinear optimization that explicitly considers time and the current accumulated reward. Using this formulation, we describe an approximation algorithm that computes an approximately optimal non-stationary policy in pseudopolynomial time for smooth scalarization functions with a constant number of rewards. We prove the approximation guarantee analytically and demonstrate the algorithm experimentally, showing that there can be a substantial gap between the approximately optimal policy computed by our algorithm and alternative baselines.
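The central idea, augmenting the Bellman recursion with the time step and the reward accumulated so far, can be illustrated on a finite-horizon toy problem. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes known transitions P(s' | s, a), deterministic vector rewards R(s, a), and a scalarization f applied once to the total reward at horizon H, giving the recursion V_H(s, r) = f(r) and V_t(s, r) = max_a Σ_{s'} P(s' | s, a) · V_{t+1}(s', r + R(s, a)). The names `solve_esr`, `P`, `R`, and the optional `grid` parameter (which rounds accumulated rewards to a fixed grid to keep the augmented state space finite) are hypothetical; the paper's own discretization and approximation guarantee may differ.

```python
from functools import lru_cache
from typing import Tuple

Vec = Tuple[float, ...]


def solve_esr(P, R, f, horizon, start_state, d, grid=None):
    """Finite-horizon DP over augmented states (t, s, accumulated reward).

    Hypothetical inputs (illustrative, not from the paper):
      P[s][a] -> list of (next_state, probability)
      R[s][a] -> tuple of d rewards earned by taking action a in state s
      f       -> scalarization applied to the total reward vector at the horizon
    If `grid` is set, accumulated rewards are rounded to multiples of it
    (an assumed discretization that bounds the number of augmented states).
    Returns (value, policy) where policy(t, s, acc) is the chosen action.
    """
    def snap(acc: Vec) -> Vec:
        if grid is None:
            return acc
        return tuple(round(x / grid) * grid for x in acc)

    @lru_cache(maxsize=None)
    def V(t: int, s, acc: Vec):
        if t == horizon:
            return f(acc), None  # terminal: scalarize the accumulated reward
        best_val, best_a = float("-inf"), None
        for a in sorted(P[s]):
            # The next augmented state carries the updated reward vector.
            new_acc = snap(tuple(x + r for x, r in zip(acc, R[s][a])))
            val = sum(p * V(t + 1, s2, new_acc)[0] for s2, p in P[s][a])
            if val > best_val:
                best_val, best_a = val, a
        return best_val, best_a

    value, _ = V(0, start_state, snap((0.0,) * d))
    # The policy is non-stationary: it depends on t and on the reward so far.
    return value, (lambda t, s, acc: V(t, s, snap(acc))[1])


# Toy instance: one state, two actions, each paying one unit to one objective.
P = {"s0": {"left": [("s0", 1.0)], "right": [("s0", 1.0)]}}
R = {"s0": {"left": (1.0, 0.0), "right": (0.0, 1.0)}}
value, policy = solve_esr(P, R, f=min, horizon=2, start_state="s0", d=2)
print(value)                        # 1.0
print(policy(1, "s0", (1.0, 0.0)))  # right
```

With f = min, either stationary deterministic policy in this toy ends at (2, 0) or (0, 2) and scores 0, whereas the computed policy switches actions based on the accumulated reward and scores 1. An equal-weight linear scalarization would be indifferent among all policies here, which is why conditioning on accumulated reward matters only for nonlinear f.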
Related papers
- Breaking The Bias Barrier In Concave Multi-objective Reinforcement Learning (2026)
- A Generalized Algorithm For Multi-objective Reinforcement Learning And Policy Adaptation (2019)
- SBEED: Convergent Reinforcement Learning With Nonlinear Function Approximation (2017)
- Multi-objective Reward And Preference Optimization: Theory And Algorithms (2025)
- Strategically Robust Multi-agent Reinforcement Learning With Linear Function Approximation (2026)
- Accommodating Picky Customers: Regret Bound And Exploration Complexity For Multi-objective Reinforcement Learning (2020)
- Addressing The Issue Of Stochastic Environments And Local Decision-making In Multi-objective Reinforcement Learning (2022)
- Joint Optimization Of Multi-objective Reinforcement Learning With Policy Gradient Based Algorithm (2021)