Issues With Value-based Multi-objective Reinforcement Learning: Value Function Interference And Overestimation Sensitivity
2024 · Peter Vamplew, Ethan Watkins, et al.
Abstract
Multi-objective reinforcement learning (MORL) algorithms extend conventional reinforcement learning (RL) to the more general case of problems with multiple, conflicting objectives, represented by vector-valued rewards. Widely used scalar RL methods such as Q-learning can be modified to handle multiple objectives by (1) learning vector-valued value functions, and (2) performing action selection using a scalarisation or ordering operator which reflects the user's preferences with respect to the different objectives. This paper investigates two previously unreported issues which can hinder the performance of value-based MORL algorithms when applied in conjunction with a non-linear utility function: value function interference and sensitivity to overestimation. We illustrate the nature of these phenomena on simple multi-objective MDPs using a tabular implementation of multi-objective Q-learning.
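The two modifications described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the function names, the table layout, and the choice of a thresholded lexicographic ordering as the non-linear operator are all assumptions made for illustration. Each Q-value is a vector with one entry per objective, and greedy action selection ranks those vectors with an ordering key rather than a scalar maximum.

```python
def tlo_key(q_vec, threshold=1.0):
    """Hypothetical non-linear ordering: clip objective 0 at a threshold,
    then break ties using objective 1 (tuples compare lexicographically)."""
    return (min(q_vec[0], threshold), q_vec[1])


def greedy_action(q_row, key=tlo_key):
    """Return the index of the action whose Q-vector ranks highest under `key`."""
    return max(range(len(q_row)), key=lambda a: key(q_row[a]))


def make_q_table(n_states, n_actions, n_objectives):
    """Q[s][a] is a list holding one value estimate per objective."""
    return [[[0.0] * n_objectives for _ in range(n_actions)]
            for _ in range(n_states)]


def mo_q_update(Q, s, a, reward, s_next, done,
                alpha=0.1, gamma=1.0, key=tlo_key):
    """One Q-learning update applied component-wise to the vector Q[s][a].

    The bootstrap target uses the Q-vector of the action that is greedy
    in s_next under the ordering key, not a per-objective maximum.
    """
    if done:
        target = list(reward)
    else:
        a_star = greedy_action(Q[s_next], key)
        target = [r + gamma * q for r, q in zip(reward, Q[s_next][a_star])]
    Q[s][a] = [q + alpha * (t - q) for q, t in zip(Q[s][a], target)]
    return Q[s][a]


# Usage: two states, two actions, two objectives.
Q = make_q_table(2, 2, 2)
Q[1] = [[2.0, 0.0], [1.0, 5.0]]          # both actions meet the threshold on objective 0
mo_q_update(Q, 0, 0, (0.0, -1.0), 1, done=False, alpha=1.0)
```

In this sketch both actions in state 1 reach the threshold on the first objective, so the ordering falls through to the second objective and the update bootstraps from the second action's vector. Because the whole vector is copied from whichever action is currently ranked greedy, a change in that ranking changes every component of the target at once.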
Related papers
- Addressing The Issue Of Stochastic Environments And Local Decision-making In Multi-objective Reinforcement Learning (2022)
- An Empirical Investigation Of Value-based Multi-objective Reinforcement Learning For Stochastic Environments (2024)
- Provable Multi-objective Reinforcement Learning With Generative Models (2020)
- Utility-based Reinforcement Learning: Unifying Single-objective And Multi-objective Reinforcement Learning (2024)
- On Generalization Across Environments In Multi-objective Reinforcement Learning (2025)
- Using Logical Specifications Of Objectives In Multi-objective Reinforcement Learning (2019)
- Interpretability By Design For Efficient Multi-objective Reinforcement Learning (2025)
- Relationship Explainable Multi-objective Optimization Via Vector Value Function Based Reinforcement Learning (2019)