An Empirical Investigation Of Value-based Multi-objective Reinforcement Learning For Stochastic Environments
2024 · Kewen Ding, Peter Vamplew, Cameron Foale, et al.
Abstract
One common approach to solving multi-objective reinforcement learning (MORL) problems is to extend conventional Q-learning by using vector Q-values in combination with a utility function. However, issues can arise with this approach in stochastic environments, particularly when optimising for the Scalarised Expected Reward (SER) criterion. This paper extends prior research by examining in detail the factors that influence how often value-based MORL Q-learning algorithms learn the SER-optimal policy for an environment with stochastic state transitions. We empirically examine several variations of the core multi-objective Q-learning algorithm, as well as reward engineering approaches, and demonstrate the limitations of these methods. In particular, we highlight the critical impact of noisy Q-value estimates on the stability and convergence of these algorithms.
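To make the "vector Q-values in combination with a utility function" idea concrete, here is a minimal tabular sketch. It is not the authors' implementation: the thresholded `utility` function, the array layout, and the names `greedy_action` and `mo_q_update` are all assumptions chosen for illustration, and the algorithms studied in the paper may differ (for example in how the greedy action is selected).

```python
import numpy as np

def utility(v, threshold=-1.0):
    # Hypothetical nonlinear utility (an assumption, not taken from the paper):
    # objective 1 only counts while objective 0 stays at or above the threshold.
    return v[1] if v[0] >= threshold else v[0] - 1e3

def greedy_action(q, state):
    # q has shape (n_states, n_actions, n_objectives); the action is chosen
    # by applying the utility to each action's vector Q-value.
    return int(np.argmax([utility(q[state, a]) for a in range(q.shape[1])]))

def mo_q_update(q, s, a, reward_vec, s_next, done, alpha=0.1, gamma=1.0):
    # Vector-valued Q-learning step: bootstrap from the Q-vector of the
    # action that is greedy under the utility in the next state.
    target = np.asarray(reward_vec, dtype=float)
    if not done:
        target = target + gamma * q[s_next, greedy_action(q, s_next)]
    q[s, a] += alpha * (target - q[s, a])
    return q

# Usage: 2 states, 2 actions, 2 objectives, one terminal transition.
q = np.zeros((2, 2, 2))
mo_q_update(q, s=0, a=1, reward_vec=[-1.0, 5.0], s_next=1, done=True, alpha=0.5)
```

Because each Q-vector is an estimate of an expectation, the utility here is applied to estimated expected returns when choosing actions. With stochastic transitions those estimates are noisy, which is the setting in which the abstract reports stability and convergence problems.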
Related papers
- Addressing The Issue Of Stochastic Environments And Local Decision-making In Multi-objective Reinforcement Learning (2022)
- Issues With Value-based Multi-objective Reinforcement Learning: Value Function Interference And Overestimation Sensitivity (2024)
- Provable Multi-objective Reinforcement Learning With Generative Models (2020)
- On Generalization Across Environments In Multi-objective Reinforcement Learning (2025)
- Utility-based Reinforcement Learning: Unifying Single-objective And Multi-objective Reinforcement Learning (2024)
- Limitations Of Scalarisation In MORL: A Comparative Study In Discrete Environments (2025)
- Relationship Explainable Multi-objective Optimization Via Vector Value Function Based Reinforcement Learning (2019)
- A Generalized Algorithm For Multi-objective Reinforcement Learning And Policy Adaptation (2019)