Evolving Pareto-optimal Actor-critic Algorithms For Generalizability And Stability

Abstract

Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world. Designing RL algorithms that optimize these objectives can be a costly and painstaking process. This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions. MetaPG explicitly optimizes for generalizability and performance, and implicitly optimizes the stability of both metrics. We initialize our loss function population with Soft Actor-Critic (SAC) and perform multi-objective optimization using fitness metrics encoding single-task performance, zero-shot generalizability to unseen environment configurations, and stability across independent runs with different random seeds. On a set of continuous control tasks from the Real-World RL Benchmark Suite, we find that our method, using a single environment during evolution, evolves algorithms that improve upon SAC's performance and generalizability by 4% and 20%, respectively, and

Evolving Pareto-optimal Actor-critic Algorithms For Generalizability And Stability

Abstract

Authors

Tags

Stats

Related papers