Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability
2022 · Juan Jose Garau-Luis, Yingjie Miao, John D. Co-Reyes, et al.
Abstract
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world. Designing RL algorithms that optimize these objectives can be a costly and painstaking process. This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions. MetaPG explicitly optimizes for generalizability and performance, and implicitly optimizes the stability of both metrics. We initialize our loss function population with Soft Actor-Critic (SAC) and perform multi-objective optimization using fitness metrics encoding single-task performance, zero-shot generalizability to unseen environment configurations, and stability across independent runs with different random seeds. On a set of continuous control tasks from the Real-World RL Benchmark Suite, we find that our method, using a single environment during evolution, evolves algorithms that improve upon SAC's performance and generalizability by 4% and 20%, respectively, and
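The multi-objective selection described in the abstract can be illustrated with a minimal sketch of Pareto-front filtering over two fitness scores (performance and generalizability, higher is better). This is not the MetaPG implementation: the function names `dominates` and `pareto_front` and the fitness values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each candidate loss function is scored as (performance, generalizability).
# Higher is better on both objectives. The values used below are made up.
Fitness = Tuple[float, float]


def dominates(a: Fitness, b: Fitness) -> bool:
    """Return True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Fitness]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Four hypothetical candidates; the third is dominated by the second.
scores = [(0.90, 0.50), (0.80, 0.70), (0.70, 0.60), (0.95, 0.40)]
front = pareto_front(scores)
print(front)
```

Keeping the whole non-dominated set, rather than collapsing the objectives into one weighted score, preserves candidates that trade performance against generalizability in different ways.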
Related papers
- A Self-tuning Actor-critic Algorithm (2020) 0.00
- Meta Sac-lag: Towards Deployable Safe Reinforcement Learning Via Metagradient-based Hyperparameter Tuning (2024) 2.26
- Improving Generalization In Meta Reinforcement Learning Using Learned Objectives (2019) 0.00
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018) 0.00
- Context-based Soft Actor Critic For Environments With Non-stationary Dynamics (2021) 0.00
- Metatrace Actor-critic: Online Step-size Tuning By Meta-gradient Descent For Reinforcement Learning Control (2018) 0.00
- Stackelberg Actor-critic: Game-theoretic Reinforcement Learning Algorithms (2021) 0.00
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021) 7.81