Differentiable Evolutionary Reinforcement Learning
2025 · Sitao Cheng, Tianle Li, Xuhan Huang, et al.
Abstract
The design of effective reward functions is a central and often arduous challenge in reinforcement learning (RL), particularly when developing autonomous agents for complex reasoning tasks. While automated reward optimization approaches exist, they typically rely on derivative-free evolutionary heuristics that treat the reward function as a black box and so fail to capture the causal relationship between reward structure and task performance. To bridge this gap, we propose Differentiable Evolutionary Reinforcement Learning (DERL), a bilevel framework that enables the autonomous discovery of optimal reward signals. In DERL, a Meta-Optimizer evolves a reward function (i.e., a Meta-Reward) by composing structured atomic primitives, and this Meta-Reward guides the training of an inner-loop policy. Crucially, unlike previous evolutionary approaches, DERL is differentiable in its meta-optimization: it treats the inner-loop validation performance as a signal to update the Meta-Optimizer via reinforcement learning. This allows
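The bilevel loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the excerpt does not specify the primitives, the Meta-Optimizer architecture, or the training algorithm, so everything below is an assumption made for illustration. The two primitives (`prim_outcome`, `prim_distractor`), the three-action task, the Bernoulli inclusion mask, and the REINFORCE update with a moving-average baseline are all hypothetical stand-ins. The sketch only shows the shape of the idea: an outer policy samples a Meta-Reward, an inner policy is trained under it, and the inner policy's validation performance is fed back as the outer reward.

```python
import math
import random

# Hypothetical atomic primitives over a toy action space {0, 1, 2}.
# Action 2 is the one the held-out validation task actually rewards.
def prim_outcome(action):
    return 1.0 if action == 2 else 0.0

def prim_distractor(action):
    return 1.0 if action == 0 else 0.0

PRIMITIVES = [prim_outcome, prim_distractor]
N_ACTIONS = 3

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def meta_reward(mask, action):
    # Assumed composition rule: sum of the primitives selected by the mask.
    return sum(m * prim(action) for m, prim in zip(mask, PRIMITIVES))

def train_inner_policy(mask, steps=50, lr=1.0):
    """Inner loop: exact gradient ascent on the expected Meta-Reward."""
    logits = [0.0] * N_ACTIONS
    rewards = [meta_reward(mask, a) for a in range(N_ACTIONS)]
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Gradient of E[r] w.r.t. logit a is p_a * (r_a - E[r]).
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)

def validation_performance(policy_probs):
    # Assumed validation metric: probability of the truly correct action.
    return policy_probs[2]

def run_meta_optimization(outer_steps=300, meta_lr=0.5, seed=0):
    """Outer loop: REINFORCE on the primitive-inclusion logits."""
    rng = random.Random(seed)
    meta_logits = [0.0] * len(PRIMITIVES)
    baseline = 0.0
    for _ in range(outer_steps):
        include_probs = [sigmoid(x) for x in meta_logits]
        mask = [1 if rng.random() < p else 0 for p in include_probs]
        score = validation_performance(train_inner_policy(mask))
        advantage = score - baseline
        # d/d(logit) of log Bernoulli(mask_i) is (mask_i - include_prob_i).
        meta_logits = [x + meta_lr * advantage * (m - p)
                       for x, m, p in zip(meta_logits, mask, include_probs)]
        baseline = 0.9 * baseline + 0.1 * score
    return [sigmoid(x) for x in meta_logits]

if __name__ == "__main__":
    probs = run_meta_optimization()
    for prim, p in zip(PRIMITIVES, probs):
        print(f"{prim.__name__}: inclusion probability {p:.3f}")
```

In this toy setting the outer update tends to raise the inclusion probability of the primitive aligned with validation performance and lower that of the misleading one. A derivative-free search would instead have to compare sampled reward functions without any gradient signal on the sampling distribution.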
Related papers
- Reward Design For Reinforcement Learning Agents (2025)
- BiERL: A Meta Evolutionary Reinforcement Learning Framework Via Bilevel Optimization (2023)
- Evolutionary Reinforcement Learning: A Survey (2023)
- Evolution-guided Policy Gradient In Reinforcement Learning (2018)
- Derivative-free Reinforcement Learning: A Review (2021)
- Illuminating The Three Dogmas Of Reinforcement Learning Under Evolutionary Light (2025)
- Deep Reinforcement Learning From Hierarchical Preference Design (2023)
- Collaborative Evolutionary Reinforcement Learning (2019)