Human-readable Programs As Actors Of Reinforcement Learning Agents Using Critic-moderated Evolution
2024 · Senne Deproost, Denis Steckelmacher, Ann Nowé
Abstract
With Deep Reinforcement Learning (DRL) being increasingly considered for the control of real-world systems, the lack of transparency of the neural network at the core of RL becomes a concern. Programmatic Reinforcement Learning (PRL) is able to create representations of this black box in the form of source code, not only increasing the explainability of the controller but also allowing for user adaptations. However, these methods focus on distilling a black-box policy into a program, and do so after learning, using the Mean Squared Error between produced and wanted behaviour and discarding other elements of the RL algorithm. The distilled policy may therefore perform significantly worse than the black-box learned policy. In this paper, we propose to directly learn a program as the policy of an RL agent. We build on TD3 and use its critics as the basis of the objective function of a genetic algorithm that synthesizes the program. Our approach builds the program during training, as opposed
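The idea of scoring candidate programs with critics inside a genetic algorithm can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the "program" is a fixed-shape linear expression over a two-dimensional state, `critic_1` and `critic_2` are hand-written stand-ins for learned TD3 critics, fitness is the mean of the smaller of the two critic values over a batch of states, and the genetic algorithm uses simple elitism with Gaussian mutation.

```python
import random

def critic_1(state, action):
    # Hypothetical stand-in for a learned critic Q1(s, a).
    # It peaks when action == 0.5 * s[0] - 0.2 * s[1].
    return -(action - (0.5 * state[0] - 0.2 * state[1])) ** 2

def critic_2(state, action):
    # Second hypothetical critic, slightly more pessimistic than the first.
    return critic_1(state, action) - 0.01

def run_program(genome, state):
    # The candidate "program": a linear expression with three constants.
    w0, w1, b = genome
    return w0 * state[0] + w1 * state[1] + b

def to_source(genome):
    # Render the candidate as a human-readable line of source code.
    w0, w1, b = genome
    return f"action = {w0:.2f} * s[0] + {w1:.2f} * s[1] + {b:.2f}"

def fitness(genome, states):
    # Critic-based objective: average of the smaller critic value
    # (an assumption of this sketch) for the actions the program produces.
    total = 0.0
    for s in states:
        a = run_program(genome, s)
        total += min(critic_1(s, a), critic_2(s, a))
    return total / len(states)

def evolve(states, generations=50, pop_size=30, sigma=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best fifth unchanged (elitism), refill with mutated copies.
        population.sort(key=lambda g: fitness(g, states), reverse=True)
        elite = population[: pop_size // 5]
        children = [
            [gene + rng.gauss(0, sigma) for gene in rng.choice(elite)]
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=lambda g: fitness(g, states))

state_rng = random.Random(1)
states = [(state_rng.uniform(-1, 1), state_rng.uniform(-1, 1)) for _ in range(64)]
best = evolve(states)
print(to_source(best))
```

In an actual actor-critic loop the critics would be neural networks updated from replayed transitions, and the states would be sampled from the replay buffer rather than drawn at random; the sketch only shows how a critic value can replace a Mean Squared Error distillation target as the selection signal.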
Related papers
- Evolution-guided Policy Gradient In Reinforcement Learning (2018)
- Multimodal LLM-assisted Evolutionary Search For Programmatic Control Policies (2025)
- Efficient Exploration In Deep Reinforcement Learning: A Novel Bayesian Actor-critic Algorithm (2024)
- "So, Tell Me About Your Policy...": Distillation Of Interpretable Policies From Deep Reinforcement Learning Agents (2025)
- Survival Dynamics Of Neural And Programmatic Policies In Evolutionary Reinforcement Learning (2026)
- Collaborative Evolutionary Reinforcement Learning (2019)
- PolicyEvolve: Evolving Programmatic Policies By LLMs For Multi-player Games Via Population-based Training (2025)
- Programmatic Reinforcement Learning: Navigating Gridworlds (2024)