Neuro-algorithmic Policies Enable Fast Combinatorial Generalization
2021 · Marin Vlastelica, Michal Rolínek, Georg Martius
Abstract
Although model-based and model-free approaches to learning the control of systems have achieved impressive results on standard benchmarks, generalization to task variations is still lacking. Recent results suggest that generalization for standard architectures improves only after obtaining exhaustive amounts of data. We give evidence that generalization capabilities are in many cases bottlenecked by the inability to generalize on the combinatorial aspects of the problem. Furthermore, we show that for a certain subclass of the MDP framework, this can be alleviated by neuro-algorithmic architectures. Many control problems require long-term planning that is hard to solve generically with neural networks alone. We introduce a neuro-algorithmic policy architecture consisting of a neural network and an embedded time-dependent shortest path solver. These policies can be trained end-to-end by blackbox differentiation. We show that this type of architecture generalizes well to unseen variatio
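To make the "time-dependent shortest path solver" mentioned in the abstract concrete, the following is a minimal sketch of such a solver on a grid, written as dynamic programming over a time-expanded graph. It is not the authors' implementation. The function name `time_dependent_shortest_path`, the `(T, H, W)` cost layout, and the move set (wait or step to a 4-neighbour) are assumptions made for illustration. In a neuro-algorithmic policy the cost tensor would be predicted by the neural network; here it is supplied by hand, and the blackbox-differentiation training step is not shown.

```python
import numpy as np

def time_dependent_shortest_path(costs, start, goal):
    """Cheapest path on a grid whose vertex costs change over time.

    costs: array of shape (T, H, W); costs[t, y, x] is the (non-negative)
           cost of occupying cell (y, x) at time step t. Hypothetical layout.
    start, goal: (y, x) tuples. The agent is at `start` at t = 0.
    Returns (total_cost, path), where path[t] is the cell occupied at step t.
    """
    T, H, W = costs.shape
    # Assumed move set: wait in place, or step to one of the 4 neighbours.
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

    dist = np.full((T, H, W), np.inf)
    parent = np.full((T, H, W, 2), -1, dtype=int)
    dist[0, start[0], start[1]] = costs[0, start[0], start[1]]

    # Relax every cell at time t from its possible predecessors at time t - 1.
    for t in range(1, T):
        for y in range(H):
            for x in range(W):
                for dy, dx in moves:
                    py, px = y - dy, x - dx
                    if 0 <= py < H and 0 <= px < W:
                        cand = dist[t - 1, py, px] + costs[t, y, x]
                        if cand < dist[t, y, x]:
                            dist[t, y, x] = cand
                            parent[t, y, x] = (py, px)

    # Pick the cheapest arrival time at the goal within the horizon.
    t_best = int(np.argmin(dist[:, goal[0], goal[1]]))
    total = dist[t_best, goal[0], goal[1]]
    if not np.isfinite(total):
        raise ValueError("goal is not reachable within the time horizon")

    # Walk the parent pointers back to t = 0.
    path = [tuple(goal)]
    y, x = goal
    for t in range(t_best, 0, -1):
        y, x = parent[t, y, x]
        path.append((int(y), int(x)))
    path.reverse()
    return float(total), path
```

For example, on a 1 x 3 corridor where the middle cell is expensive only at t = 1, the solver prefers to wait one step at the start before crossing, which a static shortest path solver could not express.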
Related papers
- Generalized Policy Improvement Algorithms With Theoretically Supported Sample Reuse (2022)
- Synergizing Reinforcement Learning And Genetic Algorithms For Neural Combinatorial Optimization (2025)
- NeuPL: Neural Population Learning (2022)
- On Learning History Based Policies For Controlling Markov Decision Processes (2022)
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019)
- On The Expressivity Of Neural Networks For Deep Reinforcement Learning (2019)
- PAC-Bayesian Reinforcement Learning Trains Generalizable Policies (2025)
- RLOC: Neurobiologically Inspired Hierarchical Reinforcement Learning Algorithm For Continuous Control Of Nonlinear Dynamical Systems (2019)