Extrapolation in Gridworld Markov-Decision Processes
2020 · Eugene Charniak
Abstract
Extrapolation in reinforcement learning is the ability to generalize at test time to states that could never have occurred at training time. Here we consider four factors that lead to improved extrapolation in a simple Gridworld environment: (a) avoiding maximum-Q-value selection (or other deterministic methods) when choosing actions at test time, (b) using an ego-centric representation of the Gridworld, (c) building rotational and mirror symmetry into the learning mechanism by using rotation- and mirror-invariant convolution (rather than standard translation-invariant convolution), and (d) adding a maximum-entropy term to the loss function to encourage equally good actions to be chosen equally often.
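To make factors (a), (c), and (d) concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the paper's implementation. The function names, the softmax `temperature` parameter, and the idea of obtaining a symmetric kernel by averaging over the eight rotations and reflections of a square kernel are illustrative choices, and the abstract does not say how the author realizes any of them.

```python
import numpy as np


def softmax(x, temperature=1.0):
    """Numerically stable softmax over a 1-D array."""
    z = (np.asarray(x, dtype=float) - np.max(x)) / temperature
    e = np.exp(z)
    return e / e.sum()


def sample_action(q_values, rng, temperature=1.0):
    """Factor (a), as assumed here: sample an action from a softmax over
    Q-values instead of always taking the argmax."""
    probs = softmax(q_values, temperature)
    return int(rng.choice(len(probs), p=probs))


def entropy(probs, eps=1e-12):
    """Shannon entropy of an action distribution. Factor (d) could be
    sketched as: loss = base_loss - beta * entropy(probs), where beta is
    a hypothetical weighting coefficient."""
    probs = np.asarray(probs, dtype=float)
    return float(-np.sum(probs * np.log(probs + eps)))


def symmetrize_kernel(kernel):
    """Factor (c), one possible construction: average a square kernel over
    its 4 rotations and their mirror images, so the result is unchanged by
    90-degree rotation and by left-right reflection."""
    k = np.asarray(kernel, dtype=float)
    variants = [np.rot90(base, r) for base in (k, np.fliplr(k)) for r in range(4)]
    return np.mean(variants, axis=0)
```

With two equally good actions, for example `q_values = [1.0, 1.0, 0.0, 0.0]`, an argmax policy would always return action 0, whereas `sample_action` picks actions 0 and 1 with equal probability. The entropy of the policy is largest when equally good actions receive equal probability, which is the behavior the maximum-entropy term is meant to encourage.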