The LoCA Regret: A Consistent Metric To Evaluate Model-based Behavior In Reinforcement Learning
2020 · Harm van Seijen, Hadi Nekoei, Evan Racah, et al.
Abstract
Deep model-based Reinforcement Learning (RL) has the potential to substantially improve the sample-efficiency of deep RL. While various challenges have long held it back, a number of papers have recently come out reporting success with deep model-based methods. This is a great development, but the lack of a consistent metric to evaluate such methods makes it difficult to compare various approaches. For example, the common single-task sample-efficiency metric conflates improvements due to model-based learning with various other aspects, such as representation learning, making it difficult to assess true progress on model-based RL. To address this, we introduce an experimental setup to evaluate model-based behavior of RL methods, inspired by work from neuroscience on detecting model-based behavior in humans and animals. Our metric based on this setup, the Local Change Adaptation (LoCA) regret, measures how quickly an RL method adapts to a local change in the environment. Our metric can i
Related papers
- Towards Evaluating Adaptivity Of Model-based Reinforcement Learning Methods (2022)
- Replay Buffer With Local Forgetting For Adapting To Local Environment Changes In Deep Model-based Reinforcement Learning (2023)
- Chirps: Change-induced Regret Proxy Metrics For Lifelong Reinforcement Learning (2024)
- Algorithmic Framework For Model-based Deep Reinforcement Learning With Theoretical Guarantees (2018)
- Partial Models For Building Adaptive Model-based Reinforcement Learning Agents (2024)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, And Policies With One Objective (2022)
- Online Reinforcement Learning In Non-stationary Context-driven Environments (2023)
- Understanding Behavioral Metric Learning: A Large-scale Study On Distracting Reinforcement Learning Environments (2025)