Model-free Reinforcement Learning For Model-based Control: Towards Safe, Interpretable And Sample-efficient Agents
2025 · Thomas Banker, Ali Mesbah
Abstract
Training sophisticated agents for optimal decision-making under uncertainty has been key to the rapid development of modern autonomous systems across fields. Notably, model-free reinforcement learning (RL) has enabled decision-making agents to improve their performance directly through system interactions, with minimal prior knowledge about the system. Yet, model-free RL has generally relied on agents equipped with deep neural network function approximators, appealing to the networks' expressivity to capture the agent's policy and value function for complex systems. However, neural networks amplify the issues of sample inefficiency, unsafe learning, and limited interpretability in model-free RL. To this end, this work introduces model-based agents as a compelling alternative for control policy approximation, leveraging adaptable models of system dynamics, cost, and constraints for safe policy learning. These models can encode prior system knowledge to inform, constrain, and aid in expl
Related papers
- PC-MLP: Model-based Reinforcement Learning With Policy Cover Guided Exploration (2021) 0.00
- Simplifying Model-based RL: Learning Representations, Latent-space Models, And Policies With One Objective (2022) 0.00
- ActSafe: Active Exploration With Safety Constraints For Reinforcement Learning (2024) 0.00
- Improving Sample Efficiency In Model-free Reinforcement Learning From Images (2019) 16.99
- Safety Correction From Baseline: Towards The Risk-aware Policy In Robotics Via Dual-agent Reinforcement Learning (2022) 3.58
- Conservative And Adaptive Penalty For Model-based Safe Reinforcement Learning (2021) 0.00
- Hierarchical Framework For Interpretable And Probabilistic Model-based Safe Reinforcement Learning (2023) 0.00
- Model-based Adaptation For Sample Efficient Transfer In Reinforcement Learning Control Of Parameter-varying Systems (2023) 2.26