Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework
2020 · Amber Srivastava, Srinivasa M Salapaka
Abstract
We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems modelled as infinite horizon Markov Decision Processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon Entropy of the trajectories under the MDP and determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. This resulting policy enhances the quality of exploration early on in the learning process, and consequently allows faster convergence rates and robust solutions even in the presence of noisy data as demonstrated in our comparisons to popular algorithms such as Q-learn
Related papers
- Provably Efficient Maximum Entropy Exploration (2018)
- Solving Robust MDPs Through No-regret Dynamics (2023)
- Learning MDPs From Features: Predict-then-optimize For Sequential Decision Problems By Reinforcement Learning (2021)
- Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees (2025)
- A Regularized Approach To Sparse Optimal Policy In Reinforcement Learning (2019)
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019)
- Tsallis Reinforcement Learning: A Unified Framework For Maximum Entropy Reinforcement Learning (2019)
- Efficient Learning For Entropy-regularized Markov Decision Processes Via Multilevel Monte Carlo (2025)