On Learning History Based Policies For Controlling Markov Decision Processes
2022 · Gandharv Patil, Aditya Mahajan, Doina Precup
Abstract
Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, due to the fact that function approximation in Markov decision processes (MDP) can be viewed as inducing a partially observable MDP. However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm and we numerically evaluate its effectiveness on a set of continuous control tasks.
Related papers
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019) 0.00
- Learning And Planning In Average-reward Markov Decision Processes (2020) 0.00
- Learning Markov State Abstractions For Deep Reinforcement Learning (2021) 0.00
- Model-based Reinforcement Learning With Multinomial Logistic Function Approximation (2022) 2.26
- A Policy Gradient Approach For Finite Horizon Constrained Markov Decision Processes (2022) 3.58
- Reinforcement Learning: A Comparison Of UCB Versus Alternative Adaptive Policies (2019) 0.00
- Bridging State And History Representations: Understanding Self-predictive RL (2024) 0.00
- Reinforcement Learning With Unbiased Policy Evaluation And Linear Function Approximation (2022) 0.00