Transfer Q-learning
2022 · Elynn Chen, Sai Li, Michael I. Jordan
Abstract
Time-inhomogeneous finite-horizon Markov decision processes (MDPs) are frequently employed to model decision-making in dynamic treatment regimes and other statistical reinforcement learning (RL) scenarios. These fields, especially healthcare and business, often face challenges such as high-dimensional state spaces and time-inhomogeneity of the MDP, compounded by insufficient sample availability, which complicates informed decision-making. To overcome these challenges, we investigate knowledge transfer within time-inhomogeneous finite-horizon MDPs by leveraging data from both a target RL task and several related source tasks. We develop transfer learning (TL) algorithms that are adaptable to both batch and online \(Q\)-learning, integrating valuable insights from offline source studies. The proposed transfer \(Q\)-learning algorithm contains a novel re-targeting step that enables cross-stage transfer along multiple stages in an RL task, besides the usual ...
Related papers
- Deep Transfer \(Q\)-learning For Offline Non-stationary Reinforcement Learning (2025) · 0.00
- On The Transferability Of Deep-Q Networks (2021) · 0.00
- Provably Efficient Multi-task Reinforcement Learning With Model Transfer (2021) · 0.00
- Target Transfer Q-learning And Its Convergence Analysis (2018) · 0.00
- Reinforcement Learning In The Wild With Maximum Likelihood-based Model Transfer (2023) · 0.00
- Lipschitz Lifelong Reinforcement Learning (2020) · 8.35
- Online Target Q-learning With Reverse Experience Replay: Efficiently Finding The Optimal Policy For Linear MDPs (2021) · 0.00
- IOB: Integrating Optimization Transfer And Behavior Transfer For Multi-policy Reuse (2023) · 5.24