Expert Q-learning: Deep Reinforcement Learning With Coarse State Values From Offline Expert Examples
2021 · Li Meng, Anis Yazidi, Morten Goodwin, et al.
Abstract
In this article, we propose a novel algorithm for deep reinforcement learning named Expert Q-learning. Expert Q-learning is inspired by Dueling Q-learning and aims to incorporate semi-supervised learning into reinforcement learning by splitting Q-values into state values and action advantages. We require that an offline expert assess the value of a state in a coarse manner using three discrete values. An expert network is designed in addition to the Q-network; it is updated after each regular offline minibatch update whenever the expert example buffer is not empty. Using the board game Othello, we compare our algorithm with the baseline Q-learning algorithm, which is a combination of Double Q-learning and Dueling Q-learning. Our results show that Expert Q-learning is indeed useful and more resistant to overestimation bias. The baseline Q-learning algorithm exhibits unstable and suboptimal behavior in non-deterministic settings, whereas Expert Q-learning demons
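The split of Q-values into a state value and action advantages that the abstract describes can be illustrated with a minimal sketch. It uses the standard Dueling Q-learning aggregation, Q(s, a) = V(s) + A(s, a) - mean(A), and is not taken from the paper's implementation. The label set {-1, 0, 1}, the `threshold` parameter, and the function names `coarse_label` and `dueling_q_values` are assumptions made for illustration; the abstract only states that the expert uses three discrete values.

```python
from statistics import mean

# Assumed coarse labels; the abstract only says "three discrete values".
COARSE_VALUES = (-1.0, 0.0, 1.0)


def coarse_label(estimate, threshold=0.5):
    """Map a real-valued state estimate to one of three coarse values.

    The thresholding rule is hypothetical and only stands in for an expert.
    """
    if estimate > threshold:
        return COARSE_VALUES[2]
    if estimate < -threshold:
        return COARSE_VALUES[0]
    return COARSE_VALUES[1]


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) with the dueling aggregation.

    Subtracting the mean advantage keeps the average Q-value equal to V(s).
    """
    baseline = mean(advantages)
    return [state_value + (a - baseline) for a in advantages]


# A state the (hypothetical) expert rates as favorable, with three actions.
v = coarse_label(0.8)
q = dueling_q_values(v, [0.5, -0.5, 0.0])
```

In this sketch the coarse state value only shifts all Q-values together, so the ranking of actions is decided entirely by the advantages.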
Related papers
- An Information-theoretic Optimality Principle For Deep Reinforcement Learning (2017) · 0.00
- Online Matching Via Reinforcement Learning: An Expert Policy Orchestration Strategy (2025) · 0.00
- Expert Or Not? Assessing Data Quality In Offline Reinforcement Learning (2025) · 0.00
- Model-based Offline Reinforcement Learning With Lower Expectile Q-learning (2024) · 0.00
- A Perspective Of Q-value Estimation On Offline-to-online Reinforcement Learning (2023) · 7.81
- Online Target Q-learning With Reverse Experience Replay: Efficiently Finding The Optimal Policy For Linear MDPs (2021) · 0.00
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020) · 0.00
- FAST-Q: Fast-track Exploration With Adversarially Balanced State Representations For Counterfactual Action Estimation In Offline Reinforcement Learning (2025) · 0.00