FAST-Q: Fast-track Exploration With Adversarially Balanced State Representations For Counterfactual Action Estimation In Offline Reinforcement Learning
2025 · Pulkit Agrawal, Rukma Talwadker, Aditya Pareek, et al.
Abstract
Recent advancements in state-of-the-art (SOTA) offline reinforcement learning (RL) have primarily focused on addressing function approximation errors, which contribute to the overestimation of Q-values for out-of-distribution actions, a challenge that static datasets exacerbate. However, high-stakes applications, such as recommendation systems in online gaming, introduce further complexities due to players' psychology (intent), driven by gameplay experiences, and the inherent volatility on the platform. These factors create highly sparse, partially overlapping state spaces across policies, further influenced by the experiment path selection logic, which biases state spaces towards specific policies. Current SOTA methods constrain learning from such offline data by clipping known counterfactual actions as out-of-distribution due to poor generalization across unobserved states, further aggravating conservative Q-learning and necessitating more online exploration. FAST-Q introduces a novel ap
Related papers
- An Investigation Of Offline Reinforcement Learning In Factorisable Action Spaces (2024) 0.00
- Budgeting Counterfactual For Offline RL (2023) 0.00
- State-constrained Offline Reinforcement Learning (2024) 0.00
- Towards Fast Safe Online Reinforcement Learning Via Policy Finetuning (2024) 0.00
- Counterfactual Conservative Q Learning For Offline Multi-agent Reinforcement Learning (2023) 0.00
- Leveraging Factored Action Spaces For Efficient Offline Reinforcement Learning In Healthcare (2023) 2.26
- Exploiting Action Impact Regularity And Exogenous State Variables For Offline Reinforcement Learning (2021) 0.00
- Offline Fictitious Self-play For Competitive Games (2024) 0.00