Dynamic Memory For Interpretable Sequential Optimisation
2022 · Srivas Chennu, Andrew Maher, Jamie Martin, et al.
Abstract
Real-world applications of reinforcement learning for recommendation and experimentation face a practical challenge: the relative reward of different bandit arms can evolve over the lifetime of the learning agent. To deal with these non-stationary cases, the agent must forget some historical knowledge, as it may no longer be relevant to minimising regret. We present a solution for handling non-stationarity that is suitable for deployment at scale, to provide business operators with automated adaptive optimisation. Our solution aims to provide interpretable learning that can be trusted by humans, whilst responding to non-stationarity to minimise regret. To this end, we develop an adaptive Bayesian learning agent that employs a novel form of dynamic memory. It enables interpretability through statistical hypothesis testing, by targeting a set point of statistical power when comparing rewards and adjusting its memory dynamically to achieve this power. By design, the agent is agnostic to dif
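The idea of adjusting memory to hit a set point of statistical power can be pictured with a small sketch. This is an illustration only, not the authors' algorithm: it assumes Bernoulli rewards, a two-sided two-proportion z-test with a normal approximation, and a memory expressed as an effective number of retained observations per arm. The names `approx_power` and `adapt_memory`, and the parameters `target_power`, `rate`, `n_min` and `n_max`, are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def approx_power(p_a, p_b, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with unpooled variance; p_a and p_b
    are the (estimated) reward rates of the two arms being compared.
    """
    if n_per_arm <= 0:
        raise ValueError("n_per_arm must be positive")
    se = sqrt(p_a * (1 - p_a) / n_per_arm + p_b * (1 - p_b) / n_per_arm)
    if se == 0:
        # Degenerate rates (0 or 1): no sampling noise under this model.
        return 1.0 if p_a != p_b else alpha
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    effect = abs(p_a - p_b) / se
    return _STD_NORMAL.cdf(effect - z_crit) + _STD_NORMAL.cdf(-effect - z_crit)


def adapt_memory(n_eff, p_a, p_b, target_power=0.8, rate=1.1,
                 n_min=10, n_max=1_000_000):
    """One adjustment step for the effective memory length (assumed rule).

    If the comparison is underpowered, retain more history; if it is
    overpowered, forget faster so the agent can track drifting rewards.
    """
    power = approx_power(p_a, p_b, n_eff)
    n_new = n_eff * rate if power < target_power else n_eff / rate
    return min(max(n_new, n_min), n_max)


if __name__ == "__main__":
    n = 100.0
    for _ in range(200):
        n = adapt_memory(n, p_a=0.10, p_b=0.12)
    print(f"memory settles near {n:.0f} observations per arm, "
          f"power {approx_power(0.10, 0.12, n):.2f}")
```

Under these assumptions the memory length oscillates around the smallest history that reaches the target power: a smaller gap between arms calls for a longer memory, and a larger gap allows faster forgetting.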
Related papers
- Stable Hadamard Memory: Revitalizing Memory-augmented Agents For Reinforcement Learning (2024)
- Adapting Behaviour For Learning Progress (2019)
- Adamemento: Adaptive Memory-assisted Policy Optimization For Reinforcement Learning (2024)
- Meta-trained Agents Implement Bayes-optimal Agents (2020)
- Augmented Replay Memory In Reinforcement Learning With Continuous Control (2019)
- Learning, Fast And Slow: A Goal-directed Memory-based Approach For Dynamic Environments (2023)
- Non-stationary Reinforcement Learning: The Blessing Of (more) Optimism (2019)
- Lifelong Reinforcement Learning Via Neuromodulation (2024)