Delayed Geometric Discounts: An Alternative Criterion For Reinforcement Learning
2022 · Firas Jarboui, Ahmed Akakzia
Abstract
The endeavor of artificial intelligence (AI) is to design autonomous agents capable of achieving complex tasks. In particular, reinforcement learning (RL) provides a theoretical framework for learning optimal behaviors. In practice, RL algorithms rely on geometric discounts to evaluate this optimality. Unfortunately, this does not cover decision processes in which future returns are not exponentially less valuable. Depending on the problem, this limitation induces sample inefficiency (as feedback is exponentially decayed) and requires additional curricula or exploration mechanisms (to deal with sparse, deceptive, or adversarial rewards). In this paper, we tackle these issues by generalizing the discounted problem formulation with a family of delayed objective functions. We investigate the underlying RL problem to derive: 1) the optimal stationary solution and 2) an approximation of the optimal non-stationary control. The devised algorithms solved hard exploration problems on tabular environments and
Related papers
- Rethinking The Discount Factor In Reinforcement Learning: A Decision Theoretic Approach (2019)
- Examining Average And Discounted Reward Optimality Criteria In Reinforcement Learning (2021)
- ACERAC: Efficient Reinforcement Learning In Fine Time Discretization (2021)
- Analyzing And Bridging The Gap Between Maximizing Total Reward And Discounted Reward In Deep Reinforcement Learning (2024)
- Learning Fair Policies In Multiobjective (Deep) Reinforcement Learning With Average And Discounted Rewards (2020)
- Model-agnostic Solutions For Deep Reinforcement Learning In Non-ergodic Contexts (2026)
- Regret Bounds For Discounted MDPs (2020)
- Unified Algorithms For RL With Decision-estimation Coefficients: PAC, Reward-free, Preference-based Learning, And Beyond (2022)