Unified Algorithms For RL With Decision-estimation Coefficients: PAC, Reward-free, Preference-based Learning, And Beyond
2022 · Fan Chen, Song Mei, Yu Bai
Abstract
Modern Reinforcement Learning (RL) is more than just learning the optimal policy; alternative learning goals such as exploring the environment, estimating the underlying model, and learning from preference feedback are all of practical importance. While provably sample-efficient algorithms for each specific goal have been proposed, these algorithms often depend strongly on the particular learning goal and thus have correspondingly different structures. It is a pressing open question whether these learning goals can instead be tackled by a single unified algorithm. We make progress on this question by developing a unified algorithm framework for a large class of learning goals, building on the Decision-Estimation Coefficient (DEC) framework. Our framework handles many learning goals such as no-regret RL, PAC RL, reward-free learning, model estimation, and preference-based learning, all by simply instantiating the same generic complexity measure called "Generalized DEC", and a correspon