Policy Gradient
50 papers tagged Policy Gradient (ordered by heat_score)
Papers
- Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020) | Jingliang Duan, Yang Guan, Shengbo Eben Li, et al. | 17.77
- Vulnerability Of Deep Reinforcement Learning To Policy Induction Attacks (2017) | Vahid Behzadan, Arslan Munir | 15.98
- Communication-efficient Policy Gradient Methods For Distributed Reinforcement Learning (2018) | Tianyi Chen, Kaiqing Zhang, Georgios B. Giannakis, et al. | 13.05
- Improving Coordination In Small-scale Multi-agent Deep Reinforcement Learning Through Memory-driven Communication (2019) | Emanuele Pesce, Giovanni Montana | 12.25
- Diversity Policy Gradient For Sample Efficient Quality-diversity Optimization (2020) | Thomas Pierrot, Valentin Macé, Félix Chalumeau, et al. | 11.58
- On The Sample Complexity Of Actor-critic Method For Reinforcement Learning With Function Approximation (2019) | Harshat Kumar, Alec Koppel, Alejandro Ribeiro | 11.49
- A Multi-agent Off-policy Actor-critic Algorithm For Distributed Reinforcement Learning (2019) | Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, et al. | 11.39
- Soft Policy Gradient Method For Maximum Entropy Deep Reinforcement Learning (2019) | Wenjie Shi, Shiji Song, Cheng Wu | 10.85
- Direct And Indirect Reinforcement Learning (2019) | Yang Guan, Shengbo Eben Li, Jingliang Duan, et al. | 10.74
- Dual Policy Distillation (2020) | Kwei-Herng Lai, Daochen Zha, Yuening Li, et al. | 10.61
- Variance Reduction In Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) For Extensive Form Games Using Baselines (2018) | Martin Schmid, Neil Burch, Marc Lanctot, et al. | 10.48
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019) | Samir Wadhwania, Dong-Ki Kim, Shayegan Omidshafiei, et al. | 10.48
- Cooperative Multi-agent Reinforcement Learning With Partial Observations (2020) | Yan Zhang, Michael M. Zavlanos | 10.35
- WD3: Taming The Estimation Bias In Deep Reinforcement Learning (2020) | Qiang He, Xinwen Hou | 10.21
- Sample-efficient Model-free Reinforcement Learning With Off-policy Critics (2019) | Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, et al. | 9.60
- Qualitative Measurements Of Policy Discrepancy For Return-based Deep Q-network (2018) | Wenjia Meng, Qian Zheng, Long Yang, et al. | 9.59
- The Sufficiency Of Off-policyness And Soft Clipping: PPO Is Still Insufficient According To An Off-policy Measure (2022) | Xing Chen, Dongcui Diao, Hechang Chen, et al. | 9.23
- Convergence Of Policy Gradient Methods For Finite-horizon Exploratory Linear-quadratic Control Problems (2022) | Michael Giegrich, Christoph Reisinger, Yufei Zhang | 9.23
- Compatible Natural Gradient Policy Search (2019) | Joni Pajarinen, Hong Linh Thai, Riad Akrour, et al. | 9.23
- Convergence Guarantees Of Policy Optimization Methods For Markovian Jump Linear Systems (2020) | Joao Paulo Jansch-Porto, Bin Hu, Geir Dullerud | 9.03
- Fully Asynchronous Policy Evaluation In Distributed Reinforcement Learning Over Networks (2020) | Xingyu Sha, Jiaqi Zhang, Keyou You, et al. | 9.03
- Simple And Optimal Methods For Stochastic Variational Inequalities, II: Markovian Noise And Policy Evaluation In Reinforcement Learning (2020) | Georgios Kotsalis, Guanghui Lan, Tianjiao Li | 8.60
- Learning First-to-spike Policies For Neuromorphic Control Using Policy Gradients (2018) | Bleema Rosenfeld, Osvaldo Simeone, Bipin Rajendran | 8.60
- Distributed Value Function Approximation For Collaborative Multi-agent Reinforcement Learning (2020) | Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic | 8.60
- Revisiting LQR Control From The Perspective Of Receding-horizon Policy Gradient (2023) | Xiangyuan Zhang, Tamer Başar | 8.60
- Diversity-inducing Policy Gradient: Using Maximum Mean Discrepancy To Find A Set Of Diverse Policies (2019) | Muhammad A. Masood, Finale Doshi-Velez | 8.35
- Novelty Search For Deep Reinforcement Learning Policy Network Weights By Action Sequence Edit Metric Distance (2019) | Ethan C. Jackson, Mark Daley | 8.09
- MULTIPOLAR: Multi-source Policy Aggregation For Transfer Reinforcement Learning Between Diverse Environmental Dynamics (2019) | Mohammadamin Barekatain, Ryo Yonetani, Masashi Hamaya | 7.81
- Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning (2020) | Lionel Blondé, Pablo Strasser, Alexandros Kalousis | 7.81
- Neural Temporal-difference And Q-learning Provably Converge To Global Optima (2019) | Qi Cai, Zhuoran Yang, Jason D. Lee, et al. | 7.81
- Addressing Action Oscillations Through Learning Policy Inertia (2021) | Chen Chen, Hongyao Tang, Jianye Hao, et al. | 7.81
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021) | Yuchen Xiao, Xueguang Lyu, Christopher Amato | 7.81
- Smoothing Policies And Safe Policy Gradients (2019) | Matteo Papini, Matteo Pirotta, Marcello Restelli | 7.50
- Policy Optimization With Stochastic Mirror Descent (2019) | Long Yang, Yu Zhang, Gang Zheng, et al. | 7.50
- Quantum Natural Policy Gradients: Towards Sample-efficient Reinforcement Learning (2023) | Nico Meyer, Daniel D. Scherer, Axel Plinge, et al. | 7.16
- Gradient-aware Model-based Policy Search (2019) | Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, et al. | 6.77
- Faded-experience Trust Region Policy Optimization For Model-free Power Allocation In Interference Channel (2020) | Mohammad G. Khoshkholgh, Halim Yanikomeroglu | 6.77
- Reinforcement Learning In Nonzero-sum Linear Quadratic Deep Structured Games: Global Convergence Of Policy Optimization (2020) | Masoud Roudneshin, Jalal Arabneydi, Amir G. Aghdam | 6.77
- Joint Optimization Of Multi-objective Reinforcement Learning With Policy Gradient Based Algorithm (2021) | Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal | 6.34
- Linear Convergence Of Entropy-regularized Natural Policy Gradient With Linear Function Approximation (2021) | Semih Cayci, Niao He, R. Srikant | 6.34
- Softmax Policy Gradient Methods Can Take Exponential Time To Converge (2021) | Gen Li, Yuting Wei, Yuejie Chi, et al. | 6.34
- An Efficient Off-policy Reinforcement Learning Algorithm For The Continuous-time LQR Problem (2023) | Victor G. Lopez, Matthias A. Müller | 6.34
- On The Role Of Weight Sharing During Deep Option Learning (2019) | Matthew Riemer, Ignacio Cases, Clemens Rosenbaum, et al. | 6.34
- Exploiting The Sign Of The Advantage Function To Learn Deterministic Policies In Continuous Domains (2019) | Matthieu Zimmer, Paul Weng | 6.34
- Rethinking Adversarial Attacks In Reinforcement Learning From Policy Distribution Perspective (2025) | Tianyang Duan, Zongyuan Zhang, Zheng Lin, et al. | 5.84
- Reinforcement Learning In Linear Quadratic Deep Structured Teams: Global Convergence Of Policy Gradient Methods (2020) | Vida Fathi, Jalal Arabneydi, Amir G. Aghdam | 5.84
- A Further Exploration Of Deep Multi-agent Reinforcement Learning With Hybrid Action Space (2022) | Hongzhi Hua, Guixuan Wen, Kaigui Wu | 5.84
- Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning With Polynomial Sample Complexity (2020) | Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, et al. | 5.84
- Modelling The Dynamic Joint Policy Of Teammates With Attention Multi-agent DDPG (2018) | Hangyu Mao, Zhengchao Zhang, Zhen Xiao, et al. | 5.84
- Clipup: A Simple And Powerful Optimizer For Distribution-based Policy Evolution (2020) | Nihat Engin Toklu, Paweł Liskowski, Rupesh Kumar Srivastava | 5.84