Rethinking Adversarial Inverse Reinforcement Learning: Policy Imitation, Transferable Reward Recovery And Algebraic Equilibrium Proof
2024 · Yangchun Zhang, Qiang Liu, Weiming Li, et al.
Abstract
Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning, yet it faces criticisms from prior studies. In this paper, we rethink AIRL and respond to these criticisms. Criticism 1 lies in Inadequate Policy Imitation. We show that substituting the built-in algorithm with soft actor-critic (SAC) during policy updating (which requires multiple iterations) significantly enhances the efficiency of policy imitation. Criticism 2 lies in Limited Performance in Transferable Reward Recovery Despite SAC Integration. While we find that SAC indeed yields a significant improvement in policy imitation, it introduces drawbacks to transferable reward recovery. We prove that the SAC algorithm by itself cannot comprehensively disentangle the reward function during AIRL training, and we propose a hybrid framework, PPO-AIRL + SAC, for a satisfactory transfer effect. Criticism 3 lies in Unsatisfactory Proof from the Perspective of Potential Equilibriu
Related papers
- Learning Robust Rewards With Adversarial Inverse Reinforcement Learning (2017)
- Non-adversarial Imitation Learning And Its Connections To Adversarial Methods (2020)
- On Discovering Algorithms For Adversarial Imitation Learning (2025)
- Scalable Multi-agent Inverse Reinforcement Learning Via Actor-attention-critic (2020)
- Discriminator-actor-critic: Addressing Sample Inefficiency And Reward Bias In Adversarial Imitation Learning (2018)
- ARC - Actor Residual Critic For Adversarial Imitation Learning (2022)
- Imitating Opponent To Win: Adversarial Policy Imitation Learning In Two-player Competitive Games (2022)
- Co-adaptation Of Algorithmic And Implementational Innovations In Inference-based Deep Reinforcement Learning (2021)