Finite-sample Bounds For Adaptive Inverse Reinforcement Learning Using Passive Langevin Dynamics
2023 · Luke Snow, Vikram Krishnamurthy
Abstract
This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm designed to achieve adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate the cost function of a forward learner running a stochastic gradient algorithm (e.g., policy gradient reinforcement learning) by observing its estimates in real time. The PSGLD algorithm is passive because it incorporates noisy gradients provided by an external stochastic gradient algorithm (the forward learner), over which it has no control. It acts as a randomized sampler, achieving adaptive IRL by reconstructing the forward learner's cost function nonparametrically from the stationary measure of a Langevin diffusion. This paper analyzes the non-asymptotic (finite-sample) performance; we provide explicit bounds on the 2-Wasserstein distance between the PSGLD algorithm's sample measure and the stationary measure encoding the cost function
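For intuition about the metric in which these bounds are stated, the sketch below computes the 2-Wasserstein distance between two equal-size empirical measures on the real line, where the optimal coupling simply pairs sorted samples. It is an illustration only: the function name `empirical_w2_1d` is hypothetical, the one-dimensional, equal-size setting is an assumption made for simplicity, and this is not the estimator or the analysis used in the paper.

```python
import math


def empirical_w2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1-D empirical measures.

    Illustrative helper (hypothetical name, not from the paper).
    """
    if len(xs) == 0 or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal transport plan matches order statistics,
    # so W2^2 is the mean squared gap between the sorted samples.
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    mean_sq = sum((x - y) ** 2 for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return math.sqrt(mean_sq)


# Shifting every sample by 3 moves the empirical measure a W2 distance of 3.
shifted = empirical_w2_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0])  # 3.0
# The distance depends only on the measures, not on sample ordering.
same = empirical_w2_1d([2.0, 0.0, 1.0], [1.0, 2.0, 0.0])  # 0.0
```

A small distance in this sense means the samples produced by an algorithm are distributed close to the target measure, which is the sense in which the abstract's bounds control the sampler's accuracy.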
Related papers
- Langevin Dynamics For Adaptive Inverse Reinforcement Learning Of Stochastic Gradient Algorithms (2020)
- Inverse Reinforcement Learning With Simultaneous Estimation Of Rewards And Dynamics (2016)
- Maximum-likelihood Inverse Reinforcement Learning With Finite-time Guarantees (2022)
- Towards Theoretical Understanding Of Inverse Reinforcement Learning (2023)
- Active Learning For Risk-sensitive Inverse Reinforcement Learning (2019)
- Stabilizing Policy Gradients For Sample-efficient Reinforcement Learning In LLM Reasoning (2025)
- Direct Soft-policy Sampling Via Langevin Dynamics (2026)
- Sample Efficient Active Algorithms For Offline Reinforcement Learning (2026)