Expert Or Not? Assessing Data Quality In Offline Reinforcement Learning
2025 · Arip Asadulaev, Fakhri Karray, Martin Takac
Abstract
Offline reinforcement learning (RL) learns exclusively from static datasets, without further interaction with the environment. In practice, such datasets vary widely in quality, often mixing expert, suboptimal, and even random trajectories. The choice of algorithm therefore depends on dataset fidelity. Behavior cloning can suffice on high-quality data, whereas mixed- or low-quality data typically benefits from offline RL methods that stitch useful behavior across trajectories. Yet in the wild it is difficult to assess dataset quality a priori, because the data's provenance and skill composition are unknown. We address the problem of estimating offline dataset quality without training an agent. We study a spectrum of proxies, from simple cumulative rewards to learned value-based estimators, and introduce the Bellman Wasserstein distance (BWD), a value-aware optimal transport score that measures how dissimilar a dataset's behavioral policy is from a random reference policy. BWD is computed
Related papers
- Measuring Data Quality For Dataset Selection In Offline Reinforcement Learning (2021)
- Data Valuation For Offline Reinforcement Learning (2022)
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? (2022)
- Interpretable Performance Analysis Towards Offline Reinforcement Learning: A Dataset Perspective (2021)
- Optimality Inductive Biases And Agnostic Guidelines For Offline Reinforcement Learning (2021)
- Is Value Learning Really The Main Bottleneck In Offline RL? (2024)
- Behavior Estimation From Multi-source Data For Offline Reinforcement Learning (2022)
- A Dataset Perspective On Offline Reinforcement Learning (2021)