Anthropic HH-RLHF

Emerging

4 papers using it

20 HF downloads

4 HF likes

2025 first seen

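The Anthropic HH-RLHF dataset contains human preference data on helpfulness and harmlessness: pairs of assistant responses in which annotators marked one as preferred. It is used to train and evaluate reward models and reinforcement learning from human feedback (RLHF) methods. The papers tracked here use it for, among other topics, privacy-preserving RLHF.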
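As a rough illustration of what a preference pair looks like, the sketch below splits one record into a shared prompt and its two competing replies. It assumes the record layout of the publicly released dataset: a `chosen` and a `rejected` transcript, each a string whose turns are introduced by `Human:` and `Assistant:` on their own paragraphs. The helper name `split_preference_pair` and the example record are hypothetical, made up for illustration, and are not taken from the dataset.

```python
from typing import Dict, Tuple

# Turn marker assumed from the public release format (an assumption, not verified here).
ASSISTANT_TAG = "\n\nAssistant:"


def split_preference_pair(record: Dict[str, str]) -> Tuple[str, str, str]:
    """Split a record into (shared prompt, chosen reply, rejected reply)."""
    chosen, rejected = record["chosen"], record["rejected"]

    # Locate the final assistant turn in each transcript.
    cut_c = chosen.rfind(ASSISTANT_TAG)
    cut_r = rejected.rfind(ASSISTANT_TAG)
    if cut_c == -1 or cut_r == -1:
        raise ValueError("record has no assistant turn")

    # Everything up to and including the final "Assistant:" tag is the prompt.
    prompt_c = chosen[: cut_c + len(ASSISTANT_TAG)]
    prompt_r = rejected[: cut_r + len(ASSISTANT_TAG)]
    if prompt_c != prompt_r:
        raise ValueError("chosen and rejected do not share a prompt")

    return (
        prompt_c,
        chosen[len(prompt_c):].strip(),
        rejected[len(prompt_r):].strip(),
    )


# Hypothetical record, invented for illustration only.
example = {
    "chosen": "\n\nHuman: How do I boil an egg?\n\nAssistant: Simmer it for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?\n\nAssistant: I don't know.",
}

prompt, good, bad = split_preference_pair(example)
```

A reward model trained on such pairs would typically score `good` above `bad` for the same `prompt`. The helper raises an error instead of guessing when the two transcripts diverge before the final reply.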

🤗 Hugging Face

Papers using Anthropic HH-RLHF (4)

Privacy-preserving Reinforcement Learning From Human Feedback Via Decoupled Reward Modeling (2026)

Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance (2025)

Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling (2026)

Beyond Importance Sampling: Rejection-Gated Policy Optimization (2026)