Anthropic HH-RLHF
Emerging
4 papers using it
20 HF downloads
4 HF likes
2025 first seen
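Anthropic HH-RLHF is a dataset of human preference feedback on the helpfulness and harmlessness of assistant responses. The papers listed here use it to train and evaluate models with reinforcement learning from human feedback (RLHF), including privacy-preserving approaches.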
Papers using Anthropic HH-RLHF (4)
- Privacy-preserving Reinforcement Learning From Human Feedback Via Decoupled Reward Modeling
- Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance
- Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling
- Beyond Importance Sampling: Rejection-Gated Policy Optimization