HH-RLHF

Name: HH-RLHF
License: MIT

Emerging

3 papers using it

36,917 HF downloads

1,835 HF likes

First seen: 2024

Dataset Card for HH-RLHF

Dataset Summary

This repository provides access to two different kinds of data: human preference data about helpfulness and harmlessness from "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". These data are meant to train preference (or reward) models fo…
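As a rough illustration of how preference pairs can feed a preference (reward) model, the sketch below computes a pairwise loss of the form -log(sigmoid(r_chosen - r_rejected)). This loss is a common choice for reward modeling, not something the card excerpt specifies. The `chosen`/`rejected` field names, the dialogue format of the toy record, and the `toy_reward` scorer are assumptions made for illustration only; a real setup would score text with a learned model.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)), computed stably."""
    margin = reward_chosen - reward_rejected
    # softplus(-margin) equals -log(sigmoid(margin)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def toy_reward(text: str) -> float:
    """Hypothetical stand-in for a learned reward model (NOT a real scorer).

    It scores the final assistant turn by word count, purely so the
    example runs without any model or network access.
    """
    final_turn = text.rsplit("Assistant:", 1)[-1]
    return 0.1 * len(final_turn.split())


# Invented toy record; the field names and dialogue layout are assumptions.
example = {
    "chosen": "\n\nHuman: How do I boil an egg?\n\nAssistant: "
              "Place it in boiling water for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?\n\nAssistant: I don't know.",
}

loss = pairwise_preference_loss(
    toy_reward(example["chosen"]),
    toy_reward(example["rejected"]),
)
print(f"pairwise loss: {loss:.4f}")
```

The loss falls below log(2) whenever the scorer ranks the chosen response above the rejected one, and grows as the ranking is violated, which is the signal a reward model would be trained on.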

🤗 Hugging Face · ⚖ MIT

Papers using HH-RLHF (3)

Reward Shaping to Mitigate Reward Hacking in RLHF · 2025

HelpSteer2: Open-source dataset for training top-performing reward models · 2024 · 2 cites

Self-Evolved Reward Learning for LLMs · 2024 · 1 cite