TruthfulQA

Canonical

27 papers using it

1,860 HF downloads

49 HF likes

2024 first seen

Dataset Card for TruthfulQA

Dataset Summary

TruthfulQA: Measuring How Models Mimic Human Falsehoods

We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We cr

🤗 Hugging Face · ⚖ apache-2.0

Papers using TruthfulQA (27)

MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models · 2026

SERC: LDPC-Inspired Semantic Error Correction for Retrieval-Augmented Generation · 2026

CausalGaze: Unveiling Hallucinations via Counterfactual Graph Intervention in Large Language Models · 2026

DeCoVec: Building Decoding Space based Task Vector for Large Language Models via In-Context Learning · 2026

Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval · 2026

ROAST: Rollout-based On-distribution Activation Steering Technique · 2026

Dr.LLM: Dynamic Layer Routing in LLMs · 2025

KatotohananQA: Evaluating Truthfulness of Large Language Models in Filipino · 2025

Too Helpful, Too Harmless, Too Honest or Just Right? · 2025

Hallucination Detection with the Internal Layers of LLMs · 2025

We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong · 2025

Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models · 2025

Steering When Necessary: Flexible Steering Large Language Models with Backtracking · 2025

GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs · 2025

MALM: A Multi-Information Adapter for Large Language Models to Mitigate Hallucination · 2025

Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models · 2025

Sample, Don't Search: Rethinking Test-Time Alignment for Language Models · 2025

Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future · 2025

Steering When Necessary: Flexible Steering Large Language Models with Backtracking · 2025

Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts · 2025

When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) · 2025

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment · 2025

Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs · 2024 · 2 cites

Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts · 2024 · 1 cite

Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering · 2024 · 1 cite

Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains · 2024

Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation · 2024
