TruthfulQA
Canonical
27 papers using it
1,860 HF downloads
49 HF likes
2024 first seen
Dataset Card for TruthfulQA
Dataset Summary
TruthfulQA: Measuring How Models Mimic Human Falsehoods
We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We cr
Hugging Face · apache-2.0
Papers using TruthfulQA (27)
- MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models
- SERC: LDPC-Inspired Semantic Error Correction for Retrieval-Augmented Generation
- CausalGaze: Unveiling Hallucinations via Counterfactual Graph Intervention in Large Language Models
- DeCoVec: Building Decoding Space based Task Vector for Large Language Models via In-Context Learning
- Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval
- ROAST: Rollout-based On-distribution Activation Steering Technique
- Dr.LLM: Dynamic Layer Routing in LLMs
- KatotohananQA: Evaluating Truthfulness of Large Language Models in Filipino
- Too Helpful, Too Harmless, Too Honest or Just Right?
- Hallucination Detection with the Internal Layers of LLMs
- We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong
- Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models
- Steering When Necessary: Flexible Steering Large Language Models with Backtracking
- GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs
- MALM: A Multi-Information Adapter for Large Language Models to Mitigate Hallucination
- Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
- Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
- Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future
- Steering When Necessary: Flexible Steering Large Language Models with Backtracking
- Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
- When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR)
- More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
- Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
- Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
- Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering
- Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains
- Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation