GPQA
Emerging
25 papers using it
114,728 HF downloads
461 HF likes
2025 first seen
Dataset Card for GPQA
GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending >30m with ful
Hugging Face | cc-by-4.0
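The accuracy figure quoted in the card is a fraction of multiple-choice questions answered correctly. A minimal sketch of that scoring follows, assuming answers are recorded as option letters. The function name `multiple_choice_accuracy` and the toy labels are hypothetical illustrations and are not taken from GPQA or its evaluation code.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of questions whose predicted option label matches the gold label."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty set of questions")
    # Compare labels case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical toy labels, not drawn from GPQA.
gold = ["A", "C", "B", "D"]
pred = ["A", "B", "B", "a"]
print(multiple_choice_accuracy(pred, gold))
```

Normalizing case and whitespace before comparing is a design choice made for this sketch. A real evaluation harness may extract and compare answers differently.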
Papers using GPQA (25)
- The Evolution of Thought: Tracking LLM Overthinking via Reasoning Dynamics Analysis
- Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
- Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems
- FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning
- Benchmark Illusion: Disagreement Among LLMs And Its Scientific Consequences
- Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data
- Dr.LLM: Dynamic Layer Routing in LLMs
- AutoBench: Automating LLM Evaluation through Reciprocal Peer Assessment
- EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance
- CAC-CoT: Connector-Aware Compact Chain-of-Thought for Efficient Reasoning Data Synthesis Across Dual-System Cognitive Tasks
- P3: Prompts Promote Prompting
- Efficient Model Development through Fine-tuning Transfer
- MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search
- General-Reasoner: Advancing LLM Reasoning Across All Domains
- Interleaved Reasoning for Large Language Models via Reinforcement Learning
- Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
- Reinforcing General Reasoning without Verifiers
- Inference-Time Hyper-Scaling with KV Cache Compression
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
- From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs
- DeepPrune: Parallel Scaling without Inter-trace Redundancy
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
- Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
- INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling
- Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique