Llama-3

Emerging

12 papers using it

First seen: 2025

Llama-3 (also written 'LLaMA3') is used here as a benchmark for evaluating the multi-token prediction capabilities of large language models; a training-free probing approach is reported to improve acceptance length and token throughput on it.

Papers using Llama-3 (12)

Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing (2026)

Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels (2025)

IntraSlice: Towards High-Performance Structural Pruning with Block-Intra PCA for LLMs (2026)

Compressing LLMs with MoP: Mixture of Pruners (2026)

Leveraging KV Similarity for Online Structured Pruning in LLMs (2025)

NIRVANA: Structured pruning reimagined for large language models compression (2025)

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters (2025)

UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs (2025)

Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models (2025)

Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring (2025)

Beyond One-Size-Fits-All Pruning via Evolutionary Metric Search for Large Language Models (2025)

Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities (2025)