Llama-3
Emerging
12 papers using it
2025 first seen
Llama-3 is a family of large language models, not a benchmark. In the papers listed here it serves as a base model for evaluation, for example of multi-token prediction, where a training-free probing approach reports improvements in acceptance length and token throughput.
Papers using Llama-3 (12)
- Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing
- Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels
- IntraSlice: Towards High-Performance Structural Pruning with Block-Intra PCA for LLMs
- Compressing LLMs with MoP: Mixture of Pruners
- Leveraging KV Similarity for Online Structured Pruning in LLMs
- NIRVANA: Structured pruning reimagined for large language models compression
- PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
- UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
- Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models
- Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring
- Beyond One-Size-Fits-All Pruning via Evolutionary Metric Search for Large Language Models
- Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities