Awesome Efficiency
Efficiency is one of the most active areas in Awesome LLM Papers, with 5,716 papers in this collection, evaluated on datasets such as GSM8K, MMLU, and MATH-500. A strong starting point is "LoRA: Low-rank Adaptation Of Large Language Models".
Key papers
- LoRA: Low-rank Adaptation Of Large Language Models (2021). Edward J. Hu, Yelong Shen, Phillip Wallis, et al. 48.60
- Efficient Memory Management For Large Language Model Serving With PagedAttention (2023). Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, et al. 46.56
- Efficient Streaming Language Models With Attention Sinks (2023). Guangxuan Xiao, Yuandong Tian, Beidi Chen, et al. 37.76
- LLaMA: Open And Efficient Foundation Language Models (2023). Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al. 36.83
- QLoRA: Efficient Finetuning Of Quantized LLMs (2023). Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, et al. 36.23
- MiniCPM: Unveiling The Potential Of Small Language Models With Scalable Training Strategies (2024). Shengding Hu, Yuge Tu, Xu Han, et al. 36.01
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022). BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, et al. 33.79
- LayerSkip: Enabling Early Exit Inference And Self-speculative Decoding (2024). Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, et al. 32.46
- Prefix-tuning: Optimizing Continuous Prompts For Generation (2021). Xiang Lisa Li, Percy Liang. 32.38
- OmniQuant: Omnidirectionally Calibrated Quantization For Large Language Models (2023). Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, et al. 31.92
- JudgeLM: Fine-tuned Large Language Models Are Scalable Judges (2023). Lianghui Zhu, Xinggang Wang, Xinlong Wang. 31.75
- Training Compute-optimal Large Language Models (2022). Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al. 31.52
- DeepSeek LLM: Scaling Open-source Language Models With Longtermism (2024). DeepSeek-AI: Xiao Bi, et al. 30.12
- Medusa: Simple LLM Inference Acceleration Framework With Multiple Decoding Heads (2024). Tianle Cai, Yuhong Li, Zhengyang Geng, et al. 30.03
- LLM In A Flash: Efficient Large Language Model Inference With Limited Memory (2023). Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, et al. 29.72
- AgentVerse: Facilitating Multi-agent Collaboration And Exploring Emergent Behaviors (2023). Weize Chen, Yusheng Su, Jingwei Zuo, et al. 29.70
- Deja Vu: Contextual Sparsity For Efficient LLMs At Inference Time (2023). Zichang Liu, Jue Wang, Tri Dao, et al. 29.70
- AgentScope: A Flexible Yet Robust Multi-agent Platform (2024). Dawei Gao, Zitao Li, Xuchen Pan, et al. 29.54
- Emergent Abilities Of Large Language Models (2022). Jason Wei, Yi Tay, Rishi Bommasani, et al. 29.54
- QA-LoRA: Quantization-aware Low-rank Adaptation Of Large Language Models (2023). Yuhui Xu, Lingxi Xie, Xiaotao Gu, et al. 29.52
- Scaling Synthetic Data Creation With 1,000,000,000 Personas (2024). Tao Ge, Xin Chan, Xiaoyang Wang, et al. 29.19
- AutoGen: Enabling Next-gen LLM Applications Via Multi-agent Conversation (2023). Qingyun Wu, Gagan Bansal, Jieyu Zhang, et al. 29.16
- Scaling Laws With Vocabulary: Larger Models Deserve Larger Vocabularies (2024). Chaofan Tao, Qian Liu, Longxu Dou, et al. 28.35
- MobileLLM: Optimizing Sub-billion Parameter Language Models For On-device Use Cases (2024). Zechun Liu, Changsheng Zhao, Forrest Iandola, et al. 28.23
- LM-Infinite: Zero-shot Extreme Length Generalization For Large Language Models (2023). Chi Han, Qifan Wang, Hao Peng, et al. 27.86
- Buffer Of Thoughts: Thought-augmented Reasoning With Large Language Models (2024). Ling Yang, Zhaochen Yu, Tianjun Zhang, et al. 27.84
- ShortGPT: Layers In Large Language Models Are More Redundant Than You Expect (2024). Xin Men, Mingyu Xu, Qingyu Zhang, et al. 27.72
- DuoAttention: Efficient Long-context LLM Inference With Retrieval And Streaming Heads (2024). Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, et al. 27.44
- EASYTOOL: Enhancing LLM-based Agents With Concise Tool Instruction (2024). Siyu Yuan, Kaitao Song, Jiangjie Chen, et al. 27.38
- Efficient Large Language Models: A Survey (2023). Zhongwei Wan, Xin Wang, Che Liu, et al. 27.30
- Leave No Context Behind: Efficient Infinite Context Transformers With Infini-attention (2024). Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal. 27.29
- FlashRAG: A Modular Toolkit For Efficient Retrieval-augmented Generation Research (2024). Jiajie Jin, Yutao Zhu, Guanting Dong, et al. 26.95
- CulturaX: A Cleaned, Enormous, And Multilingual Dataset For Large Language Models In 167 Languages (2023). Thuat Nguyen, Chien van Nguyen, Viet Dac Lai, et al. 26.68
- SOLAR 10.7B: Scaling Large Language Models With Simple Yet Effective Depth Up-scaling (2023). Dahyun Kim, Chanjun Park, Sanghoon Kim, et al. 26.34
- TriForce: Lossless Acceleration Of Long Sequence Generation With Hierarchical Speculative Decoding (2024). Hanshi Sun, Zhuoming Chen, Xinyu Yang, et al. 26.11
- MInference 1.0: Accelerating Pre-filling For Long-context LLMs Via Dynamic Sparse Attention (2024). Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, et al. 25.85
- SVD-LLM: Truncation-aware Singular Value Decomposition For Large Language Model Compression (2024). Xin Wang, Yu Zheng, Zhongwei Wan, et al. 25.83
- MemGPT: Towards LLMs As Operating Systems (2023). Charles Packer, Sarah Wooders, Kevin Lin, et al. 25.76
- BABILong: Testing The Limits Of LLMs With Long Context Reasoning-in-a-haystack (2024). Yuri Kuratov, Aydar Bulatov, Petr Anokhin, et al. 25.74
- Scaling Retrieval-based Language Models With A Trillion-token Datastore (2024). Rulin Shao, Jacqueline He, Akari Asai, et al. 25.71
- EvoPrompt: Connecting LLMs With Evolutionary Algorithms Yields Powerful Prompt Optimizers (2023). Qingyan Guo, Rui Wang, Junliang Guo, et al. 25.70
- InfLLM: Training-free Long-context Extrapolation For LLMs With An Efficient Context Memory (2024). Chaojun Xiao, Pengle Zhang, Xu Han, et al. 25.56
- CacheBlend: Fast Large Language Model Serving For RAG With Cached Knowledge Fusion (2024). Jiayi Yao, Hanchen Li, Yuhan Liu, et al. 25.46
- When Scaling Meets LLM Finetuning: The Effect Of Data, Model And Finetuning Method (2024). Biao Zhang, Zhongtao Liu, Colin Cherry, et al. 25.34
- ControlLLM: Augment Language Models With Tools By Searching On Graphs (2023). Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, et al. 24.66
- MA-LMM: Memory-augmented Large Multimodal Model For Long-term Video Understanding (2024). Bo He, Hengduo Li, Young Kyun Jang, et al. 24.50
- Shortened LLaMA: Depth Pruning For Large Language Models With Comparison Of Retraining Methods (2024). Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, et al. 24.29
- FederatedScope-LLM: A Comprehensive Package For Fine-tuning Large Language Models In Federated Learning (2023). Weirui Kuang, Bingchen Qian, Zitao Li, et al. 24.26
- Compact Language Models Via Pruning And Knowledge Distillation (2024). Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, et al. 24.15
- Platypus: Quick, Cheap, And Powerful Refinement Of LLMs (2023). Ariel N. Lee, Cole J. Hunter, Nataniel Ruiz. 24.03
- Atom: Low-bit Quantization For Efficient And Accurate LLM Serving (2023). Yilong Zhao, Chien-Yu Lin, Kan Zhu, et al. 23.81
- Black-box Prompt Optimization: Aligning Large Language Models Without Model Training (2023). Jiale Cheng, Xiao Liu, Kehan Zheng, et al. 23.78
- RouterBench: A Benchmark For Multi-LLM Routing System (2024). Qitian Jason Hu, Jacob Bieker, Xiuyu Li, et al. 23.73
- tinyBenchmarks: Evaluating LLMs With Fewer Examples (2024). Felipe Maia Polo, Lucas Weber, Leshem Choshen, et al. 23.68
- Simple And Scalable Strategies To Continually Pre-train Large Language Models (2024). Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, et al. 23.62
- Discovering The Gems In Early Layers: Accelerating Long-context LLMs With 1000x Input Token Reduction (2024). Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, et al. 23.61
- Optimize Weight Rounding Via Signed Gradient Descent For The Quantization Of LLMs (2023). Wenhua Cheng, Weiwei Zhang, Haihao Shen, et al. 23.56
- Cobra: Extending Mamba To Multi-modal Large Language Model For Efficient Inference (2024). Han Zhao, Min Zhang, Wei Zhao, et al. 23.43
- ThinK: Thinner Key Cache By Query-driven Pruning (2024). Yuhui Xu, Zhanming Jie, Hanze Dong, et al. 23.34
- GEAR: An Efficient KV Cache Compression Recipe For Near-lossless Generative Inference Of LLM (2024). Hao Kang, Qingru Zhang, Souvik Kundu, et al. 23.22