Awesome Training Techniques
Training Techniques is one of the most active areas in Awesome LLM Papers, with 7,570 papers in this collection evaluated on datasets such as GSM8K, MMLU, and MATH-500. A strong starting point is "Lora: Low-rank Adaptation Of Large Language Models".
Datasets & benchmarks
Key papers
- Lora: Low-rank Adaptation Of Large Language Models (2021) | Edward J. Hu, Yelong Shen, Phillip Wallis, et al. | 48.60
- Zephyr: Direct Distillation Of LM Alignment (2023) | Lewis Tunstall, Edward Beeching, Nathan Lambert, et al. | 38.56
- Efficient Streaming Language Models With Attention Sinks (2023) | Guangxuan Xiao, Yuandong Tian, Beidi Chen, et al. | 37.76
- Training Language Models To Follow Instructions With Human Feedback (2022) | Long Ouyang, Jeff Wu, Xu Jiang, et al. | 36.92
- Llama: Open And Efficient Foundation Language Models (2023) | Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al. | 36.83
- Qlora: Efficient Finetuning Of Quantized Llms (2023) | Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, et al. | 36.23
- Minicpm: Unveiling The Potential Of Small Language Models With Scalable Training Strategies (2024) | Shengding Hu, Yuge Tu, Xu Han, et al. | 36.01
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022) | BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, et al. | 33.79
- Agenttuning: Enabling Generalized Agent Abilities For Llms (2023) | Aohan Zeng, Mingdao Liu, Rui Lu, et al. | 32.67
- Layerskip: Enabling Early Exit Inference And Self-speculative Decoding (2024) | Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, et al. | 32.46
- Prefix-tuning: Optimizing Continuous Prompts For Generation (2021) | Xiang Lisa Li, Percy Liang | 32.38
- Omniquant: Omnidirectionally Calibrated Quantization For Large Language Models (2023) | Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, et al. | 31.92
- The Fineweb Datasets: Decanting The Web For The Finest Text Data At Scale (2024) | Guilherme Penedo, Hynek Kydlíček, Loubna Ben Allal, et al. | 31.68
- Longbench: A Bilingual, Multitask Benchmark For Long Context Understanding (2023) | Yushi Bai, Xin Lv, Jiajie Zhang, et al. | 31.59
- Training Compute-optimal Large Language Models (2022) | Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al. | 31.52
- SPHINX: The Joint Mixing Of Weights, Tasks, And Visual Embeddings For Multi-modal Large Language Models (2023) | Ziyi Lin, Chris Liu, Renrui Zhang, et al. | 31.35
- Step-dpo: Step-wise Preference Optimization For Long-chain Reasoning Of Llms (2024) | Xin Lai, Zhuotao Tian, Yukang Chen, et al. | 31.31
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025) | DeepSeek-AI et al. | 31.23
- Deepseek LLM: Scaling Open-source Language Models With Longtermism (2024) | DeepSeek-AI, Xiao Bi, et al. | 30.12
- Helpsteer2: Open-source Dataset For Training Top-performing Reward Models (2024) | Zhilin Wang, Yi Dong, Olivier Delalleau, et al. | 29.68
- Qa-lora: Quantization-aware Low-rank Adaptation Of Large Language Models (2023) | Yuhui Xu, Lingxi Xie, Xiaotao Gu, et al. | 29.52
- RLAIF Vs. RLHF: Scaling Reinforcement Learning From Human Feedback With AI Feedback (2023) | Harrison Lee, Samrat Phatale, Hassan Mansoor, et al. | 29.41
- Scaling Laws With Vocabulary: Larger Models Deserve Larger Vocabularies (2024) | Chaofan Tao, Qian Liu, Longxu Dou, et al. | 28.35
- Mobilellm: Optimizing Sub-billion Parameter Language Models For On-device Use Cases (2024) | Zechun Liu, Changsheng Zhao, Forrest Iandola, et al. | 28.23
- Magpie: Alignment Data Synthesis From Scratch By Prompting Aligned Llms With Nothing (2024) | Zhangchen Xu, Fengqing Jiang, Luyao Niu, et al. | 28.04
- Agentless: Demystifying Llm-based Software Engineering Agents (2024) | Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, et al. | 27.93
- Lm-infinite: Zero-shot Extreme Length Generalization For Large Language Models (2023) | Chi Han, Qifan Wang, Hao Peng, et al. | 27.86
- Ultrafeedback: Boosting Language Models With Scaled AI Feedback (2023) | Ganqu Cui, Lifan Yuan, Ning Ding, et al. | 27.73
- Shortgpt: Layers In Large Language Models Are More Redundant Than You Expect (2024) | Xin Men, Mingyu Xu, Qingyu Zhang, et al. | 27.72
- Contrastive Preference Optimization: Pushing The Boundaries Of LLM Performance In Machine Translation (2024) | Haoran Xu, Amr Sharaf, Yunmo Chen, et al. | 27.70
- Is DPO Superior To PPO For LLM Alignment? A Comprehensive Study (2024) | Shusheng Xu, Wei Fu, Jiaxuan Gao, et al. | 27.53
- Magicoder: Empowering Code Generation With Oss-instruct (2023) | Yuxiang Wei, Zhe Wang, Jiawei Liu, et al. | 27.43
- The Reversal Curse: Llms Trained On "A Is B" Fail To Learn "B Is A" (2023) | Lukas Berglund, Meg Tong, Max Kaufmann, et al. | 27.32
- Efficient Large Language Models: A Survey (2023) | Zhongwei Wan, Xin Wang, Che Liu, et al. | 27.30
- Leave No Context Behind: Efficient Infinite Context Transformers With Infini-attention (2024) | Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal | 27.29
- Agentgym: Evolving Large Language Model-based Agents Across Diverse Environments (2024) | Zhiheng Xi, Yiwen Ding, Wenxiang Chen, et al. | 27.20
- Internlm2 Technical Report (2024) | Zheng Cai, Maosong Cao, Haojiong Chen, et al. | 27.03
- Longalign: A Recipe For Long Context Alignment Of Large Language Models (2024) | Yushi Bai, Xin Lv, Jiajie Zhang, et al. | 26.93
- Xlstm: Extended Long Short-term Memory (2024) | Maximilian Beck, Korbinian Pöppel, Markus Spanring, et al. | 26.88
- Dola: Decoding By Contrasting Layers Improves Factuality In Large Language Models (2023) | Yung-Sung Chuang, Yujia Xie, Hongyin Luo, et al. | 26.86
- Culturax: A Cleaned, Enormous, And Multilingual Dataset For Large Language Models In 167 Languages (2023) | Thuat Nguyen, Chien van Nguyen, Viet Dac Lai, et al. | 26.68
- SOLAR 10.7B: Scaling Large Language Models With Simple Yet Effective Depth Up-scaling (2023) | Dahyun Kim, Chanjun Park, Sanghoon Kim, et al. | 26.34
- Learning From Mistakes Makes LLM Better Reasoner (2023) | Shengnan An, Zexiong Ma, Zeqi Lin, et al. | 26.19
- Saullm-7b: A Pioneering Large Language Model For Law (2024) | Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, et al. | 26.00
- Mammoth: Building Math Generalist Models Through Hybrid Instruction Tuning (2023) | Xiang Yue, Xingwei Qu, Ge Zhang, et al. | 25.94
- SVD-LLM: Truncation-aware Singular Value Decomposition For Large Language Model Compression (2024) | Xin Wang, Yu Zheng, Zhongwei Wan, et al. | 25.83
- Direct Nash Optimization: Teaching Language Models To Self-improve With General Preferences (2024) | Corby Rosset, Ching-An Cheng, Arindam Mitra, et al. | 25.73
- Scaling Retrieval-based Language Models With A Trillion-token Datastore (2024) | Rulin Shao, Jacqueline He, Akari Asai, et al. | 25.71
- Evoprompt: Connecting Llms With Evolutionary Algorithms Yields Powerful Prompt Optimizers (2023) | Qingyan Guo, Rui Wang, Junliang Guo, et al. | 25.70
- Infllm: Training-free Long-context Extrapolation For Llms With An Efficient Context Memory (2024) | Chaojun Xiao, Pengle Zhang, Xu Han, et al. | 25.56
- Agent-flan: Designing Data And Methods Of Effective Agent Tuning For Large Language Models (2024) | Zehui Chen, Kuikun Liu, Qiuchen Wang, et al. | 25.55
- Eagle: Exploring The Design Space For Multimodal Llms With Mixture Of Encoders (2024) | Min Shi, Fuxiao Liu, Shihao Wang, et al. | 25.55
- When Scaling Meets LLM Finetuning: The Effect Of Data, Model And Finetuning Method (2024) | Biao Zhang, Zhongtao Liu, Colin Cherry, et al. | 25.34
- Llamax: Scaling Linguistic Horizons Of LLM By Enhancing Translation Capabilities Beyond 100 Languages (2024) | Yinquan Lu, Wenhao Zhu, Lei Li, et al. | 25.17
- Teaching Large Language Models To Reason With Reinforcement Learning (2024) | Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, et al. | 25.08
- Agentohana: Design Unified Data And Training Pipeline For Effective Agent Learning (2024) | Jianguo Zhang, Tian Lan, Rithesh Murthy, et al. | 24.90
- Toward Self-improvement Of Llms Via Imagination, Searching, And Criticizing (2024) | Ye Tian, Baolin Peng, Linfeng Song, et al. | 24.80
- Continual Learning Of Large Language Models: A Comprehensive Survey (2024) | Haizhou Shi, Zihao Xu, Hengyi Wang, et al. | 24.79
- Layoutllm: Layout Instruction Tuning With Large Language Models For Document Understanding (2024) | Chuwei Luo, Yufan Shen, Zhaoqing Zhu, et al. | 24.66
- Reconcile: Round-table Conference Improves Reasoning Via Consensus Among Diverse Llms (2023) | Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal | 24.61