Awesome Model Architecture
Model Architecture is one of the most active areas in Awesome LLM Papers, with 4,984 papers in this collection. Common evaluation datasets include GSM8K, MMLU, and LongBench. A strong starting point is "Efficient Streaming Language Models With Attention Sinks".
Key papers
- Efficient Streaming Language Models With Attention Sinks (2023) · Guangxuan Xiao, Yuandong Tian, Beidi Chen, et al. · 37.76
- LLaMA: Open And Efficient Foundation Language Models (2023) · Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al. · 36.83
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022) · BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, et al. · 33.79
- LayerSkip: Enabling Early Exit Inference And Self-speculative Decoding (2024) · Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, et al. · 32.46
- OmniQuant: Omnidirectionally Calibrated Quantization For Large Language Models (2023) · Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, et al. · 31.92
- LongBench: A Bilingual, Multitask Benchmark For Long Context Understanding (2023) · Yushi Bai, Xin Lv, Jiajie Zhang, et al. · 31.59
- Training Compute-optimal Large Language Models (2022) · Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al. · 31.52
- SPHINX: The Joint Mixing Of Weights, Tasks, And Visual Embeddings For Multi-modal Large Language Models (2023) · Ziyi Lin, Chris Liu, Renrui Zhang, et al. · 31.35
- The Dawn Of LMMs: Preliminary Explorations With GPT-4V(ision) (2023) · Zhengyuan Yang, Linjie Li, Kevin Lin, et al. · 30.34
- DeepSeek LLM: Scaling Open-source Language Models With Longtermism (2024) · DeepSeek-AI: Xiao Bi, et al. · 30.12
- Medusa: Simple LLM Inference Acceleration Framework With Multiple Decoding Heads (2024) · Tianle Cai, Yuhong Li, Zhengyang Geng, et al. · 30.03
- LLM In A Flash: Efficient Large Language Model Inference With Limited Memory (2023) · Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, et al. · 29.72
- Deja Vu: Contextual Sparsity For Efficient LLMs At Inference Time (2023) · Zichang Liu, Jue Wang, Tri Dao, et al. · 29.70
- Emergent Abilities Of Large Language Models (2022) · Jason Wei, Yi Tay, Rishi Bommasani, et al. · 29.54
- QA-LoRA: Quantization-aware Low-rank Adaptation Of Large Language Models (2023) · Yuhui Xu, Lingxi Xie, Xiaotao Gu, et al. · 29.52
- Scaling Laws With Vocabulary: Larger Models Deserve Larger Vocabularies (2024) · Chaofan Tao, Qian Liu, Longxu Dou, et al. · 28.35
- MobileLLM: Optimizing Sub-billion Parameter Language Models For On-device Use Cases (2024) · Zechun Liu, Changsheng Zhao, Forrest Iandola, et al. · 28.23
- LM-Infinite: Zero-shot Extreme Length Generalization For Large Language Models (2023) · Chi Han, Qifan Wang, Hao Peng, et al. · 27.86
- Buffer Of Thoughts: Thought-augmented Reasoning With Large Language Models (2024) · Ling Yang, Zhaochen Yu, Tianjun Zhang, et al. · 27.84
- Eureka: Human-level Reward Design Via Coding Large Language Models (2023) · Yecheng Jason Ma, William Liang, Guanzhi Wang, et al. · 27.82
- Graph Of Thoughts: Solving Elaborate Problems With Large Language Models (2023) · Maciej Besta, Nils Blach, Ales Kubicek, et al. · 27.79
- ShortGPT: Layers In Large Language Models Are More Redundant Than You Expect (2024) · Xin Men, Mingyu Xu, Qingyu Zhang, et al. · 27.72
- Contrastive Preference Optimization: Pushing The Boundaries Of LLM Performance In Machine Translation (2024) · Haoran Xu, Amr Sharaf, Yunmo Chen, et al. · 27.70
- MultiHop-RAG: Benchmarking Retrieval-augmented Generation For Multi-hop Queries (2024) · Yixuan Tang, Yi Yang · 27.60
- Aya Model: An Instruction Finetuned Open-access Multilingual Language Model (2024) · Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, et al. · 27.57
- DuoAttention: Efficient Long-context LLM Inference With Retrieval And Streaming Heads (2024) · Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, et al. · 27.44
- Magicoder: Empowering Code Generation With OSS-Instruct (2023) · Yuxiang Wei, Zhe Wang, Jiawei Liu, et al. · 27.43
- Efficient Large Language Models: A Survey (2023) · Zhongwei Wan, Xin Wang, Che Liu, et al. · 27.30
- Leave No Context Behind: Efficient Infinite Context Transformers With Infini-attention (2024) · Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal · 27.29
- AgentGym: Evolving Large Language Model-based Agents Across Diverse Environments (2024) · Zhiheng Xi, Yiwen Ding, Wenxiang Chen, et al. · 27.20
- InternLM2 Technical Report (2024) · Zheng Cai, Maosong Cao, Haojiong Chen, et al. · 27.03
- xLSTM: Extended Long Short-term Memory (2024) · Maximilian Beck, Korbinian Pöppel, Markus Spanring, et al. · 26.88
- Large Language Models On Graphs: A Comprehensive Survey (2023) · Bowen Jin, Gang Liu, Chi Han, et al. · 26.75
- A Survey On Large Language Models For Code Generation (2024) · Juyong Jiang, Fan Wang, Jiasi Shen, et al. · 26.64
- SOLAR 10.7B: Scaling Large Language Models With Simple Yet Effective Depth Up-scaling (2023) · Dahyun Kim, Chanjun Park, Sanghoon Kim, et al. · 26.34
- TriForce: Lossless Acceleration Of Long Sequence Generation With Hierarchical Speculative Decoding (2024) · Hanshi Sun, Zhuoming Chen, Xinyu Yang, et al. · 26.11
- SaulLM-7B: A Pioneering Large Language Model For Law (2024) · Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, et al. · 26.00
- MInference 1.0: Accelerating Pre-filling For Long-context LLMs Via Dynamic Sparse Attention (2024) · Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, et al. · 25.85
- SVD-LLM: Truncation-aware Singular Value Decomposition For Large Language Model Compression (2024) · Xin Wang, Yu Zheng, Zhongwei Wan, et al. · 25.83
- MemGPT: Towards LLMs As Operating Systems (2023) · Charles Packer, Sarah Wooders, Kevin Lin, et al. · 25.76
- InfLLM: Training-free Long-context Extrapolation For LLMs With An Efficient Context Memory (2024) · Chaojun Xiao, Pengle Zhang, Xu Han, et al. · 25.56
- Eagle: Exploring The Design Space For Multimodal LLMs With Mixture Of Encoders (2024) · Min Shi, Fuxiao Liu, Shihao Wang, et al. · 25.55
- MindMap: Knowledge Graph Prompting Sparks Graph Of Thoughts In Large Language Models (2023) · Yilin Wen, Zifeng Wang, Jimeng Sun · 25.51
- CacheBlend: Fast Large Language Model Serving For RAG With Cached Knowledge Fusion (2024) · Jiayi Yao, Hanchen Li, Yuhan Liu, et al. · 25.46
- LLMs Know More Than They Show: On The Intrinsic Representation Of LLM Hallucinations (2024) · Hadas Orgad, Michael Toker, Zorik Gekhman, et al. · 25.38
- LLaMAX: Scaling Linguistic Horizons Of LLM By Enhancing Translation Capabilities Beyond 100 Languages (2024) · Yinquan Lu, Wenhao Zhu, Lei Li, et al. · 25.17
- ControlLLM: Augment Language Models With Tools By Searching On Graphs (2023) · Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, et al. · 24.66
- LayoutLLM: Layout Instruction Tuning With Large Language Models For Document Understanding (2024) · Chuwei Luo, Yufan Shen, Zhaoqing Zhu, et al. · 24.66
- MA-LMM: Memory-augmented Large Multimodal Model For Long-term Video Understanding (2024) · Bo He, Hengduo Li, Young Kyun Jang, et al. · 24.50
- Shortened LLaMA: Depth Pruning For Large Language Models With Comparison Of Retraining Methods (2024) · Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, et al. · 24.29
- Advancing Transformer Architecture In Long-context Large Language Models: A Comprehensive Survey (2023) · Yunpeng Huang, Jingwei Xu, Junyu Lai, et al. · 24.27
- Compact Language Models Via Pruning And Knowledge Distillation (2024) · Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, et al. · 24.15
- LLM360: Towards Fully Transparent Open-source LLMs (2023) · Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, et al. · 24.01
- Patchscopes: A Unifying Framework For Inspecting Hidden Representations Of Language Models (2024) · Asma Ghandeharioun, Avi Caciularu, Adam Pearce, et al. · 23.87
- LM-Cocktail: Resilient Tuning Of Language Models Via Model Merging (2023) · Shitao Xiao, Zheng Liu, Peitian Zhang, et al. · 23.85
- LLaMA-MoE: Building Mixture-of-experts From LLaMA With Continual Pre-training (2024) · Tong Zhu, Xiaoye Qu, Daize Dong, et al. · 23.81
- RouterBench: A Benchmark For Multi-LLM Routing System (2024) · Qitian Jason Hu, Jacob Bieker, Xiuyu Li, et al. · 23.73
- Discovering The Gems In Early Layers: Accelerating Long-context LLMs With 1000x Input Token Reduction (2024) · Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, et al. · 23.61
- Optimize Weight Rounding Via Signed Gradient Descent For The Quantization Of LLMs (2023) · Wenhua Cheng, Weiwei Zhang, Haihao Shen, et al. · 23.56
- Same Task, More Tokens: The Impact Of Input Length On The Reasoning Performance Of Large Language Models (2024) · Mosh Levy, Alon Jacoby, Yoav Goldberg · 23.56