Awesome Prompting
Prompting is one of the most active areas in Awesome LLM Papers, with 3,281 papers in this collection, evaluated on datasets such as GSM8K, MMLU, and AIME 2024. A strong starting point is "Judging LLM-as-a-judge With MT-Bench And Chatbot Arena".
Datasets & benchmarks
Key papers
- Judging LLM-as-a-judge With MT-Bench And Chatbot Arena (2023) | Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, et al. | 46.70
- ReAct: Synergizing Reasoning And Acting In Language Models (2022) | Shunyu Yao, Jeffrey Zhao, Dian Yu, et al. | 36.63
- Prefix-Tuning: Optimizing Continuous Prompts For Generation (2021) | Xiang Lisa Li, Percy Liang | 32.38
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025) | DeepSeek-AI et al. | 31.23
- Executable Code Actions Elicit Better LLM Agents (2024) | Xingyao Wang, Yangyi Chen, Lifan Yuan, et al. | 31.13
- Scaling Synthetic Data Creation With 1,000,000,000 Personas (2024) | Tao Ge, Xin Chan, Xiaoyang Wang, et al. | 29.19
- AutoGen: Enabling Next-gen LLM Applications Via Multi-agent Conversation (2023) | Qingyun Wu, Gagan Bansal, Jieyu Zhang, et al. | 29.16
- Magpie: Alignment Data Synthesis From Scratch By Prompting Aligned LLMs With Nothing (2024) | Zhangchen Xu, Fengqing Jiang, Luyao Niu, et al. | 28.04
- Agentless: Demystifying LLM-based Software Engineering Agents (2024) | Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, et al. | 27.93
- Buffer Of Thoughts: Thought-augmented Reasoning With Large Language Models (2024) | Ling Yang, Zhaochen Yu, Tianjun Zhang, et al. | 27.84
- Eureka: Human-level Reward Design Via Coding Large Language Models (2023) | Yecheng Jason Ma, William Liang, Guanzhi Wang, et al. | 27.82
- Graph Of Thoughts: Solving Elaborate Problems With Large Language Models (2023) | Maciej Besta, Nils Blach, Ales Kubicek, et al. | 27.79
- EASYTOOL: Enhancing LLM-based Agents With Concise Tool Instruction (2024) | Siyu Yuan, Kaitao Song, Jiangjie Chen, et al. | 27.38
- Llama Guard: LLM-based Input-output Safeguard For Human-AI Conversations (2023) | Hakan Inan, Kartikeya Upasani, Jianfeng Chi, et al. | 27.06
- Replacing Judges With Juries: Evaluating LLM Generations With A Panel Of Diverse Models (2024) | Pat Verga, Sebastian Hofstatter, Sophia Althammer, et al. | 26.69
- A Survey On Large Language Models For Code Generation (2024) | Juyong Jiang, Fan Wang, Jiasi Shen, et al. | 26.64
- PromptBench: A Unified Library For Evaluation Of Large Language Models (2023) | Kaijie Zhu, Qinlin Zhao, Hao Chen, et al. | 26.53
- EvoPrompt: Connecting LLMs With Evolutionary Algorithms Yields Powerful Prompt Optimizers (2023) | Qingyan Guo, Rui Wang, Junliang Guo, et al. | 25.70
- MindMap: Knowledge Graph Prompting Sparks Graph Of Thoughts In Large Language Models (2023) | Yilin Wen, Zifeng Wang, Jimeng Sun | 25.51
- Toward Self-improvement Of LLMs Via Imagination, Searching, And Criticizing (2024) | Ye Tian, Baolin Peng, Linfeng Song, et al. | 24.80
- ControlLLM: Augment Language Models With Tools By Searching On Graphs (2023) | Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, et al. | 24.66
- InjecAgent: Benchmarking Indirect Prompt Injections In Tool-integrated Large Language Model Agents (2024) | Qiusi Zhan, Zhixiang Liang, Zifan Ying, et al. | 24.31
- Rephrase And Respond: Let Large Language Models Ask Better Questions For Themselves (2023) | Yihe Deng, Weitong Zhang, Zixiang Chen, et al. | 24.30
- A Systematic Survey Of Prompt Engineering In Large Language Models: Techniques And Applications (2024) | Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, et al. | 24.27
- Tell Your Model Where To Attend: Post-hoc Attention Steering For LLMs (2023) | Qingru Zhang, Chandan Singh, Liyuan Liu, et al. | 24.26
- A Wolf In Sheep's Clothing: Generalized Nested Jailbreak Prompts Can Fool Large Language Models Easily (2023) | Peng Ding, Jun Kuang, Dan Ma, et al. | 24.05
- Black-box Prompt Optimization: Aligning Large Language Models Without Model Training (2023) | Jiale Cheng, Xiao Liu, Kehan Zheng, et al. | 23.78
- DISC-LawLLM: Fine-tuning Large Language Models For Intelligent Legal Services (2023) | Shengbin Yue, Wei Chen, Siyuan Wang, et al. | 23.76
- Synthetic Data (almost) From Scratch: Generalized Instruction Tuning For Language Models (2024) | Haoran Li, Qingxiu Dong, Zhengyang Tang, et al. | 23.56
- Large Language Model Unlearning Via Embedding-corrupted Prompts (2024) | Chris Yuhao Liu, Yaxuan Wang, Jeffrey Flanigan, et al. | 23.55
- Tool Learning With Large Language Models: A Survey (2024) | Changle Qu, Sunhao Dai, Xiaochi Wei, et al. | 23.52
- Faithful Logical Reasoning Via Symbolic Chain-of-thought (2024) | Jundong Xu, Hao Fei, Liangming Pan, et al. | 23.51
- Jailbreaking Black Box Large Language Models In Twenty Queries (2023) | Patrick Chao, Alexander Robey, Edgar Dobriban, et al. | 23.49
- Two Tales Of Persona In LLMs: A Survey Of Role-playing And Personalization (2024) | Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, et al. | 23.45
- "Do Anything Now": Characterizing And Evaluating In-the-wild Jailbreak Prompts On Large Language Models (2023) | Xinyue Shen, Zeyuan Chen, Michael Backes, et al. | 23.21
- LongLLMLingua: Accelerating And Enhancing LLMs In Long Context Scenarios Via Prompt Compression (2023) | Huiqiang Jiang, Qianhui Wu, Xufang Luo, et al. | 23.08
- AutoDAN: Generating Stealthy Jailbreak Prompts On Aligned Large Language Models (2023) | Xiaogeng Liu, Nan Xu, Muhao Chen, et al. | 22.91
- Cumulative Reasoning With Large Language Models (2023) | Yifan Zhang, Jingqin Yang, Yang Yuan, et al. | 22.83
- COLD-Attack: Jailbreaking LLMs With Stealthiness And Controllability (2024) | Xingang Guo, Fangxu Yu, Huan Zhang, et al. | 22.77
- Quantifying Language Models' Sensitivity To Spurious Features In Prompt Design Or: How I Learned To Start Worrying About Prompt Formatting (2023) | Melanie Sclar, Yejin Choi, Yulia Tsvetkov, et al. | 22.72
- MetaTool Benchmark For Large Language Models: Deciding Whether To Use Tools And Which To Use (2023) | Yue Huang, Jiawen Shi, Yuan Li, et al. | 22.68
- Prompt Engineering A Prompt Engineer (2023) | Qinyuan Ye, Maxamed Axmed, Reid Pryzant, et al. | 22.48
- Fine-tuning Multimodal LLMs To Follow Zero-shot Demonstrative Instructions (2023) | Juncheng Li, Kaihang Pan, Zhiqi Ge, et al. | 22.33
- Certifying LLM Safety Against Adversarial Prompting (2023) | Aounon Kumar, Chirag Agarwal, Suraj Srinivas, et al. | 22.14
- In-context Pretraining: Language Modeling Beyond Document Boundaries (2023) | Weijia Shi, Sewon Min, Maria Lomeli, et al. | 21.97
- R-Tuning: Instructing Large Language Models To Say 'I Don't Know' (2023) | Hanning Zhang, Shizhe Diao, Yong Lin, et al. | 21.96
- DeepSpeed-FastGen: High-throughput Text Generation For LLMs Via MII And DeepSpeed-Inference (2024) | Connor Holmes, Masahiro Tanaka, Michael Wyatt, et al. | 21.74
- Making Large Language Models Perform Better In Knowledge Graph Completion (2023) | Yichi Zhang, Zhuo Chen, Lingbing Guo, et al. | 21.74
- T-Eval: Evaluating The Tool Utilization Capability Of Large Language Models Step By Step (2023) | Zehui Chen, Weihua Du, Wenwei Zhang, et al. | 21.69
- List Items One By One: A New Data Source And Learning Paradigm For Multimodal LLMs (2024) | An Yan, Zhengyuan Yang, Junda Wu, et al. | 21.63
- Language Agents As Optimizable Graphs (2024) | Mingchen Zhuge, Wenyi Wang, Louis Kirsch, et al. | 21.44
- INTERS: Unlocking The Power Of Large Language Models In Search With Instruction Tuning (2024) | Yutao Zhu, Peitian Zhang, Chenghao Zhang, et al. | 21.39
- LLMLingua: Compressing Prompts For Accelerated Inference Of Large Language Models (2023) | Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, et al. | 21.22
- AlphaZero-like Tree-search Can Guide Large Language Model Decoding And Training (2023) | Xidong Feng, Ziyu Wan, Muning Wen, et al. | 21.21
- Usable XAI: 10 Strategies Towards Exploiting Explainability In The LLM Era (2024) | Xuansheng Wu, Haiyan Zhao, Yaochen Zhu, et al. | 21.19
- Prompt2Model: Generating Deployable Models From Natural Language Instructions (2023) | Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, et al. | 21.09
- The Effect Of Sampling Temperature On Problem Solving In Large Language Models (2024) | Matthew Renze, Erhan Guven | 21.07
- Let Me Speak Freely? A Study On The Impact Of Format Restrictions On Performance Of Large Language Models (2024) | Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, et al. | 20.96
- HLLM: Enhancing Sequential Recommendations Via Hierarchical Large Language Models For Item And User Modeling (2024) | Junyi Chen, Lu Chi, Bingyue Peng, et al. | 20.82
- Tree Of Attacks: Jailbreaking Black-box LLMs Automatically (2023) | Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, et al. | 20.80