Semantic Search At LinkedIn

Fedor Borisyuk·Sriram Vasudevan·Muchen Wu·Guoyao Li·Benjamin Le·Shaobo Zhang·Qianqi Kay Shen·Yuchin Juan·Kayhan Behdin·Liming Dong·Kaixu Yang·Shusen Jing·Ravi Pothamsetty·Rajat Arora·Sophie Yanying Sheng·Vitaly Abdrashitov·Yang Zhao·Lin Su·Xiaoqing Wang·Chujie Zheng·Sarang Metkar·Rupesh Gupta·Igor Lapchuk·David N. Racca·Madhumitha Mohan·Yanbo Li·Haojun Li·Saloni Gandhi·Xueying Lu·Chetan Bhole·Ali Hooshmand·Xin Yang·Raghavan Muthuregunathan·Jiajun Zhang·Mathew Teoh·Adam Coler·Abhinav Gupta·Xiaojing Ma·Sundara Raman Ramachandran·Morteza Ramezani·Yubo Wang·Lijuan Zhang·Richard Li·Jian Sheng·Chanh Nguyen·Yen-Chi Chen·Chuanrui Zhu·Claire Zhang·Jiahao Xu·Deepti Kulkarni·Qing Lan·Arvind Subramaniam·Ata Fatahibaarzi·Steven Shimizu·Yanning Chen·Zhipeng Wang·Ran He·Zhengze Zhou·Qingquan Song·Yun Dai·Caleb Johnson·Ping Liu·Shaghayegh Gharghabi·Gokulraj Mohanasundaram·Juan Bottaro·Santhosh Sachindran·Qi Guo·Yunxiang Ren·Chengming Jiang·Di Mo·Luke Simon·Jianqiang Shen·Jingwei Wu·Wenjing Zhang·2026


Information Retrieval Artificial Intelligence Machine Learning

Abstract

Semantic search with large language models (LLMs) enables retrieval by meaning rather than keyword overlap, but scaling it requires major advances in inference efficiency. We present LinkedIn's LLM-based semantic search framework for AI Job Search and AI People Search, which combines an LLM relevance judge, embedding-based retrieval, and a compact Small Language Model (SLM) trained via multi-teacher distillation to jointly optimize relevance and engagement. A prefill-oriented inference architecture, co-designed with model pruning, context compression, and text-embedding hybrid interactions, boosts ranking throughput by over 75x under a fixed latency constraint while preserving near-teacher-level NDCG. The result is one of the first production LLM-based ranking systems with efficiency comparable to traditional approaches, and it delivers significant gains in quality and user engagement.
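The abstract reports ranking quality as NDCG (normalized discounted cumulative gain) but does not say which variant or cutoff is used. As a reminder of what the metric measures, here is a minimal sketch that assumes the common exponential-gain form and a caller-chosen cutoff `k`. The function names and the sample relevance grades are illustrative and not taken from the paper.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items, in ranked order.

    Assumption: exponential gain (2^rel - 1) with a log2 position discount.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so discount is log2(position + 1)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance labels for one query, in the order the ranker returned them.
ranked_labels = [3, 2, 3, 0, 1]
score = ndcg(ranked_labels, k=5)
```

A student ranker that reproduces the teacher's ordering scores the same NDCG as the teacher against the same labels, which is the sense in which a distilled model can be described as "near-teacher-level".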
