Learning Deep Semantic Model For Code Search Using CodeSearchNet Corpus
2022 · Chen Wu, Ming Yan
Abstract
Semantic code search is the task of retrieving relevant code snippets given a natural language query. Unlike typical information retrieval tasks, code search must bridge the semantic gap between programming language and natural language in order to capture the intrinsic concepts and semantics of both. Deep neural networks for code search have recently become an active research topic. Typical neural code search methods first represent the code snippet and the query text as separate embeddings, and then use a vector similarity measure (e.g. dot product or cosine similarity) to compute the semantic similarity between them. There are many ways to aggregate a variable-length sequence of code or query tokens into a learnable embedding, including the bi-encoder, the cross-encoder, and the poly-encoder. The goal of the query encoder and the code encoder is to produce embeddings that are close to each other for a related pair of query and desired code snippet, in which the choice and design of encoder …
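The bi-encoder scheme described in the abstract (encode query and code separately, then compare the embeddings with a dot product or cosine similarity) can be illustrated with a minimal pure-Python sketch. Everything here is an illustrative assumption, not the authors' model: the hypothetical `TOY_EMBEDDINGS` table of hand-picked 3-d vectors stands in for learned token embeddings, mean pooling stands in for a trained encoder, a single encoder is shared by queries and code, and the three-snippet `CORPUS` is made up.

```python
import math
import re

# HYPOTHETICAL toy token embeddings (3-d), hand-picked so that related query
# words and code identifiers point in similar directions. A trained model
# would learn these vectors instead.
TOY_EMBEDDINGS = {
    "sort": [1.0, 0.0, 0.0],
    "sorted": [0.9, 0.1, 0.0],
    "list": [0.5, 0.5, 0.0],
    "read": [0.0, 1.0, 0.0],
    "open": [0.1, 0.9, 0.0],
    "file": [0.0, 0.8, 0.2],
    "http": [0.0, 0.0, 1.0],
    "request": [0.0, 0.1, 0.9],
}
DIM = 3


def tokenize(text):
    """Lowercase and split on any non-alphanumeric run (so read_file -> read, file)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def encode(text):
    """Toy encoder: mean-pool the vectors of known tokens into one fixed-size embedding.

    Unknown tokens are ignored; text with no known tokens maps to the zero vector.
    The same encoder is shared by queries and code here (an assumption).
    """
    vecs = [TOY_EMBEDDINGS[t] for t in tokenize(text) if t in TOY_EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def search(query, snippets, score=cosine):
    """Bi-encoder retrieval: encode query and snippets independently, rank by score."""
    q = encode(query)
    scored = [(score(q, encode(s)), s) for s in snippets]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Illustrative corpus (made up for this example).
CORPUS = [
    "def read_file(path): return open(path).read()",
    "def sort_list(xs): return sorted(xs)",
    "def fetch(url): return http_request(url)",
]

if __name__ == "__main__":
    for query in ["sort a list", "read a file", "send http request"]:
        best_score, best_snippet = search(query, CORPUS)[0]
        print(f"{query!r} -> {best_snippet}  (cosine={best_score:.3f})")
```

Because each snippet is encoded without seeing the query, a real bi-encoder can precompute and index all code embeddings offline. A cross-encoder instead processes every query-snippet pair jointly, which is typically more accurate but far more expensive at search time; the poly-encoder sits between the two.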
Related papers
- On The Challenges And Opportunities Of Learned Sparse Retrieval For Code (2026)
- Vectorsearch: Enhancing Document Retrieval With Semantic Embeddings And Optimized Search (2024)
- Embracing Structure In Data For Billion-scale Semantic Product Search (2021)
- Learning To Embed Semantic Similarity For Joint Image-text Retrieval (2022)
- Towards A Generalist Code Embedding Model Based On Massive Data Synthesis (2025)
- Text Embeddings For Retrieval From A Large Knowledge Base (2018)
- A Survey On Deep Text Hashing: Efficient Semantic Text Retrieval With Binary Representation (2025)
- Learning Joint Representations Of Videos And Sentences With Web Image Search (2016)