Patent Representation Learning Via Self-supervision
2025 · You Zuo, Kim Gerdes, Eric Villemonte de La Clergerie, et al.
Abstract
This paper presents a simple yet effective contrastive learning framework for learning patent embeddings by leveraging multiple views from within the same document. We first identify a patent-specific failure mode of SimCSE-style dropout augmentation: it produces overly uniform embeddings that lose semantic cohesion. To remedy this, we propose section-based augmentation, where different sections of a patent (e.g., abstract, claims, background) serve as complementary views. This design introduces natural semantic and structural diversity, mitigating over-dispersion and yielding embeddings that better preserve both global structure and local continuity. On large-scale benchmarks, our fully self-supervised method matches or surpasses citation- and IPC-supervised baselines in prior-art retrieval and classification, while avoiding reliance on brittle or incomplete annotations. Our analysis further shows that different sections specialize for different tasks: claims and summaries benefit retri
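As a rough illustration of the section-based augmentation idea described in the abstract, the sketch below computes an InfoNCE-style contrastive loss in which two sections of the same patent form a positive pair and the other patents in the batch act as negatives. The abstract does not spell out the loss, so the function name `info_nce`, the temperature of 0.05, and the random stand-in embeddings are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.05):
    """InfoNCE loss with in-batch negatives (hypothetical helper).

    view_a[i] and view_b[i] are embeddings of two different sections
    (e.g. abstract and claims) of the same patent i.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                     # shape (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the positive pairs (same patent, different section).
    return float(-np.mean(np.diag(log_prob)))

# Stand-in data: each "patent" has a latent vector; each section embedding
# is a noisy copy of it. A real setup would use an encoder over section text.
rng = np.random.default_rng(0)
doc = rng.normal(size=(8, 16))
abstract_emb = doc + 0.1 * rng.normal(size=doc.shape)
claims_emb = doc + 0.1 * rng.normal(size=doc.shape)

aligned = info_nce(abstract_emb, claims_emb)           # correct section pairs
mismatched = info_nce(abstract_emb, claims_emb[::-1])  # sections from other patents
```

With correctly paired sections the loss is lower than when each abstract is paired with the claims of a different patent, which is the signal the encoder would be trained on.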
Related papers
- Hierarchical Multi-positive Contrastive Learning For Patent Image Retrieval (2025) 0.00
- Revisiting Contrastive Methods For Unsupervised Learning Of Visual Representations (2021) 3.91
- PaECTER: Patent-level Representation Learning Using Citation-informed Transformers (2024) 18.25
- Learning Efficient Representations For Image-based Patent Retrieval (2023) 2.26
- Large Language Model Informed Patent Image Retrieval (2024) 0.00
- Robust Cross-modal Representation Learning With Progressive Self-distillation (2022) 12.33
- DesignCLIP: Multimodal Learning With CLIP For Design Patent Understanding (2025) 0.00
- A Convolutional Neural Network-based Patent Image Retrieval Method For Design Ideation (2020) 3.58