Designclip: Multimodal Learning With CLIP For Design Patent Understanding
2025 Β· Zhu Wang, Homaira Huda Shomee, Sathya N. Ravi, et al.
Abstract
In the field of design patent analysis, traditional tasks such as patent classification and patent image retrieval heavily depend on the image data. However, patent images -- typically consisting of sketches with abstract and structural elements of an invention -- often fall short in conveying comprehensive visual context and semantic information. This inadequacy can lead to ambiguities in evaluation during prior art searches. Recent advancements in vision-language models, such as CLIP, offer promising opportunities for more reliable and accurate AI-driven patent analysis. In this work, we leverage CLIP models to develop a unified framework DesignCLIP for design patent applications with a large-scale dataset of U.S. design patents. To address the unique characteristics of patent data, DesignCLIP incorporates class-aware classification and contrastive learning, utilizing generated detailed captions for patent images and multi-views image learning. We validate the effectiveness of Design
Authors
(none)
Tags
Stats
Related papers
- A Convolutional Neural Network-based Patent Image Retrieval Method For Design Ideation (2020)3.58
- Clip-moe: Towards Building Mixture Of Experts For CLIP With Diversified Multiplet Upcycling (2024)2.26
- LLM2CLIP: Powerful Language Model Unlocks Richer Cross-modality Representation (2024)2.26
- Hierarchical Multi-positive Contrastive Learning For Patent Image Retrieval (2025)0.00
- C-CLIP: Contrastive Image-text Encoders To Close The Descriptive-commentative Gap (2023)0.00
- Pinclip: Large-scale Foundational Multimodal Representation At Pinterest (2026)0.00
- Enhancing Image Retrieval : A Comprehensive Study On Photo Search Using The CLIP Mode (2024)0.00
- Distill CLIP (DCLIP): Enhancing Image-text Retrieval Via Cross-modal Transformer Distillation (2025)0.00