PEARL: Prototype-enhanced Alignment For Label-efficient Representation Learning With Deployment-driven Insights From Digital Governance Communication Systems
2026 · Ruiyu Zhang, Lin Nie, Wai-Fung Lam, et al.
Abstract
In many deployed systems, new text inputs are handled by retrieving similar past cases, for example when routing and responding to citizen messages in digital governance platforms. When these systems fail, the problem is often not the language model itself, but that the nearest neighbors in the embedding space correspond to the wrong cases. Modern machine learning systems increasingly rely on fixed, high-dimensional embeddings produced by large pretrained models and sentence encoders. In real-world deployments, labels are scarce, domains shift over time, and retraining the base encoder is expensive or infeasible. As a result, downstream performance depends heavily on embedding geometry. Yet raw embeddings are often poorly aligned with the local neighborhood structure required by nearest-neighbor retrieval, similarity search, and lightweight classifiers that operate directly on embeddings. We propose PEARL (Prototype-Enhanced Aligned Representation Learning), a label-efficient approach
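The abstract's failure mode is that retrieval returns past cases with the wrong label. One way to make this concrete is a minimal sketch of cosine nearest-neighbor retrieval over fixed embeddings, together with a leave-one-out neighbor-agreement score: the fraction of stored cases whose k nearest neighbors vote for that case's own label. This is only an illustrative diagnostic under stated assumptions (cosine similarity, majority vote, pure-Python vectors). It is not PEARL's method. The function names (`knn_label`, `neighbor_agreement`), the toy 2-D vectors, and the routing labels "roads" and "water" are all hypothetical.

```python
import math
from collections import Counter


def normalize(v):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Dot product; equals cosine similarity only if a and b are already unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def knn_label(query, bank, labels, k=3, exclude=None):
    """Predict a label by majority vote over the k stored cases most similar to `query`.

    `exclude` skips one index in `bank`, which is used for leave-one-out evaluation.
    On a tied vote, the label that appears first in the ranked neighbor list wins.
    """
    q = normalize(query)
    scored = [(cosine(q, normalize(e)), i) for i, e in enumerate(bank) if i != exclude]
    scored.sort(reverse=True)  # most similar first
    top = [labels[i] for _, i in scored[:k]]
    return Counter(top).most_common(1)[0][0]


def neighbor_agreement(bank, labels, k=3):
    """Leave-one-out kNN accuracy: how often a case's neighbors vote for its own label."""
    hits = sum(
        knn_label(e, bank, labels, k=k, exclude=i) == labels[i]
        for i, e in enumerate(bank)
    )
    return hits / len(bank)


# Hypothetical toy "embeddings" of past citizen messages, with routing labels.
bank = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]]
labels = ["roads", "roads", "roads", "water", "water", "water"]

print(neighbor_agreement(bank, labels, k=3))         # 1.0: the geometry matches the labels
print(knn_label([0.95, 0.15], bank, labels, k=3))    # roads
```

On real encoder outputs, a low agreement score means that neighbors frequently carry the wrong label. That is the misalignment between embedding geometry and task structure the abstract describes, and it is visible without retraining the encoder.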
Related papers
- Align Then Train: Efficient Retrieval Adapter Learning (2026)
- Modest-align: Data-efficient Alignment For Vision-language Models (2025)
- HEAL: Hierarchical Embedding Alignment Loss For Improved Retrieval And Representation Learning (2024)
- Pailitao-vl: Unified Embedding And Reranker For Real-time Multi-modal Industrial Search (2026)
- LEAF: Knowledge Distillation Of Text Embedding Models With Teacher-aligned Representations (2025)
- Lexsembridge: Fine-grained Dense Representation Enhancement Through Token-aware Embedding Augmentation (2025)
- LMAR: Language Model Augmented Retriever For Domain-specific Knowledge Indexing (2025)
- Large Reasoning Embedding Models: Towards Next-generation Dense Retrieval Paradigm (2025)