Language-agnostic Visual Embeddings For Cross-script Handwriting Retrieval
2026 · Fangke Chen, Tianhao Dong, Sirry Chen, et al.
Abstract
Handwritten word retrieval is vital for digital archives but remains challenging due to large handwriting variability and cross-lingual semantic gaps. While large vision-language models offer potential solutions, their prohibitive computational costs hinder practical edge deployment. To address this, we propose a lightweight asymmetric dual-encoder framework that learns unified, style-invariant visual embeddings. By jointly optimizing instance-level alignment and class-level semantic consistency, our approach anchors visual embeddings to language-agnostic semantic prototypes, enforcing invariance across scripts and writing styles. Experiments show that our method outperforms 28 baselines and achieves state-of-the-art accuracy on within-language retrieval benchmarks. We further conduct explicit cross-lingual retrieval, where the query language differs from the target language, to validate the effectiveness of the learned cross-lingual representations. Achieving strong performance with o
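The abstract combines two training signals: instance-level alignment between paired visual and text embeddings, and class-level consistency with language-agnostic semantic prototypes. Retrieval is then done by ranking embeddings. The sketch below is a minimal, dependency-free illustration of that combination under stated assumptions. It is not the authors' implementation. The symmetric InfoNCE form of the alignment loss, the prototype cross-entropy, the temperature of 0.07, the loss weight, and every function name (`instance_alignment_loss`, `prototype_consistency_loss`, `total_loss`, `retrieve`) are hypothetical choices made for illustration. The encoders themselves, including their asymmetric design, are out of scope: the code operates directly on embedding vectors.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    # Scale to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # After normalization, the dot product equals the cosine similarity.
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits: Sequence[float], index: int) -> float:
    # Numerically stable log-softmax evaluated at a single position.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[index] - log_z


def instance_alignment_loss(images, texts, temperature=0.07) -> float:
    """Hypothetical symmetric InfoNCE: image i should match text i in the batch."""
    n = len(images)
    sim = [[cosine(images[i], texts[j]) / temperature for j in range(n)]
           for i in range(n)]
    img_to_txt = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    txt_to_img = -sum(log_softmax_at([sim[i][j] for i in range(n)], j)
                      for j in range(n)) / n
    return (img_to_txt + txt_to_img) / 2


def prototype_consistency_loss(images, labels, prototypes, temperature=0.07) -> float:
    """Hypothetical class-level term: cross-entropy of each image embedding
    against one fixed prototype vector per word class (assumed language-agnostic)."""
    total = 0.0
    for emb, label in zip(images, labels):
        logits = [cosine(emb, p) / temperature for p in prototypes]
        total -= log_softmax_at(logits, label)
    return total / len(images)


def total_loss(images, texts, labels, prototypes, weight=1.0) -> float:
    # Weighted sum of the two terms; the weight is an assumed hyperparameter.
    return (instance_alignment_loss(images, texts)
            + weight * prototype_consistency_loss(images, labels, prototypes))


def retrieve(query, gallery, k=3) -> List[int]:
    """Rank gallery embeddings by cosine similarity to the query (top-k indices)."""
    order = sorted(range(len(gallery)),
                   key=lambda i: cosine(query, gallery[i]), reverse=True)
    return order[:k]
```

For example, `retrieve([0.9, 0.1], [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]], k=2)` ranks the second and third gallery items first. Because queries and targets share one embedding space in this sketch, the same nearest-neighbor ranking would serve both within-language and cross-lingual retrieval. This property is the one the learned representations are meant to provide.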
Related papers
- Webly Supervised Joint Embedding For Cross-modal Image-text Retrieval (2018) · 13.17
- Aligning Multilingual Word Embeddings For Cross-modal Retrieval Task (2019) · 2.26
- Adapting Dual-encoder Vision-language Models For Paraphrased Retrieval (2024) · 0.00
- Image Search Using Multilingual Texts: A Cross-modal Learning Approach Between Image And Text (2019) · 0.00
- Learning Robust Visual-semantic Embeddings (2017) · 15.22
- Efficient Discriminative Joint Encoders For Large Scale Vision-language Reranking (2025) · 0.00
- Online Writer Retrieval With Chinese Handwritten Phrases: A Synergistic Temporal-frequency Representation Learning Approach (2024) · 7.11
- Multimodal Representation Alignment For Cross-modal Information Retrieval (2025) · 0.00