Lifelong Learning For Text Retrieval And Recognition In Historical Handwritten Document Collections
2019 · Lambert Schomaker
Abstract
This chapter provides an overview of the problems that must be addressed when constructing a lifelong-learning retrieval, recognition and indexing engine for large historical document collections in multiple scripts and languages: the Monk system. The application varies considerably over time, since continuous labeling by end users changes the concept of what constitutes a 'ground truth'. Although current advances in deep learning offer huge potential in this application domain, the scale of the problem, i.e., more than 520 hugely diverse books, documents and manuscripts, precludes the meticulous and painstaking human effort currently required to design and develop successful deep-learning systems. The ball-park principle is introduced, which describes the evolution from the sparsely-labeled stage, which can only be addressed by traditional methods or nearest-neighbor methods on embedded vectors of pre-trained neural networks, up to the other end of the spectrum w
Related papers
- L^2R: Lifelong Learning For First-stage Retrieval With Backward-compatible Representations (2023)
- ICDAR 2019 Competition On Image Retrieval For Historical Handwritten Documents (2019)
- A Comprehensive Study Of ImageNet Pre-training For Historical Document Image Analysis (2019)
- Advancing Continual Lifelong Learning In Neural Information Retrieval: Definition, Dataset, Framework, And Empirical Evaluation (2023)
- Pre-training Tasks For Embedding-based Large-scale Retrieval (2020)
- Tevatron 2.0: Unified Document Retrieval Toolkit Across Scale, Language, And Modality (2025)
- Deep Learning Approaches For Image Retrieval And Pattern Spotting In Ancient Documents (2019)
- Fetch-a-set: A Large-scale OCR-free Benchmark For Historical Document Retrieval (2024)