Multimodal Image-text Matching Improves Retrieval-based Chest X-ray Report Generation
2023 · Jaehwan Jeong, Katherine Tian, Andrew Li, et al.
Abstract
Automated generation of clinically accurate radiology reports can improve patient care. Previous report generation methods that rely on image captioning models often generate incoherent and incorrect text due to their lack of relevant domain knowledge, while retrieval-based attempts frequently retrieve reports that are irrelevant to the input image. In this work, we propose Contrastive X-Ray REport Match (X-REM), a novel retrieval-based radiology report generation module that uses an image-text matching score to measure the similarity of a chest X-ray image and radiology report for report retrieval. We observe that computing the image-text matching score with a language-image model can effectively capture the fine-grained interaction between image and text that is often lost when using cosine similarity. X-REM outperforms multiple prior radiology report generation modules in terms of both natural language and clinical metrics. Human evaluation of the generated reports suggests that X-R
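The abstract contrasts two ways to rank candidate reports. One uses cosine similarity between single global embeddings. The other uses an image-text matching score that can see token-level interactions. The sketch below illustrates that difference on toy 2-D vectors. It is not X-REM's implementation, and several parts are assumptions:

- The learned ITM head of the language-image model is replaced by a hypothetical max-similarity score (`fine_grained_score`).
- Global embeddings are approximated by mean pooling.
- The shortlist-then-rerank flow, every function name, and the data are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(tokens: List[Vector]) -> List[float]:
    """Collapse token/patch embeddings into one global vector (simplifying assumption)."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def fine_grained_score(patches: List[Vector], words: List[Vector]) -> float:
    """Hypothetical stand-in for a learned ITM head: match each report token
    to its most similar image patch, then average those best matches."""
    return sum(max(cosine(w, p) for p in patches) for w in words) / len(words)


def retrieve(patches: List[Vector],
             corpus: List[Tuple[str, List[Vector]]],
             k: int = 2) -> List[str]:
    """Stage 1: shortlist k reports by pooled cosine similarity.
    Stage 2: rerank the shortlist by the token-level score."""
    image_vec = mean_pool(patches)
    shortlist = sorted(corpus,
                       key=lambda r: cosine(image_vec, mean_pool(r[1])),
                       reverse=True)[:k]
    reranked = sorted(shortlist,
                      key=lambda r: fine_grained_score(patches, r[1]),
                      reverse=True)
    return [name for name, _ in reranked]


# Toy data (hypothetical): two image patches and three candidate reports,
# each report given as a list of token embeddings in the same 2-D space.
image_patches = [[1.0, 0.0], [0.0, 1.0]]
corpus = [
    ("blurred", [[0.5, 0.5]]),
    ("specific", [[1.0, 0.0], [0.0, 1.0]]),
    ("unrelated", [[-1.0, 0.0]]),
]

print(retrieve(image_patches, corpus))  # ['specific', 'blurred']
```

Both "blurred" and "specific" average to the same pooled vector, so cosine similarity ties them. The token-level score separates them, ranking "specific" first. Restricting the pairwise score to a `k`-item shortlist keeps the more expensive comparison cheap. That is a design choice assumed here for illustration.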
Related papers
- X-TRA: Improving Chest X-ray Tasks With Cross-modal Retrieval Augmentation (2023) · 8.09
- Learning Visual-semantic Embeddings For Reporting Abnormal Findings On Chest X-rays (2020) · 9.76
- DART: Disease-aware Image-text Alignment And Self-correcting Re-alignment For Trustworthy Radiology Report Generation (2025) · 4.52
- Grounded Multimodal Retrieval-augmented Drafting Of Radiology Impressions Using Case-based Similarity Search (2026) · 0.00
- Prototype-enhanced Confidence Modeling For Cross-modal Medical Image-report Retrieval (2025) · 0.00
- Radir: A Scalable Framework For Multi-grained Medical Image Retrieval Via Radiology Report Mining (2025) · 0.00
- Unsupervised Multimodal Representation Learning Across Medical Images And Reports (2018) · 0.00
- Benchmarking Robustness Of Contrastive Learning Models For Medical Image-report Retrieval (2025) · 0.00