Evaluating Subtitle Segmentation For End-to-end Generation Systems
2022 · Alina Karakanta, François Buet, Mauro Cettolo, et al.
Abstract
Subtitles appear on screen as short pieces of text, segmented based on formal constraints (length) and syntactic/semantic criteria. Subtitle segmentation can be evaluated with sequence segmentation metrics against a human reference. However, standard segmentation metrics cannot be applied when systems generate output that differs from the reference, e.g. with end-to-end subtitling systems. In this paper, we study ways to conduct reference-based evaluations of segmentation accuracy irrespective of the textual content. We first conduct a systematic analysis of existing metrics for evaluating subtitle segmentation. We then introduce Sigma, a new Subtitle Segmentation Score derived from an approximate upper bound of BLEU on segmentation boundaries, which allows us to disentangle the effect of good segmentation from text quality. To compare Sigma with existing metrics, we further propose a boundary projection method from imperfect hypotheses to the true reference. Results show that al
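To illustrate why standard reference-based segmentation metrics break down when the hypothesis text differs from the reference, here is a minimal Python sketch of a boundary precision/recall/F1 computation. It is not the paper's Sigma score or its projection method. The break symbols `<eol>` and `<eob>`, the whitespace tokenization, and the function names are assumptions made for this example only.

```python
from typing import List, Set, Tuple

# Assumed break symbols for this sketch: line break and subtitle-block break.
BREAKS = {"<eol>", "<eob>"}


def strip_breaks(tokens: List[str]) -> List[str]:
    """Return only the word tokens, dropping break symbols."""
    return [t for t in tokens if t not in BREAKS]


def boundary_positions(tokens: List[str]) -> Set[Tuple[int, str]]:
    """Collect (number_of_words_before_break, break_type) pairs."""
    positions = set()
    n_words = 0
    for tok in tokens:
        if tok in BREAKS:
            positions.add((n_words, tok))
        else:
            n_words += 1
    return positions


def boundary_f1(hyp: str, ref: str) -> Tuple[float, float, float]:
    """Boundary precision, recall and F1; requires identical word sequences."""
    h, r = hyp.split(), ref.split()
    if strip_breaks(h) != strip_breaks(r):
        # This is the limitation the abstract points at: with different
        # text, break positions cannot be compared index by index.
        raise ValueError("hypothesis and reference text differ")
    hb, rb = boundary_positions(h), boundary_positions(r)
    tp = len(hb & rb)
    precision = tp / len(hb) if hb else 0.0
    recall = tp / len(rb) if rb else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


ref = "the cat sat <eol> on the mat <eob>"
hyp = "the cat <eol> sat on the mat <eob>"
print(boundary_f1(hyp, ref))  # (0.5, 0.5, 0.5)
```

When the hypothesis contains even one different word, this sketch raises an error instead of scoring, which is the situation with end-to-end subtitling systems that motivates a text-independent score.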
Related papers
- SubER: A Metric For Automatic Evaluation Of Subtitle Quality (2022)
- Subtitles To Segmentation: Improving Low-resource Speech-to-text Translation Pipelines (2020)
- V-SAT: Video Subtitle Annotation Tool (2025)
- Evaluating The IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, And Segmentation (2024)
- SpeechBERTScore: Reference-aware Automatic Evaluation Of Speech Generation Leveraging NLP Evaluation Metrics (2024)
- Dodging The Data Bottleneck: Automatic Subtitling With Automatically Segmented ST Corpora (2022)
- Leveraging Broadcast Media Subtitle Transcripts For Automatic Speech Recognition And Subtitling (2025)
- Better Late Than Never: Meta-evaluation Of Latency Metrics For Simultaneous Speech-to-text Translation (2025)