Between Flexibility And Consistency: Joint Generation Of Captions And Subtitles
2021 · Alina Karakanta, Marco Gaido, Matteo Negri, et al.
Abstract
Speech translation (ST) has lately received growing interest for the generation of subtitles without the need for an intermediate source language transcription and timing (i.e. captions). However, the joint generation of source captions and target subtitles not only brings potential output quality advantages when the two decoding processes inform each other, but is also often required in multilingual scenarios. In this work, we focus on ST models which generate captions and subtitles that are consistent in terms of structure and lexical content. We further introduce new metrics for evaluating subtitling consistency. Our findings show that joint decoding leads to increased performance and consistency between the generated captions and subtitles while still allowing for sufficient flexibility to produce subtitles conforming to language-specific needs and norms.
Related papers
- Joint Generation Of Captions And Subtitles With Dual Decoding (2022) · 6.34
- Direct Speech Translation For Automatic Subtitling (2022) · 6.77
- Learning To Jointly Transcribe And Subtitle For End-to-end Spontaneous Speech Recognition (2022) · 5.84
- Dodging The Data Bottleneck: Automatic Subtitling With Automatically Segmented ST Corpora (2022) · 2.26
- Leveraging Broadcast Media Subtitle Transcripts For Automatic Speech Recognition And Subtitling (2025) · 2.26
- Transcribing And Translating, Fast And Slow: Joint Speech Translation And Recognition (2024) · 0.00
- Synchronous Speech Recognition And Speech-to-text Translation With Interactive Decoding (2019) · 10.48
- Visualization: The Missing Factor In Simultaneous Speech Translation (2021) · 0.00