Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition
2023 · Chen Xu, Xiaoqian Liu, Erfeng He, et al.
Abstract
In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), a framework that leverages dual CTC to bridge both the modality gap and the language gap in the speech translation (ST) task. By using the transcript and the translation as concurrent CTC objectives, our model bridges the gap between audio and text as well as the gap between the source and target languages. Building on recent advances in applying CTC, we develop an enhanced variant, BiL-CTC+, that establishes new state-of-the-art performance on the MuST-C ST benchmarks in resource-constrained scenarios. Intriguingly, our method also yields significant improvements in speech recognition performance, revealing the effect of cross-lingual learning on transcription and demonstrating its broad applicability. The source code is available at https://github.com/xuchennlp/S2T.
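The abstract describes training with two CTC objectives at once, one on the source transcript and one on the target translation. The sketch below illustrates that idea in plain Python under assumptions the abstract does not state: both CTC heads are bias-free linear projections over the *same* encoder states, the two losses are combined as a weighted sum with hypothetical weights `lam_src` and `lam_tgt`, and the blank id is 0. It omits the ST decoder loss and everything specific to BiL-CTC+, and the function names are illustrative, not taken from the authors' S2T repository.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_softmax(row: Sequence[float]) -> List[float]:
    m = max(row)
    z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - z for v in row]


def ctc_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood via the standard forward (alpha) recursion.

    log_probs: T x V per-frame log-probabilities; target: label ids without blanks.
    Returns inf when the target cannot be aligned to T frames.
    """
    ext = [blank]  # extended sequence: blank, y1, blank, y2, ..., yU, blank
    for y in target:
        ext += [y, blank]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            acc = alpha[s]  # stay on the same symbol
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])  # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])  # skip the blank between distinct labels
            new[s] = acc + log_probs[t][ext[s]]
        alpha = new
    # Valid alignments end on the last label or the trailing blank.
    total = alpha[0] if S == 1 else log_add(alpha[S - 1], alpha[S - 2])
    return -total


def project_log_softmax(states: Sequence[Sequence[float]], weight: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical CTC head: bias-free linear layer (d x V) followed by log-softmax."""
    vocab = len(weight[0])
    return [
        log_softmax([sum(h[i] * weight[i][v] for i in range(len(h))) for v in range(vocab)])
        for h in states
    ]


def bilingual_ctc_loss(states, w_src, w_tgt, transcript, translation,
                       lam_src: float = 0.5, lam_tgt: float = 0.5, blank: int = 0) -> float:
    """Assumption: both heads read the same encoder states; losses are weighted and summed."""
    src_lp = project_log_softmax(states, w_src)  # source-language (transcript) vocabulary
    tgt_lp = project_log_softmax(states, w_tgt)  # target-language (translation) vocabulary
    return lam_src * ctc_nll(src_lp, transcript, blank) + lam_tgt * ctc_nll(tgt_lp, translation, blank)


# Toy usage: zero encoder states give uniform per-frame distributions in both heads.
states = [[0.0, 0.0], [0.0, 0.0]]                # T=2 frames, d=2
w_src = [[0.3, -0.1], [0.2, 0.4]]                # d x V_src (V_src=2, blank=0)
w_tgt = [[0.1, 0.5, -0.2], [0.0, 0.3, 0.7]]      # d x V_tgt (V_tgt=3, blank=0)
loss = bilingual_ctc_loss(states, w_src, w_tgt, transcript=[1], translation=[2])
print(round(loss, 6))  # 0.693147, i.e. log 2
```

In the toy case the source head assigns probability 3/4 to transcript `[1]` and the target head 1/3 to translation `[2]`, so the equally weighted sum is 0.5 log(4/3) + 0.5 log 3 = log 2. In practice one would use a framework CTC implementation such as PyTorch's `torch.nn.CTCLoss`; the pure-Python recursion here only keeps the example self-contained.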
Related papers
- CTC-GMM: CTC Guided Modality Matching for Fast and Accurate Streaming Speech Translation (2024)
- BERT Meets CTC: New Formulation of End-to-end Speech Recognition with Pre-trained Masked Language Model (2022)
- Disentangling Speakers in Multi-talker Speech Recognition with Speaker-aware CTC (2024)
- Synchronous Speech Recognition and Speech-to-text Translation with Interactive Decoding (2019)
- S2ST-Omni: Hierarchical Language-aware SpeechLLM Adaptation for Multilingual Speech-to-speech Translation (2025)
- Joint Pre-training with Speech and Bilingual Text for Direct Speech to Speech Translation (2022)
- Joint Training and Decoding for Multilingual End-to-end Simultaneous Speech Translation (2025)
- Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model (2017)