Abstract

Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular voice activity detection (VAD) tools such as WebRTC VAD generally rely on pause-based segmentation. Unfortunately, pauses in speech do not necessarily coincide with sentence boundaries, and consecutive sentences may be separated by a pause too short for VAD to detect. In this study, we propose a speech segmentation method that uses a binary classification model trained on a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD with this segmentation method. Experimental results revealed that the proposed method is better suited to both cascade and end-to-end ST systems than conventional segmentation methods, and that the hybrid approach further improved translation performance.
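The abstract does not say what units the classifier scores or how VAD and the classifier are combined. The sketch below therefore assumes one plausible arrangement, offered only to illustrate the idea and not as the authors' method. A binary classifier is assumed to emit, for every frame, a probability that a segment boundary follows that frame. Classifier-only segmentation cuts wherever that probability reaches a threshold. In the hybrid variant, VAD segments are kept as they are unless they exceed a maximum length, in which case the classifier's boundaries split them further. The names `split_by_boundary_probs`, `hybrid_segment`, `threshold`, and `max_len` are hypothetical.

```python
from typing import List, Tuple

# A segment is (start_frame, end_frame) with an exclusive end.
Segment = Tuple[int, int]


def split_by_boundary_probs(start: int, end: int, probs: List[float],
                            threshold: float = 0.5) -> List[Segment]:
    """Split [start, end) after every frame whose (hypothetical) classifier
    boundary probability is >= threshold. Never produces empty segments."""
    segments: List[Segment] = []
    seg_start = start
    for t in range(start, end):
        # Skip a cut on the last frame: the segment already ends there.
        if probs[t] >= threshold and t + 1 < end:
            segments.append((seg_start, t + 1))
            seg_start = t + 1
    segments.append((seg_start, end))
    return segments


def hybrid_segment(vad_segments: List[Segment], probs: List[float],
                   threshold: float = 0.5, max_len: int = 20) -> List[Segment]:
    """Hypothetical hybrid: keep short VAD segments as-is and re-split any
    VAD segment longer than max_len frames at classifier boundaries."""
    result: List[Segment] = []
    for start, end in vad_segments:
        if end - start <= max_len:
            result.append((start, end))
        else:
            result.extend(split_by_boundary_probs(start, end, probs, threshold))
    return result


if __name__ == "__main__":
    # Toy per-frame boundary probabilities (made up for illustration).
    probs = [0.0] * 30
    probs[9] = 0.9
    probs[24] = 0.8
    # VAD found one short pause-delimited segment and one long one.
    vad_segments = [(0, 5), (5, 30)]
    print(split_by_boundary_probs(0, 30, probs))
    print(hybrid_segment(vad_segments, probs))
```

With these toy inputs, classifier-only segmentation yields `[(0, 10), (10, 25), (25, 30)]`. The hybrid call keeps the short VAD segment `(0, 5)` and splits the 25-frame one, giving `[(0, 5), (5, 10), (10, 25), (25, 30)]`. Gating the classifier on segment length is just one design choice. It lets VAD handle clear pauses and uses the classifier only where a long VAD segment probably hides short inter-sentence pauses.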

Authors

(none)

Tags

  • Speech Translation
  • Speech Enhancement
  • Speech Recognition

Stats

  • citations: 5
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 5.84
  • arxiv key: fukuda2022speech

Related papers