AlloST: Low-resource Speech Translation Without Source Transcription
2021 · Yao-Fei Cheng, Hung-Shin Lee, Hsin-Min Wang
Abstract
The end-to-end architecture has made promising progress in speech translation (ST). However, the ST task is still challenging under low-resource conditions. Most ST models have shown unsatisfactory results, especially in the absence of word information from the source speech utterance. In this study, we survey methods to improve ST performance without using source transcription, and propose a learning framework that utilizes a language-independent universal phone recognizer. The framework is based on an attention-based sequence-to-sequence model, where the encoder generates the phonetic embeddings and phone-aware acoustic representations, and the decoder controls the fusion of the two embedding streams to produce the target token sequence. In addition to investigating different fusion strategies, we explore the specific usage of byte pair encoding (BPE), which compresses a phone sequence into a syllable-like segmented sequence. Due to the conversion of symbols, a segmented sequence rep
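The abstract's idea of using BPE to compress a phone sequence into syllable-like segments can be illustrated with a minimal sketch. This is not the authors' implementation: the toy phone symbols, the `+` joiner, and the function names `merge_pair` and `learn_bpe` are assumptions made for illustration, and a universal phone recognizer would emit a far larger inventory than the one shown.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one joined symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "+" + seq[i + 1])  # "+" joiner is an assumption
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(sequences, num_merges):
    """Greedily learn up to `num_merges` merges over lists of phone symbols."""
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across all sequences.
        counts = Counter()
        for s in seqs:
            counts.update(zip(s, s[1:]))
        if not counts:
            break
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        seqs = [merge_pair(s, best) for s in seqs]
    return merges, seqs


# Hypothetical phone sequences standing in for recognizer output.
utterances = [
    ["b", "a", "n", "a", "n", "a"],
    ["b", "a", "t"],
    ["n", "a", "p"],
]
merges, segmented = learn_bpe(utterances, num_merges=2)
print(merges)
print(segmented)
```

With two merges, the six-phone first utterance shrinks to three merged units, which is the sense in which the segmented sequence is shorter and more syllable-like than the raw phone sequence.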
Related papers
- Multilingual End-to-end Speech Translation (2019) · 0.00
- Towards Unsupervised Speech-to-text Translation (2018) · 0.00
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018) · 13.05
- Synchronous Speech Recognition And Speech-to-text Translation With Interactive Decoding (2019) · 10.48
- Improving Cross-lingual Transfer Learning For End-to-end Speech Recognition With Speech Translation (2020) · 9.92
- Exploring Phoneme-level Speech Representations For End-to-end Speech Translation (2019) · 7.81
- Long-form End-to-end Speech Translation Via Latent Alignment Segmentation (2023) · 0.00
- Multilingual Byte2speech Models For Scalable Low-resource Speech Synthesis (2021) · 0.00