TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning For Cross-lingual Speech Recognition
2023 · Hongfei Xue, Qijie Shao, Peikun Chen, et al.
Abstract
UniSpeech has achieved superior performance in cross-lingual automatic speech recognition (ASR) by explicitly aligning latent representations to phoneme units using multi-task self-supervised learning. While the learned representations transfer well from high-resource to low-resource languages, predicting words directly from these phonetic representations in downstream ASR is challenging. In this paper, we propose TranUSR, a two-stage model comprising a pre-trained UniData2vec and a phoneme-to-word Transcoder. Unlike UniSpeech, UniData2vec replaces the quantized discrete representations with continuous, contextual representations from a teacher model for phonetically aware pre-training. The Transcoder then learns to translate phonemes to words with the aid of extra text, enabling direct word generation. Experiments on Common Voice show that UniData2vec reduces PER by 5.3% compared to UniSpeech, while Transcoder yields a 14.4% WER reduction compared to grapheme fine-tuning.
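The abstract reports its results as phoneme error rate (PER) and word error rate (WER). As a minimal sketch of how these metrics are conventionally defined (edit distance between hypothesis and reference, divided by the reference length), the Python below may help. The function names and the toy sequences are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's scoring setup or its numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by reference length (WER or PER)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy inputs (hypothetical): word tokens give WER, phoneme tokens give PER.
ref_words = "the cat sat".split()
hyp_words = "the cat sit".split()
wer = error_rate(ref_words, hyp_words)

ref_phones = ["DH", "AH", "K", "AE", "T"]
hyp_phones = ["DH", "AH", "K", "AE"]
per = error_rate(ref_phones, hyp_phones)
```

The same function serves both metrics because only the token unit changes: words for WER, phonemes for PER.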
Related papers
- UniSpeech: Unified Speech Representation Learning With Labeled And Unlabeled Data (2021)
- Transformer-transducers For Code-switched Speech Recognition (2020)
- Cross-lingual Knowledge Transfer And Iterative Pseudo-labeling For Low-resource Speech Recognition With Transducers (2023)
- SpeechUT: Bridging Speech And Text With Hidden-unit For Encoder-decoder Based Speech-text Pre-training (2022)
- UniSLU: Unified Spoken Language Understanding From Heterogeneous Cross-task Datasets (2025)
- LAMASSU: Streaming Language-agnostic Multilingual Speech Recognition And Translation Using Neural Transducers (2022)
- Dual-decoder Transformer For Joint Automatic Speech Recognition And Multilingual Speech Translation (2020)
- TokenVerse: Towards Unifying Speech And NLP Tasks Via Transducer-based ASR (2024)