Alternate Intermediate Conditioning With Syllable-level And Character-level Targets For Japanese ASR
2022 · Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida
Abstract
End-to-end automatic speech recognition (ASR) maps input speech directly to characters. However, this mapping can be problematic in two cases: when several different pronunciations must be mapped to one character, or when one pronunciation is shared by many different characters. Japanese ASR suffers the most from these many-to-one and one-to-many mapping problems because of Japanese kanji characters. To alleviate them, we introduce explicit interaction between characters and syllables using Self-conditioned connectionist temporal classification (CTC), in which the upper layers are "self-conditioned" on the intermediate predictions from the lower layers. The proposed method uses character-level and syllable-level intermediate predictions as conditioning features to handle the mutual dependency between characters and syllables. Experimental results on the Corpus of Spontaneous Japanese show that the proposed method outperformed the conventional multi-task and Self-conditioned CTC methods.
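The conditioning scheme described above can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation. A toy frame-wise layer stands in for real encoder blocks. The layer indices in `schedule`, the vocabulary sizes, and all helper names (`toy_layer`, `alternate_self_conditioned_encoder`, `combined_loss`) are hypothetical. The sketch shows the Self-conditioned CTC mechanism. At selected intermediate layers, the hidden states are projected to a posterior over either syllables or characters. That posterior is then projected back and added to the hidden states fed to the next layer. The loss weighting follows the usual intermediate-CTC convention and is an assumption here.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over one frame of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def toy_layer(x, W):
    # Hypothetical stand-in for a Transformer/Conformer block (frame-wise only).
    return [xi + math.tanh(hi) for xi, hi in zip(x, matvec(W, x))]


def alternate_self_conditioned_encoder(frames, layers, heads, schedule):
    """Run the encoder. At each layer listed in `schedule`, predict the
    scheduled unit (syllable or character) and add the projected
    posteriors back to the hidden states."""
    intermediate = []
    for l, W in enumerate(layers):
        frames = [toy_layer(x, W) for x in frames]
        name = schedule.get(l)
        if name is None:
            continue
        W_out, W_back = heads[name]  # D->V and V->D projections
        post = [softmax(matvec(W_out, x)) for x in frames]
        intermediate.append((l, name, post))  # each would get its own CTC loss
        # Self-conditioning: the next layer sees hidden states plus posteriors.
        frames = [[xi + ci for xi, ci in zip(x, matvec(W_back, p))]
                  for x, p in zip(frames, post)]
    return frames, intermediate


def combined_loss(final_loss, intermediate_losses, lam=0.5):
    # Assumed intermediate-CTC style weighting of final and intermediate losses.
    mean_inter = sum(intermediate_losses) / len(intermediate_losses)
    return (1 - lam) * final_loss + lam * mean_inter


rng = random.Random(0)
D, T, N_LAYERS = 8, 5, 6
VOCAB = {"syllable": 12, "character": 30}  # hypothetical vocabulary sizes
layers = [rand_matrix(D, D, rng) for _ in range(N_LAYERS)]
heads = {name: (rand_matrix(V, D, rng), rand_matrix(D, V, rng))
         for name, V in VOCAB.items()}
# Hypothetical schedule: alternate syllable and character targets.
schedule = {1: "syllable", 2: "character", 3: "syllable", 4: "character"}
frames = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
out, inter = alternate_self_conditioned_encoder(frames, layers, heads, schedule)
print([(l, name, len(post[0])) for l, name, post in inter])
```

Running it prints `[(1, 'syllable', 12), (2, 'character', 30), (3, 'syllable', 12), (4, 'character', 30)]`. In this sketch, the final layer's output would feed the character-level CTC loss. Starting the schedule with syllables is only one plausible choice. Syllables are closer to the acoustics, so conditioning on them early could help the upper layers resolve the one-to-many mapping from pronunciations to kanji.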
Related papers
- Relaxing The Conditional Independence Assumption Of CTC-based ASR By Conditioning On Intermediate Predictions (2021) · 13.34
- A Comparative Study On Neural Architectures And Training Methods For Japanese Speech Recognition (2021) · 7.50
- Hierarchical Conditional End-to-end ASR With CTC And Multi-granular Subword Units (2021) · 9.23
- Advances In Joint CTC-attention Based End-to-end Speech Recognition With A Deep CNN Encoder And RNN-LM (2017) · 16.49
- Improving Transducer-based Spoken Language Understanding With Self-conditioned CTC And Knowledge Transfer (2025) · 0.00
- Improved Mask-CTC For Non-autoregressive End-to-end ASR (2020) · 11.76
- Reducing Spelling Inconsistencies In Code-switching ASR Using Contextualized CTC Loss (2020) · 4.52
- Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition (2021) · 5.84