A Comparative Study On Neural Architectures And Training Methods For Japanese Speech Recognition
2021 · Shigeki Karita, Yotaro Kubo, Michiel Adriaan Unico Bacchiani, et al.
Abstract
End-to-end (E2E) modeling is advantageous for automatic speech recognition (ASR), especially for Japanese, since word-based tokenization of Japanese is not trivial and E2E models can model character sequences directly. This paper focuses on the latest E2E modeling techniques and investigates their performance on character-based Japanese ASR through comparative experiments. The results are analyzed and discussed to understand the relative advantages of long short-term memory (LSTM) and Conformer models in combination with connectionist temporal classification, transducer, and attention-based loss functions. Furthermore, the paper investigates the effectiveness of recent training techniques such as data augmentation (SpecAugment), variational noise injection, and exponential moving average. The best configuration found in the paper achieved state-of-the-art character error rates of 4.1%, 3.2%, and 3.5% for Corpus of Spontaneous Japanese (CSJ) eval1, eval2,
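The character error rate (CER) reported in the abstract is conventionally computed as the character-level edit distance between the reference and the hypothesis, divided by the reference length. The sketch below illustrates that conventional definition only; it is not taken from the paper, and the function names `edit_distance` and `character_error_rate` are hypothetical. The paper's own scoring pipeline, including any text normalization, may differ.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1      # drop a reference character
            insertion = curr[j - 1] + 1  # extra hypothesis character
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four in the reference.
    print(character_error_rate("音声認識", "音声認式"))
```

Because the metric works directly on characters, it needs no word segmentation, which is the property the abstract highlights for Japanese.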
Related papers
- A Comparison Of End-to-end Models For Long-form Speech Recognition (2019) · 12.93
- Retraining-free Customized ASR For Enharmonic Words Based On A Named-entity-aware Model And Phoneme Similarity Estimation (2023) · 4.52
- Advances In Joint CTC-attention Based End-to-end Speech Recognition With A Deep CNN Encoder And RNN-LM (2017) · 16.49
- On The Comparison Of Popular End-to-end Models For Large Scale Speech Recognition (2020) · 0.00
- 4D ASR: Joint Modeling Of CTC, Attention, Transducer, And Mask-predict Decoders (2022) · 7.50
- Recent Advances In End-to-end Automatic Speech Recognition (2021) · 18.62
- Alternate Intermediate Conditioning With Syllable-level And Character-level Targets For Japanese ASR (2022) · 0.00
- Benchmarking Japanese Speech Recognition On ASR-LLM Setups With Multi-pass Augmented Generative Error Correction (2024) · 0.00