A Full Text-dependent End To End Mispronunciation Detection And Diagnosis With Easy Data Augmentation Techniques
2021 · Kaiqi Fu, Jones Lin, Dengfeng Ke, et al.
Abstract
Recently, end-to-end mispronunciation detection and diagnosis (MD&D) systems have become a popular alternative to conventional hybrid DNN-HMM systems, greatly simplifying the model-building process by replacing complicated modules with a single deep network architecture. In this paper, to exploit the prior text in an end-to-end structure, we present a novel text-dependent model that, unlike SED-MDD, achieves a fully end-to-end system by aligning the audio with the phoneme sequence of the prior text inside the model through an attention mechanism. However, using the prior text as input introduces an imbalance between positive and negative samples in the phoneme sequence. To alleviate this problem, we propose three simple data augmentation methods, which effectively improve the model's ability to capture mispronounced phonemes. We conduct experiments on L2-ARCTIC, and our best performance improved from 49.29% to 56.08% in F-measure compared to …
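The abstract does not spell out the three augmentation methods, so the following is only a generic sketch under stated assumptions. It assumes the prior-text phones and the spoken phones are already aligned one-to-one (the model itself learns this alignment with attention), and it uses random phone substitution in the prior-text input as a stand-in augmentation, not necessarily one of the authors' methods. It illustrates why the data are imbalanced (most positions match), how perturbing the prior-text input raises the share of mismatched positions, and a simplified detection F-measure built from the true-rejection (TR), false-rejection (FR) and false-acceptance (FA) counts common in MD&D evaluation. `PHONES`, `mismatch_ratio`, `perturb_prior_text` and `mdd_detection_prf` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical toy phone inventory (ARPAbet-style symbols), for illustration only.
PHONES = ["AA", "AE", "AH", "B", "D", "EH", "IH", "K", "L", "N", "S", "T"]


def mismatch_ratio(prior: Sequence[str], spoken: Sequence[str]) -> float:
    """Fraction of positions where the prior-text phone differs from the spoken phone.

    Assumes both sequences are already aligned position by position.
    """
    if len(prior) != len(spoken):
        raise ValueError("sequences must be aligned to the same length")
    if not prior:
        return 0.0
    return sum(p != s for p, s in zip(prior, spoken)) / len(prior)


def perturb_prior_text(prior: Sequence[str], rate: float, rng: random.Random,
                       inventory: Sequence[str] = PHONES) -> List[str]:
    """Hypothetical augmentation: randomly substitute phones in the prior-text input.

    The audio and its target labels stay unchanged, so each substituted position
    becomes an extra example where the prior text disagrees with the speech.
    """
    out: List[str] = []
    for p in prior:
        if rng.random() < rate:
            # Choose a different phone so every substitution creates a mismatch.
            out.append(rng.choice([q for q in inventory if q != p]))
        else:
            out.append(p)
    return out


def mdd_detection_prf(prior: Sequence[str], spoken: Sequence[str],
                      predicted: Sequence[str]) -> Tuple[float, float, float]:
    """Simplified, position-aligned detection precision, recall and F-measure.

    A position is truly mispronounced if spoken != prior, and flagged if
    predicted != prior. TR: mispronounced and flagged; FR: correct but
    flagged; FA: mispronounced but not flagged.
    """
    tr = fr = fa = 0
    for p, s, h in zip(prior, spoken, predicted):
        wrong, flagged = s != p, h != p
        tr += int(wrong and flagged)
        fr += int((not wrong) and flagged)
        fa += int(wrong and (not flagged))
    precision = tr / (tr + fr) if tr + fr else 0.0
    recall = tr / (tr + fa) if tr + fa else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Usage on a toy utterance.
prior = ["K", "AE", "T", "S"]   # canonical phones of the prior text
spoken = ["K", "AH", "T", "S"]  # what the speaker actually said
print(mismatch_ratio(prior, spoken))  # 0.25: only 1 of 4 positions is mispronounced
rng = random.Random(0)
augmented = perturb_prior_text(prior, rate=0.5, rng=rng)
print(augmented, mismatch_ratio(augmented, spoken))
print(mdd_detection_prf(prior, spoken, ["K", "AH", "D", "S"]))  # precision 0.5, recall 1.0, F about 0.667
```

Perturbing only the text input, rather than the audio, keeps such an augmentation cheap, since no new speech has to be synthesized or recorded.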
Related papers
- Improving End-to-end Modeling For Mispronunciation Detection With Effective Augmentation Mechanisms (2021)
- Multi-view Multi-task Representation Learning For Mispronunciation Detection (2023)
- Improving Mispronunciation Detection With Wav2vec2-based Momentum Pseudo-labeling For Accentedness And Intelligibility Assessment (2022)
- SpeechBlender: Speech Augmentation Framework For Mispronunciation Data Generation (2022)
- CoCA-MDD: A Coupled Cross-attention Based Framework For Streaming Mispronunciation Detection And Diagnosis (2021)
- Adaptive Frequency Cepstral Coefficients For Word Mispronunciation Detection (2016)
- Multi-modal Data Augmentation For End-to-end ASR (2018)
- HMM-based Data Augmentation For E2E Systems For Building Conversational Speech Synthesis Systems (2022)