Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition
2022 · Shujie Hu, Shansong Liu, Xurong Xie, et al.
Abstract
Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech. Their practical application to disordered speech recognition is often limited by the difficulty of collecting such specialist data from impaired speakers. This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach in which the model is trained on the parallel acoustic-articulatory data of the 15-hour TORGO corpus and then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features. Neural A2A inversion models based on mixture density networks were used. A cross-domain feature adaptation network was also used to reduce the acoustic mismatch between the TORGO and UASpeech data. On both tasks, incorporating the A2A generated articulatory features consistently outperformed the baseline hybrid DNN/TDNN, CTC and Conformer based end-to-end systems constru
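The abstract names mixture density networks as the basis of the A2A inversion models but gives no architectural detail. As a rough illustration only, the sketch below shows how a mixture density output layer is commonly scored and decoded for a single frame: the network is assumed to emit mixture logits, per-component means, and per-component log standard deviations over an articulatory vector. The function names, the diagonal-Gaussian form, and the use of the mixture mean as the inverted feature are assumptions for this example, not details taken from the paper.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mdn_log_likelihood(logits, means, log_sigmas, target):
    """Log-likelihood of one target vector under a diagonal-Gaussian mixture.

    logits:      K unnormalised mixture weights (hypothetical network output)
    means:       K lists of D component means
    log_sigmas:  K lists of D log standard deviations
    target:      D observed articulatory values for this frame
    """
    log_norm = log_sum_exp(logits)  # softmax normaliser in the log domain
    joint = []
    for logit, mu, log_sigma in zip(logits, means, log_sigmas):
        log_comp = 0.0
        for x, m, ls in zip(target, mu, log_sigma):
            z = (x - m) / math.exp(ls)
            # log N(x | m, sigma^2) for one dimension
            log_comp += -0.5 * (z * z + math.log(2.0 * math.pi)) - ls
        joint.append(logit - log_norm + log_comp)
    return log_sum_exp(joint)


def mdn_mean(logits, means):
    """Mixture mean, one simple choice of point estimate for the inverted feature."""
    log_norm = log_sum_exp(logits)
    weights = [math.exp(l - log_norm) for l in logits]
    dim = len(means[0])
    return [sum(w * mu[d] for w, mu in zip(weights, means)) for d in range(dim)]
```

Training such a model would typically minimise the negative of `mdn_log_likelihood` averaged over frames of parallel acoustic-articulatory data; at inference time, a point estimate such as `mdn_mean` could serve as the generated articulatory feature appended to the acoustic front-end.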
Related papers
- Speaker-independent Acoustic-to-articulatory Inversion Through Multi-channel Attention Discriminator (2024) 0.00
- Acoustic-to-articulatory Inversion Based On Speech Decomposition And Auxiliary Feature (2022) 0.00
- Spectro-temporal Deep Features For Disordered Speech Assessment And Recognition (2022) 8.60
- Audio Data Augmentation For Acoustic-to-articulatory Speech Inversion Using Bidirectional Gated RNNs (2022) 0.00
- Adversarial Learning Of Raw Speech Features For Domain Invariant Speech Recognition (2018) 9.23
- Independent And Automatic Evaluation Of Acoustic-to-articulatory Inversion Models (2019) 0.00
- Adversarial Data Augmentation Using VAE-GAN For Disordered Speech Recognition (2022) 0.00
- Toward Domain-invariant Speech Recognition Via Large Scale Training (2018) 13.39