Speaker- And Age-invariant Training For Child Acoustic Modeling Using Adversarial Multi-task Learning
2022 · Mostafa Shahin, Beena Ahmed, Julien Epps
Abstract
One of the major challenges in acoustic modelling of child speech is the rapid changes that occur in the children's articulators as they grow up, their differing growth rates and the subsequent high variability in the same age group. These high acoustic variations along with the scarcity of child speech corpora have impeded the development of a reliable speech recognition system for children. In this paper, a speaker- and age-invariant training approach based on adversarial multi-task learning is proposed. The system consists of one generator shared network that learns to generate speaker- and age-invariant features connected to three discrimination networks, for phoneme, age, and speaker. The generator network is trained to minimize the phoneme-discrimination loss and maximize the speaker- and age-discrimination losses in an adversarial multi-task learning fashion. The generator network is a Time Delay Neural Network (TDNN) architecture while the three discriminators are feed-forward
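The adversarial objective described in the abstract can be illustrated with a small numeric sketch. It is not the authors' implementation: the function names and the weights `speaker_weight` and `age_weight` are hypothetical, and the abstract does not say how the three losses are balanced. The sketch only shows the sign convention: the shared generator is rewarded for a low phoneme loss and for high speaker and age losses.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def generator_objective(phone_loss, speaker_loss, age_loss,
                        speaker_weight=1.0, age_weight=1.0):
    """Quantity the shared generator would minimize in this sketch.

    The phoneme loss enters with a plus sign (to be minimized); the speaker
    and age losses enter with a minus sign (to be maximized). The weights are
    assumptions for illustration and are not taken from the paper.
    """
    return phone_loss - speaker_weight * speaker_loss - age_weight * age_loss


# Toy posteriors from the three discriminators for a single frame.
phone_loss = cross_entropy([0.7, 0.2, 0.1], target=0)
speaker_loss = cross_entropy([0.5, 0.5], target=1)
age_loss = cross_entropy([0.25, 0.25, 0.25, 0.25], target=2)

print(generator_objective(phone_loss, speaker_loss, age_loss))
```

In this toy setting, the closer the speaker and age posteriors are to uniform, the larger those two losses become and the lower the generator objective, which is the sense in which the shared features are pushed toward speaker and age invariance. Each discriminator, by contrast, would still minimize its own classification loss.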
Related papers
- Leveraging Speaker Embeddings With Adversarial Multi-task Learning For Age Group Classification (2023) 0.00
- Multi-task Adversarial Training Algorithm For Multi-speaker Neural Text-to-speech (2022) 0.00
- Adversarial Training For Multi-domain Speaker Recognition (2020) 6.77
- Adversarial Learning Of Raw Speech Features For Domain Invariant Speech Recognition (2018) 9.23
- Speaker Adaptation Using Spectro-temporal Deep Features For Dysarthric And Elderly Speech Recognition (2022) 12.02
- Boosting Noise Robustness Of Acoustic Model Via Deep Adversarial Training (2018) 9.23
- Adversarial Training Of Denoising Diffusion Model Using Dual Discriminators For High-fidelity Multi-speaker TTS (2023) 2.26
- Personalized Adversarial Data Augmentation For Dysarthric And Elderly Speech Recognition (2022) 11.49