Personalized Acoustic Modeling By Weakly Supervised Multi-task Deep Learning Using Acoustic Tokens Discovered From Unlabeled Data
2017 · Cheng-Kuan Wei, Cheng-Tao Chung, Hung-Yi Lee, et al.
Abstract
It is well known that recognizers personalized to each user are much more effective than user-independent recognizers. With the popularity of smartphones today, although it is not difficult to collect a large set of audio data for each user, it is difficult to transcribe it. However, it is now possible to automatically discover acoustic tokens from unlabeled personal data in an unsupervised way. We therefore propose a multi-task deep learning framework called a phoneme-token deep neural network (PTDNN), jointly trained from unsupervised acoustic tokens discovered from unlabeled data and very limited transcribed data for personalized acoustic modeling. We term this scenario "weakly supervised". The underlying intuition is that the high degree of similarity between the HMM states of acoustic token models and phoneme models may help them learn from each other in this multi-task learning framework. Initial experiments performed over a personalized audio data set recorded from Facebook post
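The multi-task idea in the abstract can be illustrated with a small sketch: one shared hidden layer feeds two softmax output heads, one over phoneme HMM states and one over acoustic-token HMM states, and the training loss sums the cross-entropy of whichever labels a frame has. This is not the authors' PTDNN implementation. The layer sizes, the tanh activation, the `token_weight` parameter, and every name below (`MultiTaskNet`, `joint_loss`, and so on) are assumptions made for illustration, and only the forward pass and loss are shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (hypothetical init)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def affine(layer, x):
    """Compute W x + b for one input vector."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class MultiTaskNet:
    """Shared hidden layer with two output heads (illustrative, not the paper's exact model)."""

    def __init__(self, n_feat, n_hidden, n_phone_states, n_token_states, seed=0):
        rng = random.Random(seed)
        self.shared = make_layer(n_feat, n_hidden, rng)
        self.phone_head = make_layer(n_hidden, n_phone_states, rng)
        self.token_head = make_layer(n_hidden, n_token_states, rng)

    def forward(self, x):
        # Both heads read the same hidden representation.
        h = [math.tanh(v) for v in affine(self.shared, x)]
        return softmax(affine(self.phone_head, h)), softmax(affine(self.token_head, h))


def joint_loss(net, frames, token_weight=1.0):
    """Average multi-task cross-entropy.

    frames: list of (features, phone_label_or_None, token_label_or_None).
    Untranscribed frames carry only a token label; token_weight is an assumed knob.
    """
    total = 0.0
    for x, phone, token in frames:
        p_phone, p_token = net.forward(x)
        if phone is not None:
            total += -math.log(p_phone[phone])
        if token is not None:
            total += -token_weight * math.log(p_token[token])
    return total / len(frames)


# Toy usage: one transcribed frame and one frame with only a discovered token label.
net = MultiTaskNet(n_feat=4, n_hidden=8, n_phone_states=3, n_token_states=5)
frames = [
    ([0.1, 0.2, -0.3, 0.5], 1, 2),
    ([0.0, -0.1, 0.4, 0.2], None, 4),
]
loss = joint_loss(net, frames)
```

Because the token head needs no transcriptions, frames from the large unlabeled set still contribute gradient to the shared layer in a full training loop, which is the sense in which the limited transcribed data is supplemented here.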
Related papers
- Unsupervised Iterative Deep Learning Of Speech Features And Acoustic Tokens With Applications To Spoken Term Detection (2017)
- Efficient Personalized Speech Enhancement Through Self-supervised Learning (2021)
- Retrieving Speaker Information From Personalized Acoustic Models For Speech Recognition (2021)
- Personalized Speech Enhancement Through Self-supervised Data Augmentation And Purification (2021)
- Exploiting Cross-lingual Speaker And Phonetic Diversity For Unsupervised Subword Modeling (2019)
- The Universal Personalizer: Few-shot Dysarthric Speech Recognition Via Meta-learning (2025)
- Deep Shallow Fusion For RNN-T Personalization (2020)
- Self-supervised Learning From Contrastive Mixtures For Personalized Speech Enhancement (2020)