Augmentation Invariant Discrete Representation For Generative Spoken Language Modeling
2022 · Itai Gat, Felix Kreuk, Tu Anh Nguyen, et al.
Abstract
Generative Spoken Language Modeling research focuses on optimizing speech Language Models (LMs) using raw audio recordings without accessing any textual supervision. Such speech LMs usually operate over discrete units obtained from quantizing internal representations of self-supervised models. Although such units show impressive modeling results, their robustness capabilities have not been extensively investigated. This work focuses on improving the robustness of discrete input representations for generative spoken language modeling. First, we formally define how to measure the robustness of such representations to various signal variations that do not alter the spoken information (e.g., time-stretch). Next, we empirically demonstrate how current state-of-the-art representation models lack robustness to such variations. To overcome this, we propose an effective and efficient method to learn robust discrete speech representation for generative spoken language modeling. The proposed appr
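The abstract does not spell out how the robustness measure is computed. Below is a minimal sketch under stated assumptions. Discrete units come from assigning each frame-level feature vector to its nearest k-means centroid. Robustness is scored as the Levenshtein distance between the deduplicated unit sequences of a clean utterance and an augmented copy (e.g., time-stretched), normalized by the clean sequence length. This scoring is an illustrative assumption, not a definition quoted from the paper. The function names `quantize`, `deduplicate`, `edit_distance` and `unit_edit_distance` are hypothetical.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical k-means-style quantizer: map each frame-level feature
    vector to the index of its nearest centroid (squared Euclidean distance)."""
    units = []
    for frame in features:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, centroid))
                 for centroid in centroids]
        units.append(min(range(len(centroids)), key=dists.__getitem__))
    return units


def deduplicate(units: Sequence[int]) -> List[int]:
    """Collapse consecutive repeats, e.g. [3, 3, 5, 5, 5, 3] -> [3, 5, 3].
    This makes a pure time-stretch (repeated frames) leave no trace."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (or match)
        prev = curr
    return prev[-1]


def unit_edit_distance(clean: Sequence[int], augmented: Sequence[int]) -> float:
    """Assumed robustness score: edit distance between deduplicated unit
    sequences, normalized by the clean length (0.0 = fully invariant)."""
    ref = deduplicate(clean)
    hyp = deduplicate(augmented)
    return edit_distance(ref, hyp) / max(len(ref), 1)


if __name__ == "__main__":
    centroids = [[0.0], [1.0], [2.0]]                  # toy 1-D codebook
    clean = quantize([[0.1], [0.0], [1.1], [2.0]], centroids)
    # Simulated 2x time-stretch: every frame repeated once.
    stretched = quantize([[0.1], [0.1], [0.0], [0.0],
                          [1.1], [1.1], [2.0], [2.0]], centroids)
    # Simulated perturbation that pushes one frame to a different unit.
    perturbed = quantize([[0.1], [1.6], [1.1], [2.0]], centroids)
    print(unit_edit_distance(clean, stretched))  # 0.0
    print(unit_edit_distance(clean, perturbed))  # 0.333...
```

In this toy example, the time-stretched input yields the same deduplicated sequence and scores 0.0. The perturbed input introduces one extra unit and scores 1/3. A lower score means the units are more invariant to the augmentation.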
Related papers
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018) · 10.61
- Analysing Discrete Self Supervised Speech Representation For Spoken Language Modeling (2023) · 12.86
- Audio Language Modeling Using Perceptually-guided Discrete Representations (2022) · 0.00
- Robust Speech Recognition Using Generative Adversarial Networks (2017) · 11.29
- Adversarial Data Augmentation Using VAE-GAN For Disordered Speech Recognition (2022) · 0.00
- Enhancing The Stability Of Llm-based Speech Generation Systems Through Self-supervised Representations (2024) · 0.00
- Personalized Adversarial Data Augmentation For Dysarthric And Elderly Speech Recognition (2022) · 11.49
- Augmentation Adversarial Training For Self-supervised Speaker Recognition (2020) · 0.00