Towards Language Modelling In The Speech Domain Using Sub-word Linguistic Units
2021 · Anurag Katakkar, Alan W Black
Abstract
Language models (LMs) for text data have been studied extensively for their usefulness in language generation and other downstream tasks. However, language modelling purely in the speech domain is still a relatively unexplored topic, with traditional speech LMs often depending on auxiliary text LMs for learning distributional aspects of the language. For the English language, these LMs treat words as atomic units, which presents inherent challenges to language modelling in the speech domain. In this paper, we propose a novel LSTM-based generative speech LM that is inspired by the CBOW model and built on linguistic units including syllables and phonemes. This offers better acoustic consistency across utterances in the dataset, as opposed to single melspectrogram frames, or whole words. With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech. We show the effect of training with auxiliary text