Analysing Discrete Self Supervised Speech Representation For Spoken Language Modeling
2023 · Amitay Sicherman, Yossi Adi
Abstract
This work presents an in-depth analysis of discrete self-supervised speech representations (units) through the lens of Generative Spoken Language Modeling (GSLM). Based on the findings of this analysis, we propose practical improvements to the discrete units used in GSLM. First, we study these units along three axes: interpretation, visualization, and resynthesis. Our analysis finds a high correlation between the speech units and both phonemes and phoneme families, while their correlation with speaker or gender is weaker. Additionally, we find redundancies in the extracted units and argue that one cause may be the units' context. Following this analysis, we propose a new, unsupervised metric to measure unit redundancies. Finally, we use this metric to develop new methods that improve the robustness of unit clustering and show significant improvement on zero-resource speech metrics such as ABX. Code and analysis tools are available under the following link: htt
Related papers
- Exploring Speech Recognition, Translation, And Understanding With Discrete Speech Units: A Comparative Study (2023)
- DiscreteSLU: A Large Language Model With Self-supervised Discrete Speech Units For Spoken Language Understanding (2024)
- Speech Representation Analysis Based On Inter- And Intra-model Similarities (2024)
- Layer-wise Analysis Of A Self-supervised Speech Representation Model (2021)
- Similarity Analysis Of Self-supervised Speech Representations (2020)
- Augmentation Invariant Discrete Representation For Generative Spoken Language Modeling (2022)
- Speech Resynthesis From Discrete Disentangled Self-supervised Representations (2021)
- Comparative Layer-wise Analysis Of Self-supervised Speech Models (2022)