Self-supervised Neural Factor Analysis For Disentangling Utterance-level Speech Representations
2023 · Weiwei Lin, Chenhang He, Man-Wai Mak, et al.
Abstract
Self-supervised learning (SSL) speech models such as wav2vec and HuBERT have demonstrated state-of-the-art performance on automatic speech recognition (ASR) and proved to be extremely useful in low label-resource settings. However, the success of SSL models has yet to transfer to utterance-level tasks such as speaker, emotion, and language recognition, which still require supervised fine-tuning of the SSL models to obtain good performance. We argue that the problem is caused by the lack of disentangled representations and an utterance-level learning objective for these tasks. Inspired by how HuBERT uses clustering to discover hidden acoustic units, we formulate a factor analysis (FA) model that uses the discovered hidden acoustic units to align the SSL features. The underlying utterance-level representations are disentangled from the content of speech using probabilistic inference on the aligned features. Furthermore, the variational lower bound derived from the FA model provides an ut
Related papers
- ContentVec: An Improved Self-supervised Speech Representation By Disentangling Speakers (2022) 0.00
- Disentangled Speech Representation Learning Based On Factorized Hierarchical Variational Autoencoder With Self-supervised Objective (2022) 7.81
- Pushing The Limits Of Unsupervised Unit Discovery For SSL Speech Representation (2023) 6.34
- Automatic Pronunciation Assessment Using Self-supervised Speech Representation Learning (2022) 0.00
- A Large-scale Probing Analysis Of Speaker-specific Attributes In Self-supervised Speech Representations (2025) 0.00
- Speech Representation Analysis Based On Inter- And Intra-model Similarities (2024) 2.26
- Mixture Factorized Auto-encoder For Unsupervised Hierarchical Deep Factorization Of Speech Signal (2019) 0.00
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024) 0.00