On Investigation Of Unsupervised Speech Factorization Based On Normalization Flow
2019 · Haoran Sun, Yunqi Cai, Lantian Li, et al.
Abstract
Speech signals are complex composites of various types of information, including phonetic content, speaker traits, and channel effects. Decomposing this mixture into independent factors, i.e., speech factorization, is fundamentally important and plays a central role in many algorithms for modern speech processing tasks. In this paper, we present a preliminary investigation of unsupervised speech factorization based on the normalization flow model. This model constructs a complex invertible transform that projects speech segments into a latent code space where the distribution is a simple diagonal Gaussian. Our preliminary investigation on the TIMIT database shows that this code space exhibits favorable properties such as denseness and pseudo-linearity, and that perceptually important factors such as phonetic content and speaker traits can be represented as particular directions within the code space.
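The mechanism described in the abstract can be illustrated with a small flow model. An invertible transform $f$ maps a speech segment $x$ to a code $z = f(x)$ with a diagonal Gaussian prior, so the likelihood follows from the change-of-variables formula $\log p(x) = \log \mathcal{N}(f(x); 0, I) + \log\left|\det \partial f / \partial x\right|$. The sketch below is a minimal NumPy illustration that assumes RealNVP-style affine coupling layers. The architecture, the layer sizes, the untrained random weights, the synthetic input, and the hypothetical helper `factor_direction` (a mean-difference heuristic for finding a factor direction in code space) are all assumptions chosen for illustration. None of them is taken from the paper or describes the authors' model.

```python
import numpy as np

class AffineCoupling:
    """RealNVP-style affine coupling layer (illustrative; random, untrained weights)."""

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2  # first d features condition the transform of the rest
        # Tiny conditioner network: x_a -> (log_scale, shift) for x_b
        self.W1 = rng.normal(scale=0.1, size=(self.d, hidden))
        self.W2 = rng.normal(scale=0.1, size=(hidden, 2 * (dim - self.d)))

    def _conditioner(self, xa):
        h = np.tanh(xa @ self.W1)
        log_s, t = np.split(h @ self.W2, 2, axis=1)
        return np.tanh(log_s), t  # tanh bounds log-scales to (-1, 1) for stability

    def forward(self, x):
        xa, xb = x[:, :self.d], x[:, self.d:]
        log_s, t = self._conditioner(xa)
        zb = xb * np.exp(log_s) + t  # elementwise affine map of the second half
        return np.concatenate([xa, zb], axis=1), log_s.sum(axis=1)

    def inverse(self, z):
        za, zb = z[:, :self.d], z[:, self.d:]
        log_s, t = self._conditioner(za)  # za == xa, so the same scale/shift is recovered
        xb = (zb - t) * np.exp(-log_s)
        return np.concatenate([za, xb], axis=1)

class Flow:
    """Stack of coupling layers with feature reversal between them (hypothetical setup)."""

    def __init__(self, dim, n_layers=4):
        self.layers = [AffineCoupling(dim, seed=i) for i in range(n_layers)]

    def encode(self, x):
        """Map x to latent code z; also return log|det dz/dx| per sample."""
        log_det = np.zeros(x.shape[0])
        for layer in self.layers:
            x, ld = layer.forward(x)
            log_det += ld
            x = x[:, ::-1]  # reversal is a permutation (|det| = 1), so no log-det term
        return x, log_det

    def decode(self, z):
        """Exact inverse of encode: undo each step in reverse order."""
        for layer in reversed(self.layers):
            z = layer.inverse(z[:, ::-1])
        return z

    def log_likelihood(self, x):
        """log p(x) under a standard (unit-variance diagonal) Gaussian prior on z."""
        z, log_det = self.encode(x)
        log_pz = -0.5 * (z ** 2).sum(axis=1) - 0.5 * z.shape[1] * np.log(2 * np.pi)
        return log_pz + log_det

def factor_direction(z_a, z_b):
    """Hypothetical helper: unit vector from the mean code of group a to that of group b."""
    v = z_b.mean(axis=0) - z_a.mean(axis=0)
    return v / np.linalg.norm(v)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.normal(size=(8, 6))  # synthetic stand-in for 8 segments of 6-dim features
    flow = Flow(dim=6)
    z, _ = flow.encode(x)
    print(np.allclose(flow.decode(z), x))  # True: the transform is invertible
    # Shift the first group's codes along a factor direction, then map back
    direction = factor_direction(z[:4], z[4:])
    x_edited = flow.decode(z[:4] + 0.5 * direction)
    print(x_edited.shape)  # (4, 6)
```

Because every coupling layer is invertible in closed form, moving a code along a direction in $z$-space and decoding always produces a valid input-space vector. This is what makes it possible to test whether a factor such as speaker identity behaves like a direction in the code space. With real data, the flow would first be trained by maximizing `log_likelihood` over speech segments.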
Related papers
- Deep Generative Factorization For Speech Signal (2020)
- Deep Factorization For Speech Signal (2018)
- Mixture Factorized Auto-encoder For Unsupervised Hierarchical Deep Factorization Of Speech Signal (2019)
- Self-supervised Neural Factor Analysis For Disentangling Utterance-level Speech Representations (2023)
- Self-supervised Predictive Coding Models Encode Speaker And Phonetic Information In Orthogonal Subspaces (2023)
- Diflow-tts: Compact And Low-latency Zero-shot Text-to-speech With Factorized Discrete Flow Matching (2025)
- Improving Multi-speaker TTS Prosody Variance With A Residual Encoder And Normalizing Flows (2021)
- Generative Modeling For Low Dimensional Speech Attributes With Neural Spline Flows (2022)