Fast MVAE: Joint Separation And Classification Of Mixed Sources Based On Multichannel Variational Autoencoder With Auxiliary Classifier
2018 · Li Li, Hirokazu Kameoka, Shoji Makino
Abstract
This paper proposes an alternative algorithm for the multichannel variational autoencoder (MVAE), a recently proposed multichannel source separation approach. While MVAE is notable for its impressive source separation performance, its convergence-guaranteed optimization algorithm, and its ability to estimate source-class labels simultaneously with source separation, it has two major drawbacks: high computational complexity and unsatisfactory source classification accuracy. To overcome these drawbacks, the proposed method employs an auxiliary classifier VAE, an information-theoretic extension of the conditional VAE, for learning the generative model of the source spectrograms. Furthermore, using the trained auxiliary classifier, we introduce a novel optimization algorithm that not only reduces the computational time but also improves the source classification performance. We call the proposed method "fast MVAE (fMVAE)". Experimental evaluations reveale
Related papers
- FastMVAE2: On Improving And Accelerating The Fast Variational Autoencoder-based Source Separation Algorithm For Determined Mixtures (2021)
- Generalized Multichannel Variational Autoencoder For Underdetermined Source Separation (2018)
- ACVAE-VC: Non-parallel Many-to-many Voice Conversion With Auxiliary Classifier Variational Autoencoder (2018)
- Deep Variational Generative Models For Audio-visual Speech Separation (2020)
- Raw Multi-channel Audio Source Separation Using Multi-resolution Convolutional Auto-encoders (2018)
- Deep Bayesian Unsupervised Source Separation Based On A Complex Gaussian Mixture Model (2019)
- Robust Unsupervised Audio-visual Speech Enhancement Using A Mixture Of Variational Autoencoders (2019)
- Mixture Of Inference Networks For VAE-based Audio-visual Speech Enhancement (2019)