Explainable By-design Audio Segmentation Through Non-negative Matrix Factorization And Probing
2024 · Martin Lebourdais, Théo Mariotte, Antonio Almudévar, et al.
Abstract
Audio segmentation is a key task for many speech technologies, most of which rely on neural networks that perform well but are usually regarded as black boxes. However, in many domains, such as health or forensics, good performance is not enough: explanations of the output decision are also needed. To be interpretable, explanations derived directly from latent representations must satisfy "good" properties such as informativeness, compactness, or modularity. In this article, we propose an explainable-by-design audio segmentation model based on non-negative matrix factorization (NMF), which is a good candidate for designing interpretable representations. We show that the model reaches good segmentation performance and present in-depth analyses of the latent representation extracted from the non-negative matrix. The proposed approach opens new perspectives on evaluating interpretable representations according to these "good" properties.
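For readers unfamiliar with NMF, the following is a minimal sketch of the generic factorization the abstract refers to, not the authors' segmentation model. It factors a non-negative matrix V (for example a magnitude spectrogram, frequency by time) into a dictionary W and activations H using the standard multiplicative updates for the Frobenius loss. The function name `nmf`, the rank, the iteration count, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix V (n_freq x n_frames) as W @ H.

    Uses multiplicative updates for the Frobenius loss, which keep
    W and H non-negative as long as they start non-negative.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral patterns (dictionary)
    H = rng.random((rank, n_frames)) + eps  # per-frame activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy "spectrogram" (hypothetical data): two spectral patterns,
# the first active in frames 0-49, the second in frames 50-99.
rng = np.random.default_rng(1)
W_true = rng.random((64, 2))
H_true = np.zeros((2, 100))
H_true[0, :50] = 1.0
H_true[1, 50:] = 1.0
V = W_true @ H_true

W, H = nmf(V, rank=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Because every entry of W and H is non-negative, each frame is described as an additive combination of a few spectral patterns, which is the property that makes NMF a plausible basis for interpretable representations.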
Related papers
- Complex NMF Under Phase Constraints Based On Signal Modeling: Application To Audio Source Separation (2016) · 7.50
- Supervised And Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization (2017) · 18.80
- End-to-end Non-negative Autoencoders For Sound Source Separation (2019) · 2.26
- AudioMNIST: Exploring Explainable Artificial Intelligence For Audio Analysis On A Simple Benchmark (2018) · 13.50
- Multichannel Audio Source Separation With Independent Deeply Learned Matrix Analysis Using Product Of Source Models (2021) · 0.00
- Joint Sound Source Separation And Speaker Recognition (2016) · 4.52
- Neural Network Alternatives To Convolutive Audio Models For Source Separation (2017) · 0.00
- Semi-supervised Multichannel Speech Enhancement With Variational Autoencoders And Non-negative Matrix Factorization (2018) · 12.25