Interpreting End-to-end Deep Learning Models For Speech Source Localization Using Layer-wise Relevance Propagation
2024 · Luca Comanducci, Fabio Antonacci, Augusto Sarti
Abstract
Deep learning models are widely applied in the signal processing community, yet their inner workings are often treated as a black box. In this paper, we investigate the application of eXplainable Artificial Intelligence (XAI) techniques to learning-based end-to-end speech source localization models. We consider the Layer-wise Relevance Propagation (LRP) technique, which aims to determine which parts of the input are most important for the output prediction. Using LRP, we analyze two state-of-the-art models of differing architectural complexity that map audio signals acquired by the microphones to the Cartesian coordinates of the source. Specifically, we inspect the relevance associated with the input features of the two models and discover that both networks denoise and de-reverberate the microphone signals to compute more accurate statistical correlations between them and consequently localize the sources. To further demonstrate this fact, we estimate the Time-Difference of Arrivals (
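To make the idea of relevance propagation concrete, here is a minimal sketch of the generic LRP epsilon rule applied to a toy two-layer ReLU regressor. It is not the authors' implementation, and the abstract does not say which LRP rule they use. The network sizes, the random weights, the bias-free layers, and the helper name `lrp_epsilon_dense` are all assumptions made for illustration.

```python
import numpy as np

def lrp_epsilon_dense(a, W, R_out, eps=1e-9):
    """LRP epsilon rule for a bias-free dense layer z = a @ W (hypothetical helper).

    a:     layer inputs, shape (n_in,)
    W:     weights, shape (n_in, n_out)
    R_out: relevance assigned to the layer outputs, shape (n_out,)
    Returns the relevance redistributed onto the inputs, shape (n_in,).
    """
    z = a @ W                                        # pre-activations
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)   # avoid division by zero
    s = R_out / z_stab                               # relevance per unit of pre-activation
    return a * (W @ s)                               # share of each input in each output

# Toy stand-in for a localization network (assumption): 8 input features
# mapped to 2 output coordinates through one hidden ReLU layer.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 5))
W2 = rng.normal(size=(5, 2))
x = rng.normal(size=8)

h = np.maximum(0.0, x @ W1)   # hidden activations
y = h @ W2                    # predicted coordinates

# Explain the first output coordinate: start with its value as the total relevance.
R_y = np.zeros_like(y)
R_y[0] = y[0]
R_h = lrp_epsilon_dense(h, W2, R_y)   # relevance of hidden units
R_x = lrp_epsilon_dense(x, W1, R_h)   # relevance of input features

print("input relevances:", R_x)
print("sum of relevances:", R_x.sum(), "explained output:", y[0])
```

With bias-free layers and a small `eps`, the input relevances sum approximately to the explained output, which is the conservation property that lets relevance be read as a per-feature share of the prediction.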
Related papers
- AudioMNIST: Exploring Explainable Artificial Intelligence For Audio Analysis On A Simple Benchmark (2018)
- Deep Learning Based Multi-source Localization With Source Splitting And Its Effectiveness In Multi-talker Speech Recognition (2021)
- Integrating Pre-trained Speech And Language Models For End-to-end Speech Recognition (2023)
- Visualizing Automatic Speech Recognition -- Means For A Better Understanding? (2022)
- Semi-supervised Source Localization In Reverberant Environments With Deep Generative Modeling (2021)
- Multi-channel End-to-end Neural Network For Speech Enhancement, Source Localization, And Voice Activity Detection (2022)
- Analyzing Hidden Representations In End-to-end Automatic Speech Recognition Systems (2017)
- Interpretable Representation Learning For Speech And Audio Signals Based On Relevance Weighting (2020)