Interpretable Representation Learning For Speech And Audio Signals Based On Relevance Weighting
2020 · Purvi Agrawal, Sriram Ganapathy
Abstract
The learning of interpretable representations from raw data presents significant challenges for time series data like speech. In this work, we propose a relevance weighting scheme that allows the interpretation of the speech representations during the forward propagation of the model itself. The relevance weighting is achieved using a sub-network approach that performs the task of feature selection. A relevance sub-network, applied on the output of the first layer of a convolutional neural network model operating on raw speech signals, acts as an acoustic filterbank (FB) layer with relevance weighting. A similar relevance sub-network applied on the second convolutional layer performs modulation filterbank learning with relevance weighting. The full acoustic model, consisting of relevance sub-networks, convolutional layers and feed-forward layers, is trained for a speech recognition task on noisy and reverberant speech in the Aurora-4, CHiME-3 and VOiCES datasets. The proposed representation
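The relevance weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny shared scoring network, the tanh nonlinearity, the softmax across sub-bands, the toy sizes, and every function and variable name below are assumptions chosen only to show how a sub-network could produce per-band weights that rescale filterbank outputs during the forward pass.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_weights(band_features, w_hidden, w_out):
    """Score each sub-band with one small shared network (assumed design),
    then normalise the scores across bands with a softmax.

    band_features: list of F bands, each a list of T frame values.
    w_hidden: H x T weights of the hidden layer (hypothetical shape).
    w_out: H weights mapping the hidden layer to a scalar score.
    """
    scores = []
    for band in band_features:
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, band)))
            for row in w_hidden
        ]
        scores.append(sum(w * h for w, h in zip(w_out, hidden)))
    return softmax(scores)


def apply_relevance(band_features, weights):
    """Scale every frame of a band by that band's relevance weight."""
    return [[w * x for x in band] for band, w in zip(band_features, weights)]


# Toy data standing in for the first-layer (acoustic filterbank) output.
rng = random.Random(0)
num_bands, num_frames, hidden_size = 4, 6, 3
features = [
    [rng.uniform(0.0, 1.0) for _ in range(num_frames)]
    for _ in range(num_bands)
]
w_hidden = [
    [rng.gauss(0.0, 0.5) for _ in range(num_frames)]
    for _ in range(hidden_size)
]
w_out = [rng.gauss(0.0, 0.5) for _ in range(hidden_size)]

weights = relevance_weights(features, w_hidden, w_out)
weighted = apply_relevance(features, weights)
```

Because the weights are computed from the input itself and sum to one across bands, they can be read directly as a per-band relevance profile for each input, which is the sense in which the representation is interpretable during forward propagation. In a trained model the sub-network parameters would be learned jointly with the rest of the acoustic model rather than drawn at random as here.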
Related papers
- Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations (2020)
- Representation Learning For Speech Recognition Using Feedback Based Relevance Weighting (2021)
- Segment Relevance Estimation For Audio Analysis And Weakly-labelled Classification (2019)
- Towards Relevance And Sequence Modeling In Language Recognition (2020)
- Deep Representation Learning In Speech Processing: Challenges, Recent Advances, And Future Trends (2020)
- An Unsupervised Autoregressive Model For Speech Representation Learning (2019)
- Integrating Plug-and-play Data Priors With Weighted Prediction Error For Speech Dereverberation (2023)
- Interpreting End-to-end Deep Learning Models For Speech Source Localization Using Layer-wise Relevance Propagation (2024)