Deep Learning Based Multi-source Localization With Source Splitting And Its Effectiveness In Multi-talker Speech Recognition
2021 · Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, et al.
Abstract
Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimate the direction of arrival (DOA) of all the speakers simultaneously from the audio mixture. At the heart of the proposal is a source splitting mechanism that creates source-specific intermediate representations inside the network. This allows our model to output source-specific posteriors, unlike the traditional multi-label classification approach. Existing deep learning methods perform a frame-level prediction, whereas our approach performs an utterance-level prediction by incorporating temporal selection and averaging inside the network to avoid post-processing. We also experiment with various loss functions and show that a variant of earth mover distance (EMD) is very effective in classifying DOA at a very high resolution by modeling inter-class relationships. In addition to
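To illustrate why an EMD-style loss can model inter-class relationships, here is a minimal sketch of one common formulation, the squared difference between cumulative distributions over ordered classes. This is an assumption for illustration only: the excerpt does not specify which EMD variant the paper uses, the function names and the 8-class setup are hypothetical, and the sketch ignores the circular wraparound of DOA angles.

```python
from itertools import accumulate


def squared_emd_loss(predicted, target):
    """Squared EMD between two discrete distributions over ordered classes.

    Illustrative formulation only (sum of squared CDF differences);
    not necessarily the variant used in the paper.
    """
    if len(predicted) != len(target):
        raise ValueError("distributions must have the same number of classes")
    cdf_predicted = accumulate(predicted)
    cdf_target = accumulate(target)
    return sum((p - t) ** 2 for p, t in zip(cdf_predicted, cdf_target))


def one_hot(index, num_classes):
    """Return a one-hot distribution with all mass on class `index`."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


# Hypothetical setup: 8 ordered DOA classes, true source in class 2.
num_classes = 8
target = one_hot(2, num_classes)
near_miss = one_hot(3, num_classes)  # prediction one class away
far_miss = one_hot(6, num_classes)   # prediction four classes away

loss_near = squared_emd_loss(near_miss, target)
loss_far = squared_emd_loss(far_miss, target)
print(loss_near, loss_far)
```

A plain cross-entropy loss would penalize both wrong predictions identically, whereas this loss grows with the distance between the predicted and true classes, which is the property that matters when the angle grid is fine.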
Related papers
- Multi-speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals (2018)
- Directional ASR: A New Paradigm For E2E Multi-speaker Speech Recognition With Source Localization (2020)
- Deep Attractor Network For Single-microphone Speaker Separation (2016)
- Deep Learning Based Stage-wise Two-dimensional Speaker Localization With Large Ad-hoc Microphone Arrays (2022)
- Deep Learning Based Audio-visual Multi-speaker DOA Estimation Using Permutation-free Loss Function (2022)
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023)
- Neural Blind Source Separation And Diarization For Distant Speech Recognition (2024)
- SLOGD: Speaker Location Guided Deflation Approach To Speech Separation (2019)