Directional ASR: A New Paradigm For E2E Multi-speaker Speech Recognition With Source Localization
2020 · Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, et al.
Abstract
This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end neural network manner, called directional automatic speech recognition (D-ASR), which explicitly models source speaker locations. In D-ASR, the azimuth angle of each source with respect to the microphone array is defined as a latent variable. This angle controls the quality of separation, which in turn determines the ASR performance. All three functionalities of D-ASR (localization, separation, and recognition) are connected as a single differentiable neural network and trained solely with ASR error minimization objectives. The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves explainability, and (3) it achieves better ASR performance because the process is more streamlined. In addition, D-ASR does not require explicit direction of arrival (DOA) supervision like existing data-driven localization models, which makes it
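The abstract does not say how the azimuth angle drives separation. As a rough illustration only, the sketch below uses a standard far-field steering vector and a delay-and-sum beamformer, which is one common way an angle estimate can control how well a source is picked up. The circular array geometry, the speed of sound, and all function names are assumptions made for this example and are not taken from the paper.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def circular_array_positions(num_mics, radius):
    """(x, y) coordinates of microphones evenly spaced on a circle (hypothetical geometry)."""
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def steering_vector(azimuth_rad, mic_positions, freq_hz):
    """Far-field steering vector for a plane wave arriving from azimuth_rad."""
    ux, uy = math.cos(azimuth_rad), math.sin(azimuth_rad)
    vec = []
    for x, y in mic_positions:
        # A microphone displaced toward the source hears the wavefront earlier,
        # so its delay relative to the array center is negative.
        delay = -(x * ux + y * uy) / SPEED_OF_SOUND
        vec.append(cmath.exp(-2j * math.pi * freq_hz * delay))
    return vec


def beam_response(look_azimuth, source_azimuth, mic_positions, freq_hz):
    """Normalized delay-and-sum gain when steering at look_azimuth
    while the source actually sits at source_azimuth."""
    w = steering_vector(look_azimuth, mic_positions, freq_hz)
    d = steering_vector(source_azimuth, mic_positions, freq_hz)
    return abs(sum(wi.conjugate() * di for wi, di in zip(w, d))) / len(w)


mics = circular_array_positions(num_mics=4, radius=0.05)
matched = beam_response(0.0, 0.0, mics, freq_hz=2000.0)
mismatched = beam_response(0.0, math.pi / 2, mics, freq_hz=2000.0)
```

With a correct angle the beamformer passes the source at full gain (`matched`), while a wrong angle attenuates it (`mismatched`). This mirrors the abstract's point that the latent angle governs separation quality, and through it recognition accuracy.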
Related papers
- Survey Of End-to-end Multi-speaker Automatic Speech Recognition For Monaural Audio (2025)
- One Model To Rule Them All? Towards End-to-end Joint Speaker Diarization And Speech Recognition (2023)
- Transcribe-to-diarize: Neural Speaker Diarization For Unlimited Number Of Speakers Using End-to-end Speaker-attributed ASR (2021)
- Neural Blind Source Separation And Diarization For Distant Speech Recognition (2024)
- Deep Learning Based Multi-source Localization With Source Splitting And Its Effectiveness In Multi-talker Speech Recognition (2021)
- Frequency Domain Multi-channel Acoustic Modeling For Distant Speech Recognition (2019)
- Multi-geometry Spatial Acoustic Modeling For Distant Speech Recognition (2019)
- 3D Neural Beamforming For Multi-channel Speech Separation Against Location Uncertainty (2023)