SLOGD: Speaker Location Guided Deflation Approach To Speech Separation
2019 · Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr
Abstract
Speech separation is the process of separating multiple speakers from an audio recording. In this work we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach, wherein we estimate the sources iteratively. In each iteration we first estimate the location of a speaker and then use that location to estimate a mask corresponding to the localized speaker. The estimated source is removed from the mixture before the location and mask of the next source are estimated. Experiments are conducted on a reverberated, noisy multichannel version of the well-studied WSJ-2MIX dataset, using word error rate (WER) as the metric. The proposed method achieves a WER of \(44.2\)%, a \(34\)% relative improvement over the system without separation and a \(17\)% relative improvement over Conv-TasNet.
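The iterative procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `deflate`, the toy estimators `toy_location` and `toy_mask`, and the flat list of magnitudes standing in for a multichannel spectrogram are all hypothetical. In particular, removing a source by subtracting the masked signal from the residual is an assumed stand-in for whatever removal step the paper actually uses, and the real location and mask estimators would be trained models.

```python
from typing import Callable, List, Tuple

Signal = List[float]  # toy stand-in for a time-frequency representation


def deflate(
    mixture: Signal,
    n_sources: int,
    estimate_location: Callable[[Signal], int],
    estimate_mask: Callable[[Signal, int], Signal],
) -> Tuple[List[Tuple[int, Signal]], Signal]:
    """Estimate sources one at a time, removing each from the residual."""
    residual = list(mixture)
    sources = []
    for _ in range(n_sources):
        # Step 1: localize one speaker in what is left of the mixture.
        location = estimate_location(residual)
        # Step 2: estimate a mask for the speaker at that location.
        mask = estimate_mask(residual, location)
        source = [m * x for m, x in zip(mask, residual)]
        # Step 3 (assumed removal rule): subtract the masked source.
        residual = [x - s for x, s in zip(residual, source)]
        sources.append((location, source))
    return sources, residual


def toy_location(signal: Signal) -> int:
    # Hypothetical "localizer": index of the strongest bin.
    return max(range(len(signal)), key=lambda i: signal[i])


def toy_mask(signal: Signal, location: int) -> Signal:
    # Hypothetical binary mask: keep bins with the same parity as `location`.
    return [1.0 if i % 2 == location % 2 else 0.0 for i in range(len(signal))]


mixture = [4.0, 1.0, 3.0, 2.0]
sources, residual = deflate(mixture, 2, toy_location, toy_mask)
for location, source in sources:
    print(location, source)
print("residual:", residual)
```

The loop makes the deflation idea explicit: each pass works only on the residual left by the previous passes, so the second speaker is localized and masked after the first has been removed.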
Related papers
- Analyzing The Impact Of Speaker Localization Errors On Speech Separation For Automatic Speech Recognition (2019)
- Individualized Conditioning And Negative Distances For Speaker Separation (2022)
- Low-latency Speech Separation Guided Diarization For Telephone Conversations (2022)
- TS-SEP: Joint Diarization And Separation Conditioned On Estimated Speaker Embeddings (2023)
- Low-latency Deep Clustering For Speech Separation (2019)
- Speech Separation Based On Multi-stage Elaborated Dual-path Deep Bilstm With Auxiliary Identity Loss (2020)
- SADDEL: Joint Speech Separation And Denoising Model Based On Multitask Learning (2020)
- Simultaneous Speech Extraction For Multiple Target Speakers Under The Meeting Scenarios (2022)