Boosting Unknown-number Speaker Separation With Transformer Decoder-based Attractor
2024 · Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, et al.
Abstract
We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. The proposed model stacks 1) a dual-path processing block that can model spectro-temporal patterns, 2) a transformer decoder-based attractor (TDA) calculation module that can deal with an unknown number of speakers, and 3) triple-path processing blocks that can model inter-speaker relations. Given a fixed, small set of learned speaker queries and the mixture embedding produced by the dual-path blocks, TDA infers the relations of these queries and generates an attractor vector for each speaker. The estimated attractors are then combined with the mixture embedding by feature-wise linear modulation conditioning, creating a speaker dimension. The mixture embedding, conditioned with speaker information produced by TDA, is fed to the final triple-path blocks, which augment the dual-path blocks with an additional pathway dedicated to inter-speaker processing. The proposed approach outp
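The conditioning step described in the abstract, where each speaker's attractor modulates the shared mixture embedding and thereby creates a speaker dimension, can be illustrated with a small NumPy sketch of feature-wise linear modulation (FiLM). This is not the authors' implementation. The tensor layout (D feature channels, T frames, F frequency bins), the single linear projections for scale and shift, and every name below (`film_condition`, `w_gamma`, `b_gamma`, `w_beta`, `b_beta`) are assumptions made for illustration only.

```python
import numpy as np

def film_condition(mixture, attractors, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM to a mixture embedding, once per speaker attractor.

    mixture:    (D, T, F) embedding shared by all speakers (assumed layout)
    attractors: (C, D) one attractor vector per estimated speaker
    Returns an array of shape (C, D, T, F): the new leading axis is the
    speaker dimension.
    """
    # Hypothetical linear projections from each attractor to a per-channel
    # scale (gamma) and shift (beta).
    gamma = attractors @ w_gamma + b_gamma  # (C, D)
    beta = attractors @ w_beta + b_beta     # (C, D)
    # Broadcast over time and frequency; mixture[None] adds the speaker axis.
    return gamma[:, :, None, None] * mixture[None] + beta[:, :, None, None]

rng = np.random.default_rng(0)
D, T, F, C = 4, 5, 3, 2  # toy sizes, chosen arbitrarily
mixture = rng.standard_normal((D, T, F))
attractors = rng.standard_normal((C, D))
w_gamma = rng.standard_normal((D, D))
b_gamma = np.ones(D)
w_beta = rng.standard_normal((D, D))
b_beta = np.zeros(D)

conditioned = film_condition(mixture, attractors, w_gamma, b_gamma, w_beta, b_beta)
print(conditioned.shape)  # (2, 4, 5, 3)
```

Because the number of attractors C sets the size of the leading axis, the same function handles any speaker count, which is the property that lets the later inter-speaker processing operate on a mixture with an unknown number of speakers.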
Related papers
- Speaker-independent Speech Separation With Deep Attractor Network (2017) · 16.84
- Recursive Speech Separation For Unknown Number Of Speakers (2019) · 12.93
- Deep Attractor Network For Single-microphone Speaker Separation (2016) · 17.88
- Monaural Multi-speaker Speech Separation Using Efficient Transformer Model (2023) · 0.00
- Cracking The Cocktail Party Problem By Multi-beam Deep Attractor Network (2018) · 9.92
- Individualized Conditioning And Negative Distances For Speaker Separation (2022) · 2.26
- Coarse-to-fine Recursive Speech Separation For Unknown Number Of Speakers (2022) · 0.00
- Separate And Reconstruct: Asymmetric Encoder-decoder For Speech Separation (2024) · 0.00