Individualized Conditioning And Negative Distances For Speaker Separation
2022 · Tao Sun, Nidal Abuhajar, Shuyu Gong, et al.
Abstract
Speaker separation aims to extract multiple voices from a mixed signal. In this paper, we propose two speaker-aware designs to improve existing speaker separation solutions. The first design is a speaker conditioning network that integrates speech samples to generate individualized speaker conditions, which then guide a separation module toward well-separated outputs. The second design aims to reduce non-target voices in the separated speech. To this end, we propose negative distances, which penalize the appearance of any non-target voice in the channel outputs, and positive distances, which drive the separated voices closer to the clean targets. We explore two setups, weighted-sum and triplet-like, for integrating these two distances into a combined auxiliary loss for the separation networks. Experiments on LibriMix demonstrate the effectiveness of the proposed models.
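The abstract names the two distances and the two ways of combining them but does not give formulas, so the following is only a minimal sketch under stated assumptions. Mean squared error stands in for the distance measure, the weighted-sum form is taken to be `alpha * positive - beta * negative`, and the triplet-like form is taken to be a hinge on `positive - negative + margin`. The names `alpha`, `beta`, `margin`, and all function names are hypothetical and may differ from the paper's actual formulation.

```python
from typing import Sequence

Signal = Sequence[float]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error; an assumed stand-in for the paper's distance."""
    if len(a) != len(b):
        raise ValueError("signals must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def positive_distance(outputs: Sequence[Signal], targets: Sequence[Signal]) -> float:
    """Average distance between each channel output and its own clean target."""
    return sum(mse(o, t) for o, t in zip(outputs, targets)) / len(outputs)


def negative_distance(outputs: Sequence[Signal], targets: Sequence[Signal]) -> float:
    """Average distance between each channel output and every non-target voice."""
    if len(outputs) < 2:
        raise ValueError("need at least two speakers for a negative distance")
    dists = [
        mse(o, t)
        for i, o in enumerate(outputs)
        for j, t in enumerate(targets)
        if i != j
    ]
    return sum(dists) / len(dists)


def weighted_sum_loss(outputs, targets, alpha: float = 1.0, beta: float = 0.1) -> float:
    """Assumed weighted-sum setup: reward closeness to the target and
    distance from non-targets (hence the minus sign on the negative term)."""
    return alpha * positive_distance(outputs, targets) - beta * negative_distance(outputs, targets)


def triplet_like_loss(outputs, targets, margin: float = 1.0) -> float:
    """Assumed triplet-like setup: zero once the non-target distance
    exceeds the target distance by at least `margin`."""
    gap = positive_distance(outputs, targets) - negative_distance(outputs, targets)
    return max(0.0, gap + margin)


# Toy two-speaker example: perfect separation versus swapped channels.
targets = [[1.0, 0.0], [0.0, 1.0]]
perfect = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
```

In this toy case the perfectly separated outputs receive a lower value than the swapped ones under both setups, which is the behavior an auxiliary loss of this kind is meant to encourage. In practice such a term would be added to the main separation loss rather than used on its own.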
Related papers
- VoiceFilter: Targeted Voice Separation By Speaker-conditioned Spectrogram Masking (2018) · 17.48
- Directed Speech Separation For Automatic Speech Recognition Of Long Form Conversational Speech (2021) · 2.26
- Real-time Speech Enhancement And Separation With A Unified Deep Neural Network For Single/dual Talker Scenarios (2023) · 2.26
- Single-channel Speech Separation With Auxiliary Speaker Embeddings (2019) · 0.00
- Deep Attractor Network For Single-microphone Speaker Separation (2016) · 17.88
- Boosting Unknown-number Speaker Separation With Transformer Decoder-based Attractor (2024) · 0.00
- Two-stage Model And Optimal SI-SNR For Monaural Multi-speaker Speech Separation In Noisy Environment (2020) · 0.00
- An Enhanced Conv-TasNet Model For Speech Separation Using A Speaker Distance-based Loss Function (2022) · 0.00