Single Microphone Speaker Extraction Using Unified Time-Frequency Siamese-Unet
2022 · Aviad Eisenberg, Sharon Gannot, Shlomo E. Chazan
Abstract
In this paper, we present a unified time-frequency method for speaker extraction in clean and noisy conditions. Given a mixed signal and a reference signal, common approaches extract the desired speaker in either the time domain or the frequency domain. We propose a Siamese-Unet architecture that uses both representations. The Siamese encoders are applied in the frequency domain to infer embeddings of the noisy and reference spectra, respectively. The concatenated representations are then fed into the decoder to estimate the real and imaginary components of the desired speaker, which are then inverse-transformed to the time domain. The model is trained with the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) loss to exploit the time-domain information. The time-domain loss is also regularized with a frequency-domain loss to preserve the speech patterns. Experimental results demonstrate that the unified approach is not only very eas
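The SI-SDR objective named in the abstract can be illustrated with a short sketch. The code below follows the standard SI-SDR definition (project the estimate onto the target, then compare the energy of the projection with the energy of the residual); it is not taken from the paper. The function names `si_sdr` and `si_sdr_loss`, the zero-mean step, the `eps` constant, and the use of NumPy are assumptions made for illustration, and the frequency-domain regularizer is omitted because the abstract does not specify its form.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D time-domain signals.

    Hypothetical helper following the standard definition, not the
    authors' implementation.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Remove the mean of each signal (a common convention, assumed here).
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target; alpha is the optimal scaling.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    residual = estimate - projection

    # Ratio of projected (target-aligned) energy to residual energy.
    ratio = (np.dot(projection, projection) + eps) / (
        np.dot(residual, residual) + eps
    )
    return 10.0 * np.log10(ratio)


def si_sdr_loss(estimate, target):
    """Negative SI-SDR, so that minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, target)


# Usage: a synthetic target with additive noise at roughly 20 dB SNR.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
noisy = target + 0.1 * rng.standard_normal(16000)

print(f"SI-SDR: {si_sdr(noisy, target):.2f} dB")
print(f"SI-SDR after rescaling the estimate: {si_sdr(3.0 * noisy, target):.2f} dB")
```

Rescaling the estimate leaves the value essentially unchanged, which is the scale invariance the metric is named for. In a training setup this quantity would be computed on the inverse-transformed decoder output with a differentiable framework rather than NumPy.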
Related papers
- A Two-stage Speaker Extraction Algorithm Under Adverse Acoustic Conditions Using A Single-microphone (2023) 0.00
- USED: Universal Speaker Extraction And Diarization (2023) 7.50
- Focus On The Sound Around You: Monaural Target Speaker Extraction Via Distance And Speaker Information (2023) 7.81
- USEV: Universal Speaker Extraction With Visual Cue (2021) 12.17
- Multi-stage Speaker Extraction With Utterance And Frame-level Reference Signals (2020) 12.54
- End-to-end Multi-microphone Speaker Extraction Using Relative Transfer Functions (2025) 0.00
- USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction (2024) 11.88
- Real-time Speech Enhancement And Separation With A Unified Deep Neural Network For Single/dual Talker Scenarios (2023) 2.26