Multi-dimensional And Multi-scale Modeling For Speech Separation Optimized By Discriminative Learning
2023 · Zhaoxi Mu, Xinyu Yang, Wenjing Zhu
Abstract
The Transformer has shown strong performance in speech separation thanks to its ability to capture global features. However, capturing the local features and channel information of audio sequences is equally important. In this paper, we present a novel approach to speech separation named Intra-SE-Conformer and Inter-Transformer (ISCIT). Specifically, we design a new network, SE-Conformer, that can model audio sequences in multiple dimensions and at multiple scales, and we apply it to the dual-path speech separation framework. Furthermore, we propose Multi-Block Feature Aggregation, which improves separation quality by selectively using information from the intermediate blocks of the separation network. We also propose a speaker similarity discriminative loss for optimizing the speech separation model, which addresses the poor performance observed when speakers have similar voices. Experimental results on the benchmark datasets WSJ0-2mix and WHAM! show that ISCIT can achieve stat
Related papers
- Multi-scale Feature Fusion Transformer Network For End-to-end Single Channel Speech Separation (2022) · 0.00
- On Time Domain Conformer Models For Monaural Speech Separation In Noisy Reverberant Acoustic Environments (2023) · 5.84
- Monaural Multi-speaker Speech Separation Using Efficient Transformer Model (2023) · 0.00
- Dasformer: Deep Alternating Spectrogram Transformer For Multi/single-channel Speech Separation (2023) · 0.00
- Tiny-sepformer: A Tiny Time-domain Transformer Network For Speech Separation (2022) · 8.82
- Two-stage Model And Optimal SI-SNR For Monaural Multi-speaker Speech Separation In Noisy Environment (2020) · 0.00
- Attention Is All You Need In Speech Separation (2020) · 20.59
- Efficient Integration Of Multi-channel Information For Speaker-independent Speech Separation (2020) · 0.00