WSJ-0-2Mix
Emerging · 33 papers using it
First seen: 2022
WSJ0-2Mix (listed here as WSJ-0-2Mix) is a benchmark dataset for evaluating supervised speech separation models. Each example is a mixture of two overlapping speech sources; the standard version of the benchmark contains no added background noise.
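As an illustration of what a two-source mixture is, the following is a minimal sketch only. It assumes synthetic Gaussian signals in place of real utterances, and the names `mean_power`, `mix_two_sources`, and `level_db` are hypothetical; this is not the dataset's official generation procedure.

```python
import math
import random


def mean_power(signal):
    """Average power of a signal given as a list of float samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_two_sources(s1, s2, level_db):
    """Sum two sources so that s1 is `level_db` dB louder than s2.

    Assumes both inputs are non-silent (non-zero power).
    Returns the mixture plus the two reference signals a separation
    model would be asked to recover.
    """
    n = min(len(s1), len(s2))          # truncate to the shorter source
    s1, s2 = s1[:n], s2[:n]
    # Gain applied to s2 so that power(s1) / power(gain * s2) = 10^(level_db / 10)
    gain = math.sqrt(mean_power(s1) / (mean_power(s2) * 10 ** (level_db / 10)))
    s2_scaled = [gain * x for x in s2]
    mixture = [x + y for x, y in zip(s1, s2_scaled)]
    return mixture, s1, s2_scaled


random.seed(0)
a = [random.gauss(0.0, 1.0) for _ in range(8000)]  # stand-in for speaker 1
b = [random.gauss(0.0, 0.5) for _ in range(8000)]  # stand-in for speaker 2
mixture, ref1, ref2 = mix_two_sources(a, b, level_db=2.5)
ratio_db = 10 * math.log10(mean_power(ref1) / mean_power(ref2))
```

A supervised separation model receives `mixture` as input and is scored on how closely its two outputs match `ref1` and `ref2`.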
Papers using WSJ-0-2Mix (33)
- Espnet-se++: Speech Enhancement For Robust Speech Recognition, Translation, And Understanding
- Weakly-supervised Speech Pre-training: A Case Study On Target Speech Recognition
- On Time Domain Conformer Models For Monaural Speech Separation In Noisy Reverberant Acoustic Environments
- Dual-path Mamba: Short And Long-term Bidirectional Selective Structured State Space Models For Speech Separation
- EDSep: An Effective Diffusion-Based Method for Speech Source Separation
- A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References
- Dynamic Slimmable Networks for Efficient Speech Separation
- Listen to Extract: Onset-Prompted Target Speaker Extraction
- An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
- Multi-dimensional And Multi-scale Modeling For Speech Separation Optimized By Discriminative Learning
- SPGM: Prioritizing Local Features For Enhanced Speech Separation Performance
- Resource-Efficient Separation Transformer
- Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation
- SPMamba: State-space model is all you need in speech separation
- TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
- ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
- TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
- X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion
- Conditional Diffusion Model for Target Speaker Extraction
- SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
- AmbiSep: Ambisonic-to-Ambisonic Reverberant Speech Separation Using Transformer Networks
- UX-NET: Filter-and-Process-based Improved U-Net for Real-time Time-domain Audio Separation
- Diffusion-based Generative Speech Source Separation
- Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation
- Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning
- Speech Separation based on Contrastive Learning and Deep Modularization
- Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
- USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
- Speech Separation using Neural Audio Codecs with Embedding Loss
- Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings
- SPGM: Prioritizing Local Features for enhanced speech separation performance
- On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments
- X-CrossNet: A complex spectral mapping approach to target speaker extraction with cross attention speaker embedding fusion