Libri-2Mix
Emerging · 40 papers using it
First seen: 2022
Libri2Mix is a dataset of two-speaker speech mixtures built by combining utterances from the LibriSpeech corpus. It is used to evaluate target speaker extraction (TSE) and speech separation systems.
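Each Libri2Mix mixture is a sum of two LibriSpeech utterances, so evaluating an extraction model amounts to comparing its output against the clean target utterance. The sketch below is illustrative only. `mix_two_speakers` is a hypothetical helper that uses a simple signal-to-interference ratio (SIR) gain; the official LibriMix generation scripts define their own gain scheme, which this sketch does not reproduce. `si_sdr` implements the scale-invariant signal-to-distortion ratio (SI-SDR), the metric examined in one of the papers listed below.

```python
import math
import random
from typing import List, Sequence


def _energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def mix_two_speakers(target: Sequence[float], interferer: Sequence[float],
                     sir_db: float) -> List[float]:
    """Hypothetical mixing helper (not the official LibriMix recipe).

    Truncates both signals to the shorter length (an assumption of this sketch),
    scales the interferer so that the target-to-interferer energy ratio equals
    sir_db, and sums the two signals sample by sample.
    """
    n = min(len(target), len(interferer))
    t = list(target[:n])
    i = list(interferer[:n])
    e_t, e_i = _energy(t), _energy(i)
    if e_i == 0.0:
        return t  # silent interferer: the mixture is just the target
    gain = math.sqrt(e_t / (e_i * 10 ** (sir_db / 10)))
    return [a + gain * b for a, b in zip(t, i)]


def si_sdr(estimate: Sequence[float], reference: Sequence[float],
           eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between an estimate and a clean reference."""
    n = min(len(estimate), len(reference))
    mean_e = sum(estimate[:n]) / n
    mean_r = sum(reference[:n]) / n
    est = [v - mean_e for v in estimate[:n]]  # remove DC offset
    ref = [v - mean_r for v in reference[:n]]
    # Optimal scaling of the reference onto the estimate.
    alpha = sum(e * r for e, r in zip(est, ref)) / (_energy(ref) + eps)
    target = [alpha * r for r in ref]
    noise = [e - s for e, s in zip(est, target)]
    return 10 * math.log10((_energy(target) + eps) / (_energy(noise) + eps))


if __name__ == "__main__":
    random.seed(0)
    sr = 8000  # sample rate in Hz, chosen only for this toy example
    speaker_a = [math.sin(2 * math.pi * 440 * k / sr) for k in range(sr)]
    speaker_b = [random.gauss(0.0, 1.0) for _ in range(sr)]  # stand-in "speaker"
    mixture = mix_two_speakers(speaker_a, speaker_b, sir_db=0.0)
    print(f"SI-SDR of unprocessed mixture: {si_sdr(mixture, speaker_a):.2f} dB")
    print(f"SI-SDR of oracle estimate:     {si_sdr(speaker_a, speaker_a):.2f} dB")
```

The unprocessed mixture's SI-SDR serves as a baseline. An extraction model is typically judged by how far it raises this score (SI-SDR improvement) on the target speaker.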
Papers using Libri-2Mix (39)
- Investigating Self-supervised Learning For Speech Enhancement And Separation
- Adapting Self-supervised Models To Multi-talker Speech Recognition Using Speaker Embeddings
- Mc-spex: Towards Effective Speaker Extraction With Multi-scale Interfusion And Conditional Speaker Modulation
- Weakly-supervised Speech Pre-training: A Case Study On Target Speech Recognition
- SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
- Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
- U-mamba-net: A Highly Efficient Mamba-based U-net Style Network For Noisy And Reverberant Speech Separation
- Speaker-aware Mixture Of Mixtures Training For Weakly Supervised Speaker Extraction
- SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation
- Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model
- AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow
- TripleC Learning and Lightweight Speech Enhancement for Multi-Condition Target Speech Extraction
- MeanFlow-TSE: One-Step Generative Target Speaker Extraction with Mean Flow
- GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model
- A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References
- Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios
- Elevating Robust Multi-talker ASR By Decoupling Speaker Separation And Speech Recognition
- Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
- Mvnet: Memory Assistance And Vocal Reinforcement Network For Speech Enhancement
- SPGM: Prioritizing Local Features For Enhanced Speech Separation Performance
- SPMamba: State-space model is all you need in speech separation
- Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
- Scaling strategies for on-device low-complexity source separation with Conv-Tasnet
- SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
- Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation
- On Data Sampling Strategies for Training Neural Network Speech Separation Models
- Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
- MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation
- MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
- MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement
- AudioSlots: A slot-centric generative model for audio separation
- Target Speech Extraction with Conditional Diffusion Model
- SPGM: Prioritizing Local Features for enhanced speech separation performance
- Probing Self-supervised Learning Models with Target Speech Extraction
- Noise-robust Speech Separation with Fast Generative Correction
- On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
- Wanna hear your voice? A sample is all we need!
- Multi-Level Speaker Representation for Target Speaker Extraction
- U-Mamba-Net: A highly efficient Mamba-based U-net style network for noisy and reverberant speech separation