SMS-WSJ: Database, Performance Measures, And Baseline Recipe For Multi-channel Source Separation And Recognition
2019 · Lukas Drude, Jens Heitkaemper, Christoph Boeddeker, et al.
Abstract
We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ (Spatialized Multi-Speaker Wall Street Journal). It consists of artificially mixed speech taken from the WSJ database. Unlike earlier databases, however, it uses all WSJ0+1 utterances and strictly separates the speaker sets present in the training, validation, and test sets. When spatializing the data, we ensure a high degree of randomness with respect to room size, array center and rotation, and speaker position. Furthermore, this paper offers a critical assessment of recently proposed measures of source separation performance. Alongside the code to generate the database, we provide a source separation baseline and a Kaldi recipe with competitive word error rates to provide common ground for evaluation.
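The two construction constraints named in the abstract, speaker-disjoint splits and per-example randomization of room size, array center and rotation, and speaker position, can be illustrated with the minimal Python sketch below. It is not the SMS-WSJ generation code: the function names `split_speakers` and `sample_geometry`, the 80/10/10 split fractions, the six-microphone circular array with a 10 cm radius, the 0.5 m wall margin, and all room-size ranges are hypothetical choices made for illustration. Computing room impulse responses and mixing the WSJ utterances are omitted.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Geometry:
    room: tuple            # room dimensions (x, y, z) in metres
    array_center: tuple    # (x, y, z) of the array center in metres
    rotation: float        # array rotation about the z axis in radians
    mics: list = field(default_factory=list)      # microphone positions
    speakers: list = field(default_factory=list)  # source positions


def split_speakers(speaker_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign every speaker ID to exactly one split, so no speaker is shared.

    The split fractions are hypothetical, not the ones used by SMS-WSJ.
    """
    ids = sorted(set(speaker_ids))          # deduplicate, then fix an order
    random.Random(seed).shuffle(ids)        # shuffle speakers, not utterances
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return {
        "train": set(ids[:n_train]),
        "validation": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),
    }


def sample_geometry(rng, n_speakers=2, n_mics=6, radius=0.1, margin=0.5):
    """Draw one random room, array pose, and set of speaker positions.

    All numeric ranges are illustrative assumptions, not SMS-WSJ's values.
    """
    room = (rng.uniform(7.0, 9.0), rng.uniform(5.0, 7.0), rng.uniform(2.5, 3.5))
    # Array center anywhere in the room, at least `margin` metres from each wall.
    center = tuple(rng.uniform(margin, d - margin) for d in room)
    rotation = rng.uniform(0.0, 2 * math.pi)
    # Uniform circular array in the horizontal plane, rotated by `rotation`.
    mics = [
        (center[0] + radius * math.cos(rotation + 2 * math.pi * k / n_mics),
         center[1] + radius * math.sin(rotation + 2 * math.pi * k / n_mics),
         center[2])
        for k in range(n_mics)
    ]
    # Each speaker is placed independently, also `margin` metres from the walls.
    speakers = [tuple(rng.uniform(margin, d - margin) for d in room)
                for _ in range(n_speakers)]
    return Geometry(room, center, rotation, mics, speakers)


if __name__ == "__main__":
    splits = split_speakers([f"spk{i:03d}" for i in range(100)])
    print({name: len(ids) for name, ids in splits.items()})
    print(sample_geometry(random.Random(42)))
```

Shuffling speaker IDs, rather than utterances, before slicing is what guarantees that no speaker appears in more than one split. Passing an explicitly seeded `random.Random` instance keeps the sampled geometries reproducible.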
Related papers
- Libriheavymix: A 20,000-hour Dataset For Single-channel Reverberant Multi-talker Speech Separation, ASR And Speaker Diarization (2024)
- End-to-end Multi-channel Speech Separation (2019)
- End-to-end Dereverberation, Beamforming, And Speech Recognition With Improved Numerical Stability And Advanced Frontend (2021)
- WHAMR!: Noisy And Reverberant Single-channel Speech Separation (2019)
- Integration Of Speech Separation, Diarization, And Recognition For Multi-speaker Meetings: System Description, Comparison, And Analysis (2020)
- Time-domain Speech Extraction With Spatial Information And Multi Speaker Conditioning Mechanism (2021)
- TS-SEP: Joint Diarization And Separation Conditioned On Estimated Speaker Embeddings (2023)
- End-to-end Monaural Multi-speaker ASR System Without Pretraining (2018)