Property-aware Multi-speaker Data Simulation: A Probabilistic Modelling Technique For Synthetic Data Generation
2023 · Tae Jin Park, He Huang, Coleman Hooper, et al.
Abstract
We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap via the adjustment of statistical parameters. This capability offers a tailored training environment for developing neural models suited for speaker diarization and voice activity detection. The acquisition of substantial datasets for speaker diarization often presents a significant challenge, particularly in multi-speaker scenarios. Furthermore, the precise time stamp annotation of speech data is a critical factor for training both speaker diarization and voice activity detection. Our proposed multi-speaker simulator tackles these problems by generating large-scale audio mixtures that maintain statistical properties closely aligned with the input parameters. We demonstrate that the proposed multi-speaker simulator generates audio mixtures with statis
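The idea of steering silence and overlap through statistical parameters can be illustrated with a small sketch. This is not the authors' implementation: the function names (`simulate_session`, `measure`), the choice of exponential distributions, and every parameter name below are assumptions made purely for illustration. The sketch lays out speaker turns on a timeline, decides at each turn boundary whether the next speaker overlaps the previous one or starts after a pause, and then measures the resulting silence and overlap ratios.

```python
import random


def simulate_session(num_segments=2000, mean_seg=3.0, overlap_prob=0.2,
                     mean_overlap=0.5, mean_silence=0.8,
                     num_speakers=2, seed=0):
    """Return a list of (start, end, speaker) turns, in seconds.

    Hypothetical parameters: overlap_prob is the chance that a turn
    starts before the previous one ends; mean_* are exponential means.
    """
    if num_speakers < 2:
        raise ValueError("need at least two speakers to alternate turns")
    rng = random.Random(seed)
    segments = []
    prev_speaker = None
    for _ in range(num_segments):
        duration = rng.expovariate(1.0 / mean_seg)
        # Always switch speaker so a speaker never overlaps themselves.
        speaker = rng.choice(
            [s for s in range(num_speakers) if s != prev_speaker])
        if not segments:
            start = 0.0
        else:
            prev_start, prev_end, _ = segments[-1]
            if rng.random() < overlap_prob:
                # Keep the overlap inside the part of the previous turn
                # that is not already overlapped by an earlier turn.
                earlier_end = segments[-2][1] if len(segments) > 1 else prev_start
                room = prev_end - max(prev_start, earlier_end)
                overlap = min(rng.expovariate(1.0 / mean_overlap),
                              room, duration)
                start = prev_end - overlap
            else:
                start = prev_end + rng.expovariate(1.0 / mean_silence)
        segments.append((start, start + duration, speaker))
        prev_speaker = speaker
    return segments


def measure(segments):
    """Return (silence_ratio, overlap_ratio) relative to session length."""
    total = segments[-1][1] - segments[0][0]
    silence = overlap = 0.0
    for (_, prev_end, _), (start, _, _) in zip(segments, segments[1:]):
        gap = start - prev_end
        if gap > 0:
            silence += gap
        else:
            overlap += -gap
    return silence / total, overlap / total


if __name__ == "__main__":
    silence_ratio, overlap_ratio = measure(simulate_session())
    print(f"silence={silence_ratio:.3f} overlap={overlap_ratio:.3f}")
```

Raising `overlap_prob` or `mean_overlap` in this sketch increases the measured overlap ratio, and raising `mean_silence` increases the silence ratio, which is the kind of parameter-to-statistic relationship the abstract describes.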
Related papers
- Multi-speaker And Wide-band Simulated Conversations As Training Data For End-to-end Neural Diarization (2022) 8.60
- Realistic Multi-microphone Data Simulation For Distant Speech Recognition (2017) 9.76
- Effective Noise-aware Data Simulation For Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation (2024) 3.58
- Speaker Verification-derived Loss And Data Augmentation For DNN-based Multispeaker Speech Synthesis (2021) 3.58
- Msdtron: A High-capability Multi-speaker Speech Synthesis System For Diverse Data Using Characteristic Information (2021) 4.52
- Fake It To Make It: Using Synthetic Data To Remedy The Data Shortage In Joint Multimodal Speech-and-gesture Synthesis (2024) 6.34
- Generating Data With Text-to-speech And Large-language Models For Conversational Speech Recognition (2024) 6.34
- Training Multi-speaker Neural Text-to-speech Systems Using Speaker-imbalanced Speech Corpora (2019) 8.09