Jointly Tracking and Separating Speech Sources Using Multiple Features and the Generalized Labeled Multi-Bernoulli Framework
2017 · Shoufeng Lin
Abstract
This paper proposes a novel joint multi-speaker tracking-and-separation method based on the generalized labeled multi-Bernoulli (GLMB) multi-target tracking filter, using sound mixtures recorded by microphones. Standard multi-speaker tracking algorithms usually track only speaker locations, so ambiguity arises when speakers are spatially close. The proposed multi-feature GLMB tracking filter treats the set of vectors of associated speaker features (location, pitch and sound) as the multi-target multi-feature observation, and characterizes the time-varying features with corresponding transition models and an overall likelihood function. It thereby jointly tracks and separates each multi-feature speaker and addresses the spatial ambiguity problem. Numerical evaluation verifies that the proposed method can correctly track the locations of multiple speakers while separating their speech signals.
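Why extra features help with spatially close speakers can be sketched with a toy association step. This is not the GLMB filter and not the paper's likelihood; it is a minimal illustration that assumes (1) the features are conditionally independent given the speaker state, so the overall likelihood is the product of per-feature likelihoods, (2) Gaussian location and pitch models, and (3) the hypothetical noise levels `SIGMA_LOC` and `SIGMA_PITCH`. The "sound" feature is omitted for brevity.

```python
import math

# Hypothetical measurement noise levels (assumptions, not values from the paper).
SIGMA_LOC = 0.3     # metres, std. dev. of a location measurement
SIGMA_PITCH = 10.0  # Hz, std. dev. of a pitch (F0) measurement


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    """1-D Gaussian density N(x; mean, sigma^2)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def location_likelihood(obs_xy, track_xy) -> float:
    # Isotropic 2-D Gaussian, written as a product of two 1-D densities.
    return (gaussian_pdf(obs_xy[0], track_xy[0], SIGMA_LOC)
            * gaussian_pdf(obs_xy[1], track_xy[1], SIGMA_LOC))


def pitch_likelihood(obs_f0: float, track_f0: float) -> float:
    return gaussian_pdf(obs_f0, track_f0, SIGMA_PITCH)


def joint_likelihood(obs, track) -> float:
    # Assumed conditional independence: overall likelihood = product of
    # the per-feature likelihoods (location x pitch).
    return (location_likelihood(obs["xy"], track["xy"])
            * pitch_likelihood(obs["f0"], track["f0"]))


def association_probs(obs, tracks, likelihood_fn):
    """Normalised probability that `obs` was generated by each track."""
    weights = [likelihood_fn(obs, t) for t in tracks]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical speakers only 5 cm apart but with different pitch.
tracks = [
    {"label": "A", "xy": (1.00, 2.0), "f0": 120.0},
    {"label": "B", "xy": (1.05, 2.0), "f0": 220.0},
]
obs = {"xy": (1.02, 2.0), "f0": 215.0}

loc_only = association_probs(
    obs, tracks, lambda o, t: location_likelihood(o["xy"], t["xy"]))
joint = association_probs(obs, tracks, joint_likelihood)

print([round(p, 3) for p in loc_only])  # [0.501, 0.499]: location alone is ambiguous
print([round(p, 3) for p in joint])     # [0.0, 1.0]: pitch resolves the ambiguity
```

With location alone the observation is almost equally likely to belong to either speaker; multiplying in the pitch likelihood makes the assignment unambiguous. In a full multi-target filter such as GLMB, per-feature likelihoods of this kind would enter the update step for every track-to-measurement association hypothesis.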
Related papers
- Audio-visual Speech Separation Based on Joint Feature Representation with Cross-modal Attention (2022) · 0.00
- A Cascaded Multiple-speaker Localization and Tracking System (2018) · 0.00
- Multiple-speaker Localization Based on Direct-path Features and Likelihood Maximization with Spatial Sparsity Regularization (2016) · 11.85
- The Importance of Spatial and Spectral Information in Multiple Speaker Tracking (2024) · 0.00
- A Purely End-to-end System for Multi-speaker Speech Recognition (2018) · 12.25
- Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models (2024) · 0.00
- Deep Attractor Network for Single-microphone Speaker Separation (2016) · 17.88
- Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition (2024) · 0.00