Multiple-speaker Localization Based On Direct-path Features And Likelihood Maximization With Spatial Sparsity Regularization
2016 · Xiaofei Li, Laurent Girin, Sharon Gannot, et al.
Abstract
This paper addresses the problem of multiple-speaker localization in noisy and reverberant environments, using binaural recordings of an acoustic scene. A Gaussian mixture model (GMM) is adopted, whose components correspond to all the possible candidate source locations defined on a grid. After optimizing the GMM-based objective function, given an observed set of binaural features, both the number of sources and their locations are estimated by selecting the GMM components with the largest priors. This is achieved by enforcing a sparse solution, thus favoring a small number of speakers with respect to the large number of initial candidate source locations. An entropy-based penalty term is added to the likelihood, thus imposing sparsity over the set of GMM priors. In addition, the direct-path relative transfer function (DP-RTF) is used to build robust binaural features. The DP-RTF, recently proposed for single-source localization, was shown to be robust to reverberations, since it encod
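The sparsity idea described in the abstract can be illustrated with a small sketch. It is not the authors' optimizer and it does not use DP-RTF features. It assumes a 1-D grid of candidate azimuths, a scalar noisy-azimuth feature with a Gaussian likelihood per candidate, and an exponentiated-gradient update on the priors. The names `sparse_gmm_priors`, `gamma`, `eta`, and the 0.2 detection threshold are hypothetical choices made for illustration only.

```python
import numpy as np


def sparse_gmm_priors(lik, gamma=0.1, eta=0.5, n_iter=300):
    """Estimate GMM priors with an entropy penalty (illustrative sketch).

    lik[n, k] is the likelihood of observation n under candidate location k.
    The component parameters are held fixed; only the priors w are updated.
    Objective (to maximize): mean log-likelihood + gamma * sum_k w_k log w_k,
    i.e. the likelihood minus gamma times the entropy of the priors.
    """
    n_cand = lik.shape[1]
    log_w = np.full(n_cand, -np.log(n_cand))  # start from uniform priors
    for _ in range(n_iter):
        w = np.exp(log_w)
        mix = lik @ w + 1e-300  # mixture likelihood of each observation
        grad_ll = (lik / mix[:, None]).mean(axis=0)  # d(mean log-lik)/dw_k
        grad_pen = gamma * (log_w + 1.0)  # d(gamma * sum w log w)/dw_k
        # Exponentiated-gradient step, done in the log domain so the
        # priors stay positive, then renormalized to sum to one.
        log_w = log_w + eta * (grad_ll + grad_pen)
        log_w -= np.logaddexp.reduce(log_w)
    return np.exp(log_w)


# Synthetic setup (assumption): 17 candidate azimuths, two active speakers.
rng = np.random.default_rng(0)
grid = np.arange(-80, 81, 10)  # candidate locations in degrees
sigma = 5.0  # assumed feature noise, in degrees
true_sources = np.array([-30.0, 40.0])
obs = np.concatenate([s + sigma * rng.standard_normal(150) for s in true_sources])

# Gaussian likelihood of every observation under every candidate location.
diff = (obs[:, None] - grid[None, :]) / sigma
lik = np.exp(-0.5 * diff**2) / (sigma * np.sqrt(2.0 * np.pi))

priors = sparse_gmm_priors(lik)
# Keep the candidates whose prior exceeds a hypothetical threshold.
detected = grid[priors > 0.2]
```

In this toy setting the penalty pushes most priors toward zero, so counting the priors that survive the threshold gives an estimate of both the number of sources and their grid locations, which mirrors the selection step the abstract describes.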
Related papers
- A Cascaded Multiple-speaker Localization And Tracking System (2018)
- The Importance Of Spatial And Spectral Information In Multiple Speaker Tracking (2024)
- Mask-weighted Spatial Likelihood Coding For Speaker-independent Joint Localization And Mask Estimation (2024)
- Jointly Tracking And Separating Speech Sources Using Multiple Features And The Generalized Labeled Multi-Bernoulli Framework (2017)
- End-to-end Multi-microphone Speaker Extraction Using Relative Transfer Functions (2025)
- Analyzing The Impact Of Speaker Localization Errors On Speech Separation For Automatic Speech Recognition (2019)
- End-to-end Multi-channel Speaker Extraction And Binaural Speech Synthesis (2024)
- Deep Learning Based Multi-source Localization With Source Splitting And Its Effectiveness In Multi-talker Speech Recognition (2021)