Frame-based Overlapping Speech Detection Using Convolutional Neural Networks
2020 · Midia Yousefi, John H. L. Hansen
Abstract
Naturalistic speech recordings usually contain speech signals from multiple speakers. This overlap can degrade the performance of speech technologies because it makes tracing and recognizing individual speakers more complex. In this study, we investigate the detection of overlapping speech on segments as short as 25 ms using Convolutional Neural Networks. We evaluate the detection performance using different spectral features, and show that pyknogram features outperform other commonly used speech features. The proposed system can predict overlapping speech with an accuracy of 84% and an F-score of 88% on a dataset of mixed speech generated from the GRID dataset.
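To make the frame-level setting concrete, the following is a minimal sketch, not the authors' implementation. It cuts a signal into 25 ms frames and computes accuracy and F-score from per-frame binary labels. The 16 kHz sample rate, the non-overlapping framing, the choice of "overlap" (label 1) as the positive class, and the function names `frame_signal` and `accuracy_and_fscore` are all assumptions made for illustration; the abstract does not specify them.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0):
    """Split a 1-D sequence into non-overlapping frames of frame_ms milliseconds.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def accuracy_and_fscore(y_true, y_pred):
    """Frame-level accuracy and F-score, treating label 1 (overlap) as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # overlap correctly detected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed overlap
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return accuracy, fscore


# One second of (hypothetical) 16 kHz audio yields 40 frames of 400 samples each.
frames = frame_signal([0.0] * 16000, sample_rate=16000)

# Toy per-frame labels: 1 = overlapping speech, 0 = single speaker.
acc, f1 = accuracy_and_fscore([1, 1, 0, 0], [1, 0, 0, 1])
```

Because the two metrics weight errors differently, a system can report an F-score above its accuracy, as in the figures quoted in the abstract, when the positive class dominates the frames.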
Related papers
- Three-class Overlapped Speech Detection Using A Convolutional Recurrent Neural Network (2021) 7.81
- Overlap-aware Diarization: Resegmentation Using Neural End-to-end Overlapped Speech Detection (2019) 13.17
- Overlapped Speech Recognition From A Jointly Learned Multi-channel Neural Speech Extraction And Representation (2019) 0.00
- Distortionless Multi-channel Target Speech Enhancement For Overlapped Speech Recognition (2020) 0.00
- Real-time Speaker Counting In A Cocktail Party Scenario Using Attention-guided Convolutional Neural Network (2021) 6.77
- Prosodic Event Recognition Using Convolutional Neural Networks With Context Information (2017) 5.84
- Learning To Enhance Or Not: Neural Network-based Switching Of Enhanced And Observed Signals For Overlapping Speech Recognition (2022) 10.21
- Joint Speech And Overlap Detection: A Benchmark Over Multiple Audio Setup And Speech Domains (2023) 0.00