Segment Aggregation For Short Utterances Speaker Verification Using Raw Waveforms
2020 · Seung-Bin Kim, Jee-Weon Jung, Hye-Jin Shim, et al.
Abstract
Most studies on speaker verification systems focus on long-duration utterances, which contain sufficient phonetic information. However, the performance of these systems is known to degrade on short-duration utterances, which carry less phonetic information than long ones. In this paper, we propose a method that compensates for the performance degradation of speaker verification for short utterances, referred to as "segment aggregation". The proposed method adopts an ensemble-based design to improve the stability and accuracy of speaker verification systems. It segments an input utterance into several short utterances and then aggregates the segment embeddings extracted from the segmented inputs to compose a speaker embedding. The segment embeddings and the aggregated speaker embedding are then trained simultaneously. In addition, we also modify the teacher-student learning method for the proposed method. Experimen
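The segment-then-aggregate step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the helper names (`split_into_segments`, `toy_encoder`, `aggregate`), the random tanh projection standing in for a trained raw-waveform network, the segment and hop lengths, and the choice of averaging followed by L2 normalisation as the aggregation. The joint training of segment and speaker embeddings and the modified teacher-student learning are not shown.

```python
import math
import random


def split_into_segments(waveform, segment_len, hop_len):
    """Cut a 1-D waveform (list of floats) into fixed-length segments.

    Inputs shorter than segment_len are padded by repetition (an assumption).
    """
    if not waveform:
        raise ValueError("waveform must not be empty")
    if len(waveform) < segment_len:
        reps = math.ceil(segment_len / len(waveform))
        waveform = (waveform * reps)[:segment_len]
    last_start = len(waveform) - segment_len
    return [waveform[s:s + segment_len] for s in range(0, last_start + 1, hop_len)]


def toy_encoder(segment, projection):
    """Hypothetical stand-in for a trained embedding network."""
    return [math.tanh(sum(w * x for w, x in zip(row, segment))) for row in projection]


def aggregate(segment_embeddings):
    """Assumed aggregation: mean over segments, then L2 normalisation."""
    count = len(segment_embeddings)
    dim = len(segment_embeddings[0])
    mean = [sum(e[d] for e in segment_embeddings) / count for d in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]


random.seed(0)
waveform = [random.gauss(0.0, 1.0) for _ in range(400)]  # toy "utterance"
projection = [[random.gauss(0.0, 0.1) for _ in range(200)] for _ in range(8)]

segments = split_into_segments(waveform, segment_len=200, hop_len=100)
segment_embeddings = [toy_encoder(seg, projection) for seg in segments]
speaker_embedding = aggregate(segment_embeddings)
```

With these toy settings the utterance yields three overlapping segments, each mapped to its own embedding, and a single unit-length speaker embedding is composed from them. In a real system the encoder would be a trained network and the aggregation may differ from simple averaging.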
Related papers
- Short Utterance Compensation In Speaker Verification Via Cosine-based Teacher-student Learning Of Speaker Embeddings (2018) · 10.74
- Deep Segment Attentive Embedding For Duration Robust Speaker Verification (2018) · 2.26
- RawNeXt: Speaker Verification System For Variable-duration Utterances With Deep Layer Aggregation And Extended Dynamic Scaling Policies (2021) · 12.24
- A Unified Deep Learning Framework For Short-duration Speaker Verification In Adverse Environments (2020) · 9.41
- Improving Multi-scale Aggregation Using Feature Pyramid Module For Robust Speaker Verification Of Variable-duration Utterances (2020) · 10.48
- Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System (2020) · 0.00
- Quality Measures For Speaker Verification With Short Utterances (2019) · 0.00
- MR-RawNet: Speaker Verification System With Multiple Temporal Resolutions For Variable Duration Utterances Using Raw Waveforms (2024) · 2.26