Learnable Frequency Filters For Speech Feature Extraction In Speaker Verification
2022 · Jingyu Li, Yusheng Tian, Tan Lee
Abstract
Mel-scale spectrum features are used in various recognition and classification tasks on speech signals. There is no reason to expect that these features are optimal for every task, including speaker verification (SV). This paper describes a learnable front-end feature extraction model. The model comprises a group of filters that transform the Fourier spectrum. The parameters that define these filters are trained end-to-end and optimized specifically for speaker verification. Unlike the filters of the standard Mel-scale filter bank, which are fixed, the proposed filters have adjustable bandwidths and center frequencies. Experimental results show that applying the learnable acoustic front-end improves speaker verification performance over conventional Mel-scale spectrum features. Analysis of the learned filter parameters suggests that narrow-band information benefits SV system performance. The proposed model achieves a good balance between performance and computation cost. In resource-constrained c
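As a rough illustration of filters defined by a center frequency and a bandwidth, the sketch below builds a small filter bank over FFT bins and applies it to a toy power spectrum. The Gaussian filter shape, the function names, and all parameter values are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation. In a trainable front-end, the center and bandwidth values would be parameters updated by gradient descent together with the speaker verification network; here they are plain numbers.

```python
import math

def gaussian_filter_bank(centers_hz, bandwidths_hz, n_bins, sample_rate):
    """Build one Gaussian-shaped filter per (center, bandwidth) pair.

    Hypothetical helper: the Gaussian shape is an assumption, not taken
    from the paper. Returns a list of filters, each a list of n_bins weights.
    """
    nyquist = sample_rate / 2.0
    # Frequency (Hz) of each FFT bin, from 0 up to the Nyquist frequency.
    bin_freqs = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for center, bandwidth in zip(centers_hz, bandwidths_hz):
        bank.append(
            [math.exp(-0.5 * ((f - center) / bandwidth) ** 2) for f in bin_freqs]
        )
    return bank

def apply_filter_bank(bank, power_spectrum, eps=1e-10):
    """Weight the spectrum by each filter, sum over bins, and take the log."""
    return [
        math.log(sum(w * p for w, p in zip(filt, power_spectrum)) + eps)
        for filt in bank
    ]

# Toy input: 257 bins (a 512-point FFT) at 16 kHz, all energy in one bin.
n_bins, sample_rate = 257, 16000
spectrum = [0.0] * n_bins
spectrum[32] = 1.0  # bin 32 sits at 32 * 8000 / 256 = 1000 Hz

# Illustrative values; a learnable front-end would adjust these in training.
centers = [500.0, 1000.0, 4000.0]
bandwidths = [100.0, 100.0, 100.0]

bank = gaussian_filter_bank(centers, bandwidths, n_bins, sample_rate)
features = apply_filter_bank(bank, spectrum)
```

The filter centered at 1000 Hz responds most strongly to this input. Shrinking a bandwidth value makes the corresponding filter more selective, which is the kind of narrow-band behavior the abstract refers to.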
Related papers
- Optimization Of Data-driven Filterbank For Automatic Speaker Verification (2020) 11.93
- Y-vector: Multiscale Waveform Encoder For Speaker Embedding (2020) 8.60
- Learning Multiscale Features Directly From Waveforms (2016) 0.00
- Short-segment Speaker Verification With Pre-trained Models And Multi-resolution Encoder (2025) 0.00
- Sifisinger: A High-fidelity End-to-end Singing Voice Synthesizer Based On Source-filter Model (2024) 4.52
- A Comparative Re-assessment Of Feature Extractors For Deep Speaker Embeddings (2020) 8.09
- Multi-stream Convolutional Neural Network With Frequency Selection For Robust Speaker Verification (2020) 3.58
- Deepvox: Discovering Features From Raw Audio For Speaker Recognition In Non-ideal Audio Signals (2020) 0.00