Deep Learning For Speaker Identification: Architectural Insights From AB-1 Corpus Analysis And Performance Evaluation
2024 · Matthias Bartolo
Abstract
In security systems, forensic investigations, and personalized services, speech, as a fundamental human input, matters more than text-based interaction. This research examines Speaker Identification (SID) in depth, covering its essential components and emphasising Mel Spectrograms and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. It then evaluates the performance of six closely related model architectures through extensive analysis, and applies hyperparameter tuning to the best-performing model. Finally, it performs a linguistic analysis of identification accuracy across accents and genders, together with a bias evaluation, on the AB-1 Corpus dataset.
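The abstract names Mel Spectrograms and MFCCs as the feature-extraction front end but does not state their parameters. The following is a minimal NumPy sketch of the standard pipeline (framing, windowing, power spectrum, Mel filterbank, log, DCT), assuming 16 kHz audio, 25 ms Hamming windows (400 samples) with a 10 ms hop, a 512-point FFT, 40 HTK-style Mel filters, and 13 coefficients. These values and the function names (`hz_to_mel`, `mel_filterbank`, `log_mel_and_mfcc`) are illustrative assumptions, not the settings or code used in this work.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (one common convention; others exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # frequency of each FFT bin
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_and_mfcc(signal, sr=16000, n_fft=512, win_length=400,
                     hop_length=160, n_mels=40, n_mfcc=13):
    """Return (log-Mel spectrogram, MFCC), shapes (frames, n_mels) and (frames, n_mfcc)."""
    signal = np.asarray(signal, dtype=float)
    # Slice overlapping frames (assumes len(signal) >= win_length) and apply a Hamming window
    n_frames = 1 + (len(signal) - win_length) // hop_length
    idx = np.arange(win_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win_length)
    # Power spectrum of each frame; rfft zero-pads every frame to n_fft points
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel spectrogram: pool FFT bins with the triangular filters, then take the log
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # Orthonormal DCT-II along the Mel axis; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] /= np.sqrt(2.0)
    mfcc = log_mel @ dct.T
    return log_mel, mfcc


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr                       # one second of audio
    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # synthetic 440 Hz test tone
    log_mel, mfcc = log_mel_and_mfcc(tone, sr=sr)
    print(log_mel.shape, mfcc.shape)             # prints (98, 40) (98, 13)
```

The log-Mel matrix corresponds to the Mel Spectrogram representation, while the DCT decorrelates the log filterbank energies and keeps the low-order coefficients as MFCCs. Libraries such as librosa provide equivalent functions but may use different Mel-scale, normalisation, and log conventions, so their outputs will not match this sketch exactly.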
Related papers
- Advanced Accent/Dialect Identification And Accentedness Assessment With Multi-embedding Models And Automatic Speech Recognition (2023)
- Robust Acoustic Domain Identification With Its Application To Speaker Diarization (2022)
- Speaker De-identification System Using Autoencoders And Adversarial Training (2020)
- The Exploitation Of Multiple Feature Extraction Techniques For Speaker Identification In Emotional States Under Disguised Voices (2021)
- Large-scale Learning Of Generalised Representations For Speaker Recognition (2022)
- A Text-independent Speaker Verification Model: A Comparative Analysis (2017)
- Deepvox: Discovering Features From Raw Audio For Speaker Recognition In Non-ideal Audio Signals (2020)
- Speaker Fuzzy Fingerprints: Benchmarking Text-based Identification In Multiparty Dialogues (2025)