Awesome Speech Audio

© 2026 Awesome Papers.

Awesome Speaker Analysis

Speaker Analysis is one of the most active areas in Awesome Speech Audio, with 1,589 papers in this collection. The most frequently used evaluation datasets include VoxCeleb-1, VoxCeleb, and LibriSpeech. A strong starting point is "Detection Of Glottal Closure Instants From Speech Signals: A Quantitative Review".

Datasets & benchmarks

VoxCeleb-1 · 31 papers

VoxCeleb · 24 papers · 🤗

LibriSpeech · 23 papers · 🤗

CallHome · 18 papers · 🤗

IEMOCAP · 18 papers

LibriMix · 17 papers · 🤗

AMI · 16 papers · 🤗

SUPERB · 14 papers

WSJ-0-2Mix · 12 papers

Libri-2Mix · 12 papers

VoxCeleb2 · 10 papers · 🤗

ASVspoof 2019 LA · 10 papers · 🤗

Key papers

60 papers · sorted by trending (default) · numbers show 🔥 heat

Detection Of Glottal Closure Instants From Speech Signals: A Quantitative Review (2019)
Thomas Drugman, Mark Thomas, Jon Gudnason, et al.
16.88
Developing Far-field Speaker System Via Teacher-student Learning (2018)
Jinyu Li, Rui Zhao, Zhuo Chen, et al.
10.85
Phonetic-and-semantic Embedding Of Spoken Words With Applications In Spoken Content Retrieval (2018)
Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, et al.
9.76
Designing An Effective Metric Learning Pipeline For Speaker Diarization (2018)
Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan, Huan Song, et al.
8.60
Fast Variational Bayes For Heavy-tailed PLDA Applied To I-vectors And X-vectors (2018)
Anna Silnova, Niko Brummer, Daniel Garcia-Romero, et al.
8.35
The MSP-Podcast Corpus (2025)
Carlos Busso et al.
8.23
DiCoW: Diarization-conditioned Whisper For Target Speaker Automatic Speech Recognition (2024)
Alexander Polok, Dominik Klement, Martin Kocour, et al.
8.09
S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models (2025)
Feng Jiang et al.
7.77
DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers (2025)
Heitor R. Guimarães et al.
7.55
Generalizing Speaker Verification For Spoof Awareness In The Embedding Space (2024)
Xuechen Liu, Md Sahidullah, Kong Aik Lee, et al.
7.16
Probabilistic Spherical Discriminant Analysis: An Alternative To PLDA For Length-normalized Embeddings (2022)
Niko Brümmer, Albert Swart, Ladislav Mošner, et al.
6.77
A Comparison Of Metric Learning Loss Functions For End-to-end Speaker Verification (2020)
Juan M. Coria, Hervé Bredin, Sahar Ghannay, et al.
6.77
Incorporating Pass-phrase Dependent Background Models For Text-dependent Speaker Verification (2016)
A. K. Sarkar, Zheng-Hua Tan
6.77
A Discriminative Condition-aware Backend For Speaker Verification (2019)
Luciana Ferrer, Mitchell McLaren
6.34
Multilingual Source Tracing of Speech Deepfakes: A First Benchmark (2025)
Xi Xuan et al.
6.18
Multitask Learning with Capsule Networks for Speech-to-Intent Applications (2020)
Jakob Poncelet et al.
6.08
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder (2025)
Samir Sadok et al.
6.06
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models (2025)
Beilong Tang et al.
5.96
MnTTS2: An Open-source Multi-speaker Mongolian Text-to-speech Synthesis Dataset (2022)
Kailin Liang, Bin Liu, Yifan Hu, et al.
5.81
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations (2025)
Xue Jiang et al.
5.59
Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation (2025)
Wen Huang et al.
5.48
SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech (2025)
Yuqi Li et al.
4.93
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility (2025)
Yifan Liu et al.
4.82
Discrete Speech Unit Extraction via Independent Component Analysis (2025)
Tomohiko Nakamura et al.
4.71
Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis (2025)
Minu Kim et al.
4.71
VBx for End-to-End Neural and Clustering-based Diarization (2025)
Petr Pálka et al.
4.69
A long-form single-speaker real-time MRI speech dataset and benchmark (2025)
Sean Foley et al.
4.64
Spatio-spectral diarization of meetings by combining TDOA-based segmentation and speaker embedding-based clustering (2025)
Tobias Cord-Landwehr et al.
4.47
Positive-Incentive Noise Predictor for Adversarial Purification in Speaker Verification (2026)
Yibo Bai et al.
4.39
Disentangling Speaker and Language Effects in Cross-Lingual Speaker Verification for Iberian Languages (2026)
Pol Buitrago et al.
4.39
H-SAGE: Holistic Speaker-Aware Guided Experts for MoE-based Multi-Talker ASR (2026)
Yujie Guo et al.
4.39
Geometric Second-Order Feature Correlation Learning for Self-Supervised Speech Emotion Recognition (2026)
Shuanglin Li et al.
4.33
Efficiency-Performance Trade-offs in Neural Speaker Diarization via Structured Pruning and Low-Bit Quantization (2026)
Rishit Chatterjee et al.
4.33
MaskedFOP: Polyglot Speaker Identification under Missing Visual Modality via Cascaded Graph Label Propagation (2026)
Ayoub Elkhouzari et al.
4.33
Continuous-Speech Parkinson's Disease Detection Using Acoustic and Inharmonicity Features (2026)
Rujia Li et al.
4.33
Reference-Driven Multi-Speaker Audio Scene Generation from In-the-Wild Priors (2026)
Michael Finkelson et al.
4.33
EmotionAI: A Privacy-Preserving Computational Intelligence Pipeline for Speech-Emotion-Grounded Conversational Analysis (2026)
Wai Laam Mak et al.
4.33
Real-Time Voice AI Hears but Does Not Listen (2026)
Martijn Bartelds et al.
4.33
Generative AI and Copyright Infringement: A Legal-Technical Analysis of AI Music Generation Systems Under 17 U.S.C. Title 17 (2026)
Zuhaib Hussain Butt
4.33
Neural Speaker Diarization via Multilingual Training: Evaluation on Low-Resource Nepali-Hindi Speech (2026)
Samip Neupane et al.
4.33
Advancing Speaker-Based Vocal Effort Classification with WavLM and Data Augmentation in Naturalistic Non-Calibrated Speech Recordings (2026)
Zahra Omidi et al.
4.33
Variational Autoencoder for Personalized Pathological Speech Enhancement (2025)
Mingchi Hou and Ina Kodrasi
4.30
Linearly Constrained Deep Beamformer for Multi-Speaker Scenarios (2026)
Ilai Zaidel et al.
4.27
Hardware-Aware Federated Learning for Speech Emotion Recognition (2026)
Beyazit Bestami Yuksel et al.
4.27
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders (2025)
Weiqiao Shan et al.
4.25
UniSpeaker: A Unified Approach for Multimodality-driven Speaker Generation (2025)
Zhengyan Sheng, Zhihao Du, Heng Lu, Shiliang Zhang, Zhen-Hua Ling
4.19
Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis (2025)
Marc-André Carbonneau et al.
3.86
Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance (2025)
Jakob Kienegger et al.
3.86
RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing (2025)
Yang Xiao et al.
3.86
LID Models are Actually Accent Classifiers: Implications and Solutions for LID on Accented Speech (2025)
Niyati Bafna and Matthew Wiesner
3.81
Bringing Interpretability to Neural Audio Codecs (2025)
Samir Sadok, Julien Hauret, Éric Bavu
3.81
Investigating self-supervised features for expressive, multilingual voice conversion (2025)
Álvaro Martín-Cortinas et al.
3.75
FlowTSE: Target Speaker Extraction with Flow Matching (2025)
Aviv Navon et al.
3.75
Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback (2025)
Peyman Jahanbin
3.70
BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting (2025)
Mohammad Jahid Ibna Basher et al.
3.59
CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation (2025)
Ziqi Liang et al.
3.53
EmoFormer: A Text-Independent Speech Emotion Recognition using a Hybrid Transformer-CNN model (2025)
Rashedul Hasan et al.
3.53
SPARCLE: SPeaker-aware Aligned Representations via Contrastive Language Embeddings (2026)
Priyam Mazumdar et al.
3.51
MoDiCoL: A Modular Diagnostic Continual Learning Dataset for Robust Speech Recognition (2026)
Theresa Pekarek Rosin et al.
3.45
What Does a Pathological Speech Assessment Model Know about Acoustic Features? A Case Study on Oral and Oropharyngeal Cancer Patients (2026)
Tuan Nguyen (LIA) et al.
3.45