Awesome Audio & Speech Processing

Papers

FiLM-Based Speaker Conditioning of a SpeechLLM for Pathological Speech Recognition (2026)
Fernando López et al.
5.01
Prior over Evidence: Stereotype-Driven Diagnosis in LLM-Based L2 Pronunciation Feedback (2026)
Rong Wang et al.
4.39
ROMPAR: Morphological Completion and Demographic Unlearning for Romanian-Accented Speech Recognition (2026)
Andrei-Marius Avram et al.
4.39
Rhythm of the Deep: A Computational-Linguistic Test of Duality of Patterning in Sperm Whale Codas (2026)
Mudit Sinha et al.
4.39
XAI-Grounded Explanation Generation for Speech Deepfake Detection with Training-Free Multimodal Large Language Models (2026)
Yupei Li et al.
4.39
TMASC: Transmasculine Attitude and Speech Corpus (2026)
Sidney Wong
4.39
From Affect Prediction to Affect Forecasting: Evidence for Distinct Information Sources in Longitudinal Text (2026)
Sadia Noor et al.
4.39
Data-Driven Decoding of Russell's Circumplex Model of Affect (2026)
Amdjed Belaref et al.
4.39
BareWave: Waveform-Native Flow-Matching Text-to-Speech (2026)
Wei Fan et al.
3.51
Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers (2026)
Yacouba Kaloga et al.
2.00
Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation (2026)
Ziyu Zhang et al.
2.00
Few-shot Class-variable Incremental Audio Classification via Prototype Adaptation and Pseudo Class-variable Training (2026)
Yanxiong Li et al.
2.00
Fast Speech Foundation Model Distillation Using Interleaved Stacking (2026)
Eungbeom Kim et al.
2.00
LLM-Based Synthetic Ground Truth Generation for Audio-Based Emotion Classification via In-Context Learning (2026)
Qing Huang et al.
2.00
Spectro-Temporal Interference Confounds Phase Encoding in Spatial Audio Foundation Models (2026)
Yuxuan Chen et al.
2.00
An Empirical Study on Learning Latent Representations for Emotional Speech Synthesis (2026)
Vinh Dang Quang et al.
2.00
AP-GRPO: Anchor-Gated Phonetic Alignment with Policy Optimization for Pathological Speech Reconstruction (2026)
Pengfei Zhang et al.
2.00
Semi-Supervised Speech Confidence Detection using Pseudo-Labelling and Whisper Embeddings (2026)
Adam Wynn et al.
2.00
Dual-Granularity Orthogonal Disentanglement for Generalizable Audio Deepfake Detection (2026)
Zhuodong Liu et al.
2.00
ArtNet: A JEPA-Like Articulatory Predictive Framework for Robust Zero-Shot Phoneme Recognition (2026)
Zeqian Hu et al.
2.00
Nemotron 3 Nano Omni: Efficient And Open Multimodal Intelligence (2026)
Nvidia, Amala Sanjay Deshmukh, et al.
2.00
Deepfake Detection System for Audio and Video Calls (2026)
Dr. Kavitha Devi C S
2.00
TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation (2025)
Keunwoo Choi et al.
1.50