Audio & Speech Processing
23 papers tagged Audio & Speech Processing
Papers
- FiLM-Based Speaker Conditioning of a SpeechLLM for Pathological Speech Recognition (2026), Fernando López et al., 5.01
- Prior over Evidence: Stereotype-Driven Diagnosis in LLM-Based L2 Pronunciation Feedback (2026), Rong Wang et al., 4.39
- ROMPAR: Morphological Completion and Demographic Unlearning for Romanian-Accented Speech Recognition (2026), Andrei-Marius Avram et al., 4.39
- Rhythm of the Deep: A Computational-Linguistic Test of Duality of Patterning in Sperm Whale Codas (2026), Mudit Sinha et al., 4.39
- XAI-Grounded Explanation Generation for Speech Deepfake Detection with Training-Free Multimodal Large Language Models (2026), Yupei Li et al., 4.39
- TMASC: Transmasculine Attitude and Speech Corpus (2026), Sidney Wong, 4.39
- From Affect Prediction to Affect Forecasting: Evidence for Distinct Information Sources in Longitudinal Text (2026), Sadia Noor et al., 4.39
- Data-Driven Decoding of Russell's Circumplex Model of Affect (2026), Amdjed Belaref et al., 4.39
- BareWave: Waveform-Native Flow-Matching Text-to-Speech (2026), Wei Fan et al., 3.51
- Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers (2026), Yacouba Kaloga et al., 2.00
- Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation (2026), Ziyu Zhang et al., 2.00
- Few-shot Class-variable Incremental Audio Classification via Prototype Adaptation and Pseudo Class-variable Training (2026), Yanxiong Li et al., 2.00
- Fast Speech Foundation Model Distillation Using Interleaved Stacking (2026), Eungbeom Kim et al., 2.00
- LLM-Based Synthetic Ground Truth Generation for Audio-Based Emotion Classification via In-Context Learning (2026), Qing Huang et al., 2.00
- Spectro-Temporal Interference Confounds Phase Encoding in Spatial Audio Foundation Models (2026), Yuxuan Chen et al., 2.00
- An Empirical Study on Learning Latent Representations for Emotional Speech Synthesis (2026), Vinh Dang Quang et al., 2.00
- AP-GRPO: Anchor-Gated Phonetic Alignment with Policy Optimization for Pathological Speech Reconstruction (2026), Pengfei Zhang et al., 2.00
- Semi-Supervised Speech Confidence Detection using Pseudo-Labelling and Whisper Embeddings (2026), Adam Wynn et al., 2.00
- Dual-Granularity Orthogonal Disentanglement for Generalizable Audio Deepfake Detection (2026), Zhuodong Liu et al., 2.00
- ArtNet: A JEPA-Like Articulatory Predictive Framework for Robust Zero-Shot Phoneme Recognition (2026), Zeqian Hu et al., 2.00
- Nemotron 3 Nano Omni: Efficient And Open Multimodal Intelligence (2026), Nvidia, Amala Sanjay Deshmukh et al., 2.00
- Deepfake Detection System for Audio and Video Calls (2026), Dr. Kavitha Devi C S, 2.00
- TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation (2025), Keunwoo Choi et al., 1.50