Awesome Papers

Papers

Coding Speech through Vocal Tract Kinematics (2025)
Cheol Jun Cho et al.
11.19
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance (2025)
Qijun Gan et al.
8.70
Study on the Fairness of Speaker Verification Systems on Underrepresented Accents in English (2025)
Mariel Estevez and Luciana Ferrer
8.09
Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification (2025)
Yen-Lun Liao et al.
7.81
MEMS and ECM Sensor Technologies for Cardiorespiratory Sound Monitoring - A Comprehensive Review (2025)
Yasaman Torabi et al.
7.16
Acoustics-specific Piano Velocity Estimation (2026)
Federico Simonetta et al.
6.81
Real-time Speech Summarization for Medical Conversations (2025)
Khai Le-Duc et al.
5.24
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis (2025)
Xuehao Zhou et al.
4.52
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation (2026)
Szu-Chi Chen et al.
3.87
Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations (2025)
Bulat Khaertdinov et al.
3.58
EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed (2025)
Ziyang Zhuang et al.
2.26
On Improving Error Resilience of Neural End-to-End Speech Coders (2025)
Kishan Gupta et al.
2.26
A Generative-First Neural Audio Autoencoder (2026)
Jonah Casebeer et al.
1.94
Single-channel speech enhancement by using psychoacoustical model inspired fusion framework (2025)
Suman Samui
0.00
Medical Spoken Named Entity Recognition (2025)
Khai Le-Duc et al.
0.00
Semantic-Aware Interpretable Multimodal Music Auto-Tagging (2025)
Andreas Patakis et al.
0.00
Low-Complexity Neural Wind Noise Reduction for Audio Recordings (2025)
Hesam Eftekhari et al.
0.00
Joint decoding method for controllable contextual speech recognition based on Speech LLM (2025)
Yangui Fang et al.
0.00
TinyDéjàVu: Smaller RAM and Faster Inference with Neural Networks on MCUs for Sensor Data Streams (2026)
Zhaolan Huang et al.
0.00
StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation (2026)
Nikita Kuzmin et al.
0.00
Sky-Ear: An Unmanned Aerial Vehicle-Enabled Victim Sound Detection and Localization System (2026)
Yi Hong et al.
0.00
Enhancing ASR Performance in the Medical Domain for Dravidian Languages (2026)
Sri Charan Devarakonda et al.
0.00
MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis (2026)
Yuanhao Chen et al.
0.00
FSD50K-Solo: Automated Curation of Single-Source Sound Events (2026)
Ningyuan Yang et al.
0.00
LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation (2026)
Zhisheng Zhang et al.
0.00
I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors (2026)
Lelia Erscoi et al.
0.00
Diffusion Large Language Models for Visual Speech Recognition (2026)
Jeong Hun Yeo et al.
0.00
Audio-Mind: An Auditable Agentic Framework for Audio Understanding (2026)
Yucheng Wang et al.
0.00
Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios (2026)
Changhao Pan et al.
0.00
The Microsoft 2016 Conversational Speech Recognition System (2022)
W. Xiong et al.
—
Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2022)
Jen-Cheng Hou et al.
—
End-to-End Optimized Speech Coding with Deep Neural Networks (2021)
Srihari Kankanahalli
—
Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio (2025)
Ahmad AbdulKader et al.
—
On the Use of a Spectral Glottal Model for the Source-filter Separation of Speech (2021)
Olivier Perrotin and Ian Vince McLoughlin
—
Neural Style Transfer for Audio Spectograms (2024)
Prateek Verma et al.
—
NELS - Never-Ending Learner of Sounds (2023)
Benjamin Elizalde et al.
—
Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification (2024)
Mingwen Dong
—
Automatic Minimisation of Masking in Multitrack Audio using Subgroups (2021)
David Ronan et al.
—
A toolbox for rendering virtual acoustic environments in the context of audiology (2025)
Giso Grimm et al.
—
Extended pipeline for content-based feature engineering in music genre recognition (2021)
Tina Raissi et al.
—
Relative Transfer Function Estimation Exploiting Spatially Separated Microphones in a Diffuse Noise Field (2022)
N. Gößling and S. Doclo
—
Sparse Pursuit and Dictionary Learning for Blind Source Separation in Polyphonic Music Recordings (2021)
Sören Schulze and Emily J. King
—
DNN-HMM based Speaker Adaptive Emotion Recognition using Proposed Epoch and MFCC Features (2021)
Md. Shah Fahad et al.
—
Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR (2021)
Yerbolat Khassanov and Eng Siong Chng
—
AudioMNIST: Exploring Explainable Artificial Intelligence for Audio Analysis on a Simple Benchmark (2023)
Sören Becker et al.
—
RTF-Based Binaural MVDR Beamformer Exploiting an External Microphone in a Diffuse Noise Field (2022)
N. Gößling et al.
—
Optimal Binaural LCMV Beamforming in Complex Acoustic Scenarios: Theoretical and Practical Insights (2022)
N. Gößling et al.
—
Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge (2024)
Dan Stowell et al.
—
Auto-adaptive Resonance Equalization using Dilated Residual Networks (2024)
Maarten Grachten et al.
—
Acoustic Scene Classification: A Competition Review (2024)
Shayan Gharib et al.
—