Awesome Speech Audio

© 2026 Awesome Papers.

Awesome Speech Enhancement

Speech Enhancement is one of the most active areas in Awesome Speech Audio, with 2,155 papers in this collection. Frequently used evaluation datasets include VoiceBank-DEMAND, LibriSpeech, and WSJ-0-2Mix. A strong starting point is "Audio-visual Speech Codecs: Rethinking Audio-visual Speech Enhancement By Re-synthesis".

Datasets & benchmarks

VoiceBank-DEMAND · 34 papers · 🤗

LibriSpeech · 25 papers · 🤗

WSJ-0-2Mix · 23 papers

Libri-2Mix · 19 papers

WHAMR! · 16 papers

VCTK · 15 papers · 🤗

SUPERB · 14 papers

LibriTTS · 13 papers

CHiME-4 · 12 papers

TIMIT · 11 papers · 🤗

LibriMix · 11 papers · 🤗

DNS Challenge · 10 papers · 🤗

Key papers

60 papers, sorted by trending (default); the number following each entry is its 🔥 heat score.

Audio-visual Speech Codecs: Rethinking Audio-visual Speech Enhancement By Re-synthesis (2022)
Karren Yang, Dejan Markovic, Steven Krenn, et al.
15.58
Joint Robust Voicing Detection And Pitch Estimation Based On Residual Harmonics (2019)
Thomas Drugman, Abeer Alwan
14.93
Contextual Audio-visual Switching For Speech Enhancement In Real-world Environments (2018)
Ahsan Adeel, Mandar Gogate, Amir Hussain
14.35
The Deterministic Plus Stochastic Model Of The Residual Signal And Its Applications (2019)
Thomas Drugman, Thierry Dutoit
13.17
One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications (2026)
Szu-Wei Fu et al.
10.00
MoonCast: High-Quality Zero-Shot Podcast Generation (2025)
Zeqian Ju et al.
8.52
DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers (2025)
Heitor R. Guimarães et al.
7.55
Speech Enhancement Using Continuous Embeddings of Neural Audio Codec (2025)
Haoyang Li et al.
7.29
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline (2025)
Helin Wang et al.
7.13
Deep CNNs Along The Time Axis With Intermap Pooling For Robustness To Spectral Variations (2016)
Hwaran Lee, Geonmin Kim, Ho-Gyeong Kim, et al.
6.77
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations (2025)
Jeong Hun Yeo et al.
6.41
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing (2025)
Zhedong Zhang et al.
6.41
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder (2025)
Samir Sadok et al.
6.06
Wireless Hearables With Programmable Speech AI Accelerators (2025)
Malek Itani et al.
5.90
PrimeK-Net: Multi-scale Spectral Learning via Group Prime-Kernel Convolutional Neural Networks for Single Channel Speech Enhancement (2025)
Zizhen Lin et al.
5.84
L3AC: Towards a Lightweight and Lossless Audio Codec (2025)
Linwei Zhai et al.
5.29
Constrained Convolutional-recurrent Networks To Improve Speech Quality With Low Impact On Recognition Accuracy (2018)
Rasool Fakoor, Xiaodong He, Ivan Tashev, et al.
5.24
Throat and acoustic paired speech dataset for deep learning-based speech enhancement (2025)
Yunsik Kim et al.
5.18
Speech Denoising with Auditory Models (2020)
Mark R. Saddler et al.
5.06
Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments (2025)
Noussaiba Djeffal et al.
4.93
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement (2025)
Boyi Kang, Xinfa Zhu, Zihan Zhang, Zhen Ye, Mingshuai Liu, Ziqian Wang, Yike Zhu, Guobin Ma, Jun Chen, Longshuai Xiao, Chao Weng, Wei Xue, Lei Xie
4.82
Distillation and Pruning for Scalable Self-Supervised Representation-Based Speech Quality Assessment (2025)
Benjamin Stahl, Hannes Gamper
4.76
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement (2025)
Junan Zhang et al.
4.71
Reconstruction of the Complete Vocal Tract Contour Through Acoustic to Articulatory Inversion Using Real-Time MRI Data (2025)
Sofiane Azzouz et al.
4.69
Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate (2025)
Hanglei Zhang et al.
4.42
AmbiDrop: Ambisonics-Based Array-Agnostic Neural Speech Enhancement (2026)
Michael Tatarjitzky et al.
4.39
Positive-Incentive Noise Predictor for Adversarial Purification in Speaker Verification (2026)
Yibo Bai et al.
4.39
RT-Tango: Real-Time Distributed Binaural Speech Enhancement for Low-Power Hearing Aid Devices (2026)
Z. Benslimane et al.
4.39
SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures (2025)
Kuang Yuan et al.
4.36
Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation (2026)
Xin Zhang et al.
4.33
IRAF: Interference-Resilient Adaptive Fusion for Noise-Robust End-to-End Full-Duplex Spoken Dialogue Systems (2026)
Tao Zhong et al.
4.33
Instantaneous Pitch Estimation via Wave-U-Net-Based Fundamental Waveform Enhancement (2026)
Junya Koguchi et al.
4.33
Reference-Based Recursive Least-Squares Mitigation of Real Interference in Stereo Audio Recordings (2026)
Necati Kagan Erkek et al.
4.33
QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement (2026)
Shogo Yamauchi et al.
4.33
Augmenting Dysarthric Speech Severity Assessment with MOS Supervision (2026)
Kaimeng Jia et al.
4.33
Audio-to-Audio via Diffusion Warm Initialization (2026)
Cristóbal Andrade et al.
4.33
DASH: Dual-View Self-Distillation with Multi-Layer Hidden Representations for Robust Speech Recognition (2026)
Jaeeun Baik et al.
4.33
Does Translation-Enhanced Speech Encoder Pre-training Affect Speech LLMs? (2026)
Tomoya Mizumoto et al.
4.33
Probing in the Wild: A Case Study of Self-Supervised Speech Representations on Mandarin Sub-dialects with Unsupervised Articulatory Analysis (2026)
Shu Shang et al.
4.33
SE-AGCNet: An End-to-End Framework for Joint Speech Enhancement and Loudness Control in Meeting Scenarios (2026)
Jinming Zhang et al.
4.33
A Large-Scale Database and Predictive Model of Listener-Rated Ease of Speech Understanding in Commercial Hearing Aids (2026)
Andrew Sabin et al.
4.33
DNSMOS-C: Improving End-to-end Speech Quality Models via Contrastive Learning (2026)
Xinyu Liang et al.
4.33
FNSE-SBGAN: Far-field Speech Enhancement with Schrodinger Bridge and Generative Adversarial Networks (2025)
Tong Lei et al.
4.30
Variational Autoencoder for Personalized Pathological Speech Enhancement (2025)
Mingchi Hou, Ina Kodrasi
4.30
Linearly Constrained Deep Beamformer for Multi-Speaker Scenarios (2026)
Ilai Zaidel et al.
4.27
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction (2025)
Chaoyou Fu et al.
4.19
ZipEnhancer: Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement (2025)
Haoxu Wang et al.
4.19
Diffusion-Based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior (2025)
Yochai Yemini et al.
3.97
Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance (2025)
Jakob Kienegger et al.
3.86
Speech-dependent Data Augmentation for Own Voice Reconstruction with Hearable Microphones in Noisy Environments (2024)
Mattes Ohlenbusch et al.
3.80
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration (2025)
Shigeki Karita et al.
3.75
FlowTSE: Target Speaker Extraction with Flow Matching (2025)
Aviv Navon et al.
3.75
ReverbFX: A Dataset of Room Impulse Responses Derived from Reverb Effect Plugins for Singing Voice Dereverberation (2025)
Julius Richter et al.
3.75
Universal Speech Enhancement with Regression and Generative Mamba (2025)
Rong Chao et al.
3.75
Interspeech 2025 URGENT Speech Enhancement Challenge (2025)
Kohei Saijo et al.
3.75
Spatial-Filter-Bank-Based Neural Method for Multichannel Speech Enhancement (2025)
Tianqin Zheng et al.
3.70
A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication (2025)
Xiao-Hang Jiang et al.
3.70
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model (2025)
Jialong Zuo et al.
3.59
A Hybrid Model for Weakly-Supervised Speech Dereverberation (2025)
Louis Bahrman et al.
3.59
LMFCA-Net: A Lightweight Model for Multi-Channel Speech Enhancement with Efficient Narrow-Band and Cross-Band Attention (2025)
Yaokai Zhang et al.
3.59