Awesome Papers

Papers

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV (2026)
Tengfei Liu et al.
13.04
Coding Speech through Vocal Tract Kinematics (2025)
Cheol Jun Cho et al.
11.19
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance (2025)
Qijun Gan et al.
8.70
Study on the Fairness of Speaker Verification Systems on Underrepresented Accents in English (2025)
Mariel Estevez and Luciana Ferrer
8.09
Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification (2025)
Yen-Lun Liao et al.
7.81
Acoustics-specific Piano Velocity Estimation (2026)
Federico Simonetta et al.
6.81
Real-time Speech Summarization for Medical Conversations (2025)
Khai Le-Duc et al.
5.24
DEMON: Diffusion Engine for Musical Orchestrated Noise (2026)
Ryan Fosdick
5.17
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis (2025)
Xuehao Zhou et al.
4.52
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation (2026)
Szu-Chi Chen et al.
3.87
Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations (2025)
Bulat Khaertdinov et al.
3.58
Exploration of Perceptual Speech Features for Clinical Decision-Support in Mental Health Care (2026)
Vassilis Lyberatos et al.
3.10
EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed (2025)
Ziyang Zhuang et al.
2.26
On Improving Error Resilience of Neural End-to-End Speech Coders (2025)
Kishan Gupta et al.
2.26
A Generative-First Neural Audio Autoencoder (2026)
Jonah Casebeer et al.
1.94
Single-channel speech enhancement by using psychoacoustical model inspired fusion framework (2025)
Suman Samui
0.00
Medical Spoken Named Entity Recognition (2025)
Khai Le-Duc et al.
0.00
Semantic-Aware Interpretable Multimodal Music Auto-Tagging (2025)
Andreas Patakis et al.
0.00
Low-Complexity Neural Wind Noise Reduction for Audio Recordings (2025)
Hesam Eftekhari et al.
0.00
TinyDéjàVu: Smaller RAM and Faster Inference with Neural Networks on MCUs for Sensor Data Streams (2026)
Zhaolan Huang et al.
0.00
FSD50K-Solo: Automated Curation of Single-Source Sound Events (2026)
Ningyuan Yang et al.
0.00
Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox (2026)
Jiacheng Pang et al.
0.00
Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text (2026)
Jiahao Mei et al.
0.00
LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation (2026)
Zhisheng Zhang et al.
0.00
From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection (2026)
Ke Liu et al.
0.00
VoiceGiraffe: A Benchmark for Extreme Long-Context Audio-Language Understanding (2026)
Jashin Ye et al.
0.00
MTAVG-Bench 2.0: Diagnosing Failure Modes of Cinematic Expressiveness in Multi-Talker Audio-Video Generation (2026)
Haitian Li et al.
0.00
Unified Synthesis of Compositional Speech and Sound from Free-Form Text Prompts (2026)
Yuyue Wang et al.
0.00
EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction (2026)
Chong Jing et al.
0.00
Audio-Mind: An Auditable Agentic Framework for Audio Understanding (2026)
Yucheng Wang et al.
0.00
Cross-modal characterization of infant cry: validation of a chest-surface accelerometer in extracting acoustic vocal function measures (2026)
Winko W. An et al.
0.00
Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization (2026)
Audrey Chan et al.
0.00
Ragas in Bollywood music: A microscopic view through multrifractal cross-correlation method (2021)
Shankha Sanyal et al.
—
Variation of singing styles within a particular Gharana of Hindustani classical music: A nonlinear multifractal study (2021)
Archi Banerjee et al.
—
Bird detection in audio: a survey and a challenge (2024)
Dan Stowell et al.
—
An Information-theoretic Approach to Machine-oriented Music Summarization (2021)
Francisco Raposo et al.
—
Music generation with variational recurrent autoencoder supported by history (2021)
Ivan P. Yamshchikov and Alexey Tikhonov
—
The Minor Fall, the Major Lift: Inferring Emotional Valence of Musical Chords through Lyrics (2022)
Artemy Kolchinsky et al.
—
Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data (2022)
Anurag Kumar et al.
—
Localization of Sound Sources in a Room with One Microphone (2026)
Helena Peic Tukuljac et al.
—
Learning audio sequence representations for acoustic event classification (2021)
Zixing Zhang et al.
—
Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2022)
Jen-Cheng Hou et al.
—
A Comparison of Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging (2021)
Keunwoo Choi et al.
—
End-to-End Optimized Speech Coding with Deep Neural Networks (2021)
Srihari Kankanahalli
—
Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio (2025)
Ahmad AbdulKader et al.
—
On the Use of a Spectral Glottal Model for the Source-filter Separation of Speech (2021)
Olivier Perrotin and Ian Vince McLoughlin
—
Neural Style Transfer for Audio Spectograms (2024)
Prateek Verma et al.
—
NELS -- Never-Ending Learner of Sounds (2023)
Benjamin Elizalde et al.
—
Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification (2024)
Mingwen Dong
—
Automatic Minimisation of Masking in Multitrack Audio using Subgroups (2021)
David Ronan et al.
—