Awesome Papers

Papers

Fast query-by-example speech search using separable model (2021)
Yuguang Yang et al.
—
Rapid solution for searching similar audio items (2022)
Kastriot Kadriu
—
Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval (2022)
Daiki Takeuchi et al.
—
Fast and parallel decoding for transducer (2022)
Wei Kang et al.
—
Simultaneously Learning Robust Audio Embeddings and balanced Hash codes for Query-by-Example (2023)
Anup Singh et al.
—
Robust and lightweight audio fingerprint for Automatic Content Recognition (2023)
Anoubhav Agarwaal et al.
—
Similar but Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search (2024)
Matthew C. McCallum et al.
—
XAttnMark: Learning Robust Audio Watermarking with Cross-Attention (2026)
Yixin Liu et al.
—
Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data (2026)
Ragib Amin Nihal et al.
—
ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis (2026)
Mohammad Javad Ranjbar Kalahroodi et al.
—
Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval (2026)
Ilyass Moummad et al.
—
JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments (2026)
Zhan Liu et al.
—
AG-REPA: Causal Layer Selection for Representation Alignment in Audio Flow Matching (2026)
Pengfei Zhang et al.
—
PHALAR: Phasors for Learned Musical Audio Representations (2026)
Davide Marincione et al.
—
Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization (2026)
Zheng Fang et al.
—
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents (2026)
Tara Bogavelli et al.
—
CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation (2026)
Gyubin Lee et al.
—
Music Transcription with (Almost) No Supervision (2026)
Saebyeol Shin et al.
—
AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models (2026)
Jialiang Yang et al.
—
A Multimodal Framework for Dementia Detection via Linguistic and Acoustic Representation Learning (2026)
Loukas Ilias et al.
—
Hidden in Plain Tokens: Simply Robust, Gradient-Free Watermark for Synthetic Audio (2026)
Georgios Milis et al.
—
DuoGesture: Neuro-Inspired and Biomechanically Informed Dual-Stream Co-Speech Gesture Generation (2026)
Ferdinand Paar et al.
—
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV (2026)
Tengfei Liu et al.
—
Beyond Binary: Speech Representations Across the Cognitive Score Hierarchy (2026)
Serli Kopar et al.
—
Learning When to Think While Listening in Large Audio-Language Models (2026)
Zhiyuan Song et al.
—
Audio Deepfake Detection with Half-Truth Localisation Using Cross-Attentive Feature Fusion (2026)
S. Sutharya et al.
—
COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings (2026)
Yonggang Zhu et al.
—
Benchmarking Single-Factor Physical Video-to-Audio Generation (2026)
Tingle Li et al.
—