cs.SD
28 papers tagged cs.SD (ordered by heat_score)
Papers
- Fast query-by-example speech search using separable model (2021) Yuguang Yang et al.
- Rapid solution for searching similar audio items (2022) Kastriot Kadriu
- Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval (2022) Daiki Takeuchi et al.
- Fast and parallel decoding for transducer (2022) Wei Kang et al.
- Simultaneously Learning Robust Audio Embeddings and balanced Hash codes for Query-by-Example (2023) Anup Singh et al.
- Robust and lightweight audio fingerprint for Automatic Content Recognition (2023) Anoubhav Agarwaal et al.
- Similar but Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search (2024) Matthew C. McCallum et al.
- XAttnMark: Learning Robust Audio Watermarking with Cross-Attention (2026) Yixin Liu et al.
- Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data (2026) Ragib Amin Nihal et al.
- ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis (2026) Mohammad Javad Ranjbar Kalahroodi et al.
- Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval (2026) Ilyass Moummad et al.
- JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments (2026) Zhan Liu et al.
- AG-REPA: Causal Layer Selection for Representation Alignment in Audio Flow Matching (2026) Pengfei Zhang et al.
- PHALAR: Phasors for Learned Musical Audio Representations (2026) Davide Marincione et al.
- Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization (2026) Zheng Fang et al.
- EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents (2026) Tara Bogavelli et al.
- CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation (2026) Gyubin Lee et al.
- Music Transcription with (Almost) No Supervision (2026) Saebyeol Shin et al.
- AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models (2026) Jialiang Yang et al.
- A Multimodal Framework for Dementia Detection via Linguistic and Acoustic Representation Learning (2026) Loukas Ilias et al.
- Hidden in Plain Tokens: Simply Robust, Gradient-Free Watermark for Synthetic Audio (2026) Georgios Milis et al.
- DuoGesture: Neuro-Inspired and Biomechanically Informed Dual-Stream Co-Speech Gesture Generation (2026) Ferdinand Paar et al.
- LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV (2026) Tengfei Liu et al.
- Beyond Binary: Speech Representations Across the Cognitive Score Hierarchy (2026) Serli Kopar et al.
- Learning When to Think While Listening in Large Audio-Language Models (2026) Zhiyuan Song et al.
- Audio Deepfake Detection with Half-Truth Localisation Using Cross-Attentive Feature Fusion (2026) S. Sutharya et al.
- COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings (2026) Yonggang Zhu et al.
- Benchmarking Single-Factor Physical Video-to-Audio Generation (2026) Tingle Li et al.