cs.SD
50 papers tagged cs.SD (ordered by heat_score)
Papers
- LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV (2026), Tengfei Liu et al., 13.04
- Coding Speech through Vocal Tract Kinematics (2025), Cheol Jun Cho et al., 11.19
- PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance (2025), Qijun Gan et al., 8.70
- Study on the Fairness of Speaker Verification Systems on Underrepresented Accents in English (2025), Mariel Estevez and Luciana Ferrer, 8.09
- Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification (2025), Yen-Lun Liao et al., 7.81
- Acoustics-specific Piano Velocity Estimation (2026), Federico Simonetta et al., 6.81
- Real-time Speech Summarization for Medical Conversations (2025), Khai Le-Duc et al., 5.24
- DEMON: Diffusion Engine for Musical Orchestrated Noise (2026), Ryan Fosdick, 5.17
- Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis (2025), Xuehao Zhou et al., 4.52
- MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation (2026), Szu-Chi Chen et al., 3.87
- Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations (2025), Bulat Khaertdinov et al., 3.58
- Exploration of Perceptual Speech Features for Clinical Decision-Support in Mental Health Care (2026), Vassilis Lyberatos et al., 3.10
- EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed (2025), Ziyang Zhuang et al., 2.26
- On Improving Error Resilience of Neural End-to-End Speech Coders (2025), Kishan Gupta et al., 2.26
- A Generative-First Neural Audio Autoencoder (2026), Jonah Casebeer et al., 1.94
- Single-channel speech enhancement by using psychoacoustical model inspired fusion framework (2025), Suman Samui, 0.00
- Medical Spoken Named Entity Recognition (2025), Khai Le-Duc et al., 0.00
- Semantic-Aware Interpretable Multimodal Music Auto-Tagging (2025), Andreas Patakis et al., 0.00
- Low-Complexity Neural Wind Noise Reduction for Audio Recordings (2025), Hesam Eftekhari et al., 0.00
- TinyDéjàVu: Smaller RAM and Faster Inference with Neural Networks on MCUs for Sensor Data Streams (2026), Zhaolan Huang et al., 0.00
- FSD50K-Solo: Automated Curation of Single-Source Sound Events (2026), Ningyuan Yang et al., 0.00
- Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox (2026), Jiacheng Pang et al., 0.00
- Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text (2026), Jiahao Mei et al., 0.00
- LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation (2026), Zhisheng Zhang et al., 0.00
- From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection (2026), Ke Liu et al., 0.00
- VoiceGiraffe: A Benchmark for Extreme Long-Context Audio-Language Understanding (2026), Jashin Ye et al., 0.00
- MTAVG-Bench 2.0: Diagnosing Failure Modes of Cinematic Expressiveness in Multi-Talker Audio-Video Generation (2026), Haitian Li et al., 0.00
- Unified Synthesis of Compositional Speech and Sound from Free-Form Text Prompts (2026), Yuyue Wang et al., 0.00
- EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction (2026), Chong Jing et al., 0.00
- Audio-Mind: An Auditable Agentic Framework for Audio Understanding (2026), Yucheng Wang et al., 0.00
- Cross-modal characterization of infant cry: validation of a chest-surface accelerometer in extracting acoustic vocal function measures (2026), Winko W. An et al., 0.00
- Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization (2026), Audrey Chan et al., 0.00
- Ragas in Bollywood music A microscopic view through multrifractal cross-correlation method (2021), Shankha Sanyal et al., β
- Variation of singing styles within a particular Gharana of Hindustani classical music A nonlinear multifractal study (2021), Archi Banerjee et al., β
- Bird detection in audio: a survey and a challenge (2024), Dan Stowell et al., β
- An Information-theoretic Approach to Machine-oriented Music Summarization (2021), Francisco Raposo et al., β
- Music generation with variational recurrent autoencoder supported by history (2021), Ivan P. Yamshchikov and Alexey Tikhonov, β
- The Minor Fall, the Major Lift: Inferring Emotional Valence of Musical Chords through Lyrics (2022), Artemy Kolchinsky et al., β
- Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data (2022), Anurag Kumar et al., β
- Localization of Sound Sources in a Room with One Microphone (2026), Helena Peic Tukuljac et al., β
- Learning audio sequence representations for acoustic event classification (2021), Zixing Zhang et al., β
- Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2022), Jen-Cheng Hou et al., β
- A Comparison of Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging (2021), Keunwoo Choi et al., β
- End-to-End Optimized Speech Coding with Deep Neural Networks (2021), Srihari Kankanahalli, β
- Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio (2025), Ahmad AbdulKader et al., β
- On the Use of a Spectral Glottal Model for the Source-filter Separation of Speech (2021), Olivier Perrotin and Ian Vince McLoughlin, β
- Neural Style Transfer for Audio Spectograms (2024), Prateek Verma et al., β
- NELS -- Never-Ending Learner of Sounds (2023), Benjamin Elizalde et al., β
- Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification (2024), Mingwen Dong, β
- Automatic Minimisation of Masking in Multitrack Audio using Subgroups (2021), David Ronan et al., β