cs.MM
50 papers tagged cs.MM (ordered by heat_score)
Papers
- Rethinking Memory as Continuously Evolving Connectivity (2026) Jizhan Fang et al. 13.31
- LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV (2026) Tengfei Liu et al. 13.04
- PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance (2025) Qijun Gan et al. 8.70
- Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation (2026) Shuhong Zheng et al. 7.39
- TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech (2026) Girish A. Koushik et al. 1.96
- From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection (2026) Ke Liu et al. 0.00
- MTAVG-Bench 2.0: Diagnosing Failure Modes of Cinematic Expressiveness in Multi-Talker Audio-Video Generation (2026) Haitian Li et al. 0.00
- Unified Synthesis of Compositional Speech and Sound from Free-Form Text Prompts (2026) Yuyue Wang et al. 0.00
- EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction (2026) Chong Jing et al. 0.00
- Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data (2022) Anurag Kumar et al. β
- Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2022) Jen-Cheng Hou et al. β
- Neural Style Transfer for Audio Spectograms (2024) Prateek Verma et al. β
- A Lightweight Music Texture Transfer System (2021) Xutan Peng et al. β
- Scene-Aware Audio Rendering via Deep Acoustic Analysis (2021) Zhenyu Tang et al. β
- Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform (2022) Tomohiko Nakamura and Hiroshi Saruwatari β
- ASMD: an automatic framework for compiling multimodal datasets with audio and scores (2021) Federico Simonetta et al. β
- MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification (2022) Xulong Zhang et al. β
- End-to-End Lip Synchronisation Based on Pattern Classification (2021) You Jin Kim et al. β
- Towards Multimodal MIR: Predicting individual differences from music-induced movement (2021) Yudhik Agrawal et al. β
- MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers (2021) Yilun Zhao et al. β
- Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm (2021) Ke Chen et al. β
- Speech Driven Talking Face Generation from a Single Image and an Emotion Condition (2021) Sefik Emre Eskimez et al. β
- Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics (2021) Arun Kumar Singh et al. β
- Ensemble Chinese End-to-End Spoken Language Understanding for Abnormal Event Detection from audio stream (2021) Haoran Wei et al. β
- GSEP: A robust vocal and accompaniment separation system using gated CBHG module and loudness normalization (2021) Soochul Park and Ben Sangbae Chon β
- Melody Harmonization Using Orderless NADE, Chord Balancing, and Blocked Gibbs Sampling (2021) Chung-En Sun et al. β
- MusicTM-Dataset for Joint Representation Learning among Sheet Music, Lyrics, and Musical Audio (2021) Donghuo Zeng et al. β
- Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition (2023) Ying Zhou et al. β
- Piano Skills Assessment (2021) Paritosh Parmar et al. β
- Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging (2021) Andres Ferraro et al. β
- Neural Network architectures to classify emotions in Indian Classical Music (2021) Uddalok Sarkar et al. β
- Downbeat Tracking with Tempo-Invariant Convolutional Neural Networks (2021) Bruno Di Giorgi et al. β
- Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach (2021) Gang Min et al. β
- Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks (2022) Chitralekha Gupta et al. β
- Audio Transformers (2025) Prateek Verma and Jonathan Berger β
- MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer with One Transformer VAE (2022) Shih-Lun Wu et al. β
- BERT-like Pre-training for Symbolic Piano Music Classification Tasks (2024) Yi-Hui Chou et al. β
- Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning (2024) Uttaran Bhattacharya and Elizabeth Childs and Nicholas Rewkowski and Dinesh Manocha β
- FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset (2022) Hasam Khalid and Shahroz Tariq and Minha Kim and Simon S. Woo β
- Multimodal analysis of the predictability of hand-gesture properties (2022) Taras Kucherenko et al. β
- Attention is All You Need? Good Embeddings with Statistics are enough: Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or .... (2022) Prateek Verma β
- Multimodal Approach for Assessing Neuromotor Coordination in Schizophrenia Using Convolutional Neural Networks (2023) Yashish M. Siriwardena et al. β
- Singer separation for karaoke content generation (2024) Hsuan-Yu Lin et al. β
- SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation (2022) Rongjie Huang et al. β
- Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition (2022) Haozhe Chen et al. β
- Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer (2022) Yi-Jen Shih et al. β
- AVA-AVD: Audio-Visual Speaker Diarization in the Wild (2022) Eric Zhongcong Xu et al. β
- Embedding-based Music Emotion Recognition Using Composite Loss (2023) Naoki Takashima et al. β
- Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data (2022) Ke Chen et al. β
- Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings (2022) Tiantian Feng and Hanieh Hashemi and Rajat Hebbar and Murali Annavaram and Shrikanth S. Narayanan β