Gesture2Music: A Low-latency Real-time Framework For Continuous Gesture-driven Music Generation
2026 · Rathinaraja Jeyaraj, Barathi Subramanian, Kapilya Gangadharan, et al.
Abstract
arXiv:2511.00793v2. Gesture-driven music generation is an emerging human-computer interaction paradigm for touch-free and expressive musical interaction. However, many existing approaches treat the task as isolated gesture classification or map gestures to symbolic outputs such as MIDI followed by a separate rendering stage, which limits temporal continuity and real-time responsiveness. This work presents Gesture2Music, a low-latency streaming framework for continuous gesture-driven music generation from a live webcam feed. The system processes sequences of body and hand landmarks and uses a causal temporal convolutional network (TCN) to predict note-level musical control events, including pitch, octave, onset, sustain, amplitude, and activity state. Because available gesture-note datasets typically contain only isolated single-note recordings rather than continuous performance sequences, a synthetic stream generation strategy is introduced to const
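The abstract names a causal TCN but, as excerpted here, gives no kernel sizes, dilation rates, or layer counts. The following is a minimal single-channel sketch of the general idea only, not the authors' model: a causal dilated convolution in which each output frame depends on the current and past frames, never on future ones, which is what makes streaming inference possible. The function names `causal_dilated_conv` and `receptive_field`, and all numeric values, are illustrative assumptions.

```python
def causal_dilated_conv(x, weights, dilation=1):
    """Single-channel causal convolution (illustrative, not the paper's layer).

    y[t] = sum_k weights[k] * x[t - k * dilation]
    Frames before the start of the stream are treated as zero.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation  # tap k looks k * dilation frames into the past
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Frames of history seen by a stack with one causal conv per dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical input: one landmark coordinate sampled over five frames.
frames = [1.0, 2.0, 3.0, 4.0, 5.0]
out = causal_dilated_conv(frames, weights=[1.0, 1.0], dilation=2)
```

With these assumed values, `out[t]` is `frames[t] + frames[t - 2]`, and changing a later frame leaves every earlier output unchanged. A stack with kernel size 3 and dilations 1, 2, 4 would see `receptive_field(3, [1, 2, 4])` frames of history per prediction.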
Related papers
- Gelina: Unified Speech And Gesture Synthesis Via Interleaved Token Prediction (2026)
- MUSIC: Learning Muscle-driven Dexterous Hand Control (2026)
- MotionRAG-Diff: A Retrieval-augmented Diffusion Framework For Long-term Music-to-dance Generation (2025)
- Audio Is All In One: Speech-driven Gesture Synthetics Using WavLM Pre-trained Model (2023)
- EmotionGesture: Audio-driven Diverse Emotional Co-speech 3D Gesture Generation (2023)
- Neural Music Synthesis For Flexible Timbre Control (2018)
- A Conversational Gesture Synthesis System Based On Emotions And Semantics (2025)
- DiffRhythm: Blazingly Fast And Embarrassingly Simple End-to-end Full-length Song Generation With Latent Diffusion (2025)