MM-ALT: A Multimodal Automatic Lyric Transcription System
2022 · Xiangming Gu, Longshen Ou, Danielle Ong, et al.
Abstract
Automatic lyric transcription (ALT) is a nascent field of study attracting increasing interest from both the speech and music information retrieval communities, given its significant application potential. However, ALT with audio data alone is a notoriously difficult task due to instrumental accompaniment and musical constraints resulting in degradation of both the phonetic cues and the intelligibility of sung lyrics. To tackle this challenge, we propose the MultiModal Automatic Lyric Transcription system (MM-ALT), together with a new dataset, N20EM, which consists of audio recordings, videos of lip movements, and inertial measurement unit (IMU) data of an earbud worn by the performing singer. We first adapt the wav2vec 2.0 framework from automatic speech recognition (ASR) to the ALT task. We then propose a video-based ALT method and an IMU-based voice activity detection (VAD) method. In addition, we put forward the Residual Cross Attention (RCA) mechanism to fuse data from the three m
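The abstract names a Residual Cross Attention (RCA) mechanism for fusing modalities but does not spell out its formulation here. As a rough illustration only, the sketch below assumes a generic single-head scaled dot-product cross-attention whose output is added back to the query stream through a residual connection. The function name `residual_cross_attention`, the projection matrices, the feature dimension, and the frame counts are all hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def residual_cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Generic cross-attention with a residual connection (assumed form).

    query_feats:   (T_q, d) features of the modality being refined
    context_feats: (T_c, d) features of the modality being attended to
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical parameters)
    Returns an array of shape (T_q, d).
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    # Scaled dot-product attention: each query frame weights all context frames.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    attended = weights @ v
    # Residual connection: the query stream is preserved and only augmented.
    return query_feats + attended


# Toy features standing in for two modalities with different frame counts.
rng = np.random.default_rng(0)
d = 8
audio = rng.normal(size=(50, d))   # hypothetical audio frames
video = rng.normal(size=(25, d))   # hypothetical lip-video frames
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

fused = residual_cross_attention(audio, video, w_q, w_k, w_v)
print(fused.shape)  # (50, 8)
```

One consequence of the residual form is that the fused output keeps the time resolution of the query modality, and if the attended branch contributes nothing the query features pass through unchanged.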
Related papers
- MSTRE-Net: Multistreaming Acoustic Modeling For Automatic Lyrics Transcription (2021) · 0.00
- Towards Building An End-to-end Multilingual Automatic Lyrics Transcription Model (2024) · 0.00
- PDAugment: Data Augmentation By Pitch And Duration Adjustments For Automatic Lyrics Transcription (2021) · 0.00
- Deep Audio-visual Singing Voice Transcription Based On Self-supervised Learning Models (2023) · 0.00
- Automatic Lyrics Transcription Using Dilated Convolutional Neural Networks With Self-attention (2020) · 10.07
- HCLAS-X: Hierarchical And Cascaded Lyrics Alignment System Using Multimodal Cross-correlation (2023) · 0.00
- YourMT3+: Multi-instrument Music Transcription With Enhanced Transformer Architectures And Cross-dataset Stem Augmentation (2024) · 11.84
- Acoustic Modeling For Automatic Lyrics-to-audio Alignment (2019) · 8.60