ANIM-400K: A Large-scale Dataset For Automated End-to-end Dubbing Of Video
2024 · Kevin Cai, Chonghua Liu, David M. Chan
Abstract
The Internet's wealth of content, up to 60% of which is published in English, contrasts starkly with the global population, of whom only 18.8% speak English and just 5.1% consider it their native language. This gap leads to disparities in access to online information. Unfortunately, automated dubbing of video (replacing the audio track of a video with a translated alternative) remains a complex and challenging task, because current pipelines require precise timing, facial movement synchronization, and prosody matching. While end-to-end dubbing offers a solution, data scarcity continues to impede the progress of both end-to-end and pipeline-based methods. In this work, we introduce Anim-400K, a comprehensive dataset of over 425K aligned animated video segments in Japanese and English that supports various video-related tasks, including automated dubbing, simultaneous translation, guided video summarization, and genre/theme/style classification. Our dataset is made publicly available for research purposes.
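The abstract stresses that dubbing depends on precise timing between the original and translated audio. As a minimal sketch, assuming each aligned segment carries start and end times for both its Japanese and English audio, temporal intersection-over-union (IoU) is one simple way to score how closely two spans line up. The `AlignedSegment` record, its field names, and the `well_aligned` filter with its 0.5 threshold are all hypothetical illustrations, not Anim-400K's published schema or the authors' alignment method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedSegment:
    """Hypothetical record for one Japanese/English segment pair.

    Field names are illustrative assumptions, not the dataset's real schema.
    Times are in seconds from the start of the episode.
    """
    episode_id: str
    ja_start: float
    ja_end: float
    en_start: float
    en_end: float

    def temporal_iou(self) -> float:
        """Intersection-over-union of the Japanese and English time spans, in [0, 1]."""
        # Length of the overlap (zero if the spans are disjoint).
        inter = max(0.0, min(self.ja_end, self.en_end) - max(self.ja_start, self.en_start))
        # Length of the hull of both spans; equals the union whenever they overlap,
        # and when they do not overlap inter is 0, so the ratio is still 0.
        union = max(self.ja_end, self.en_end) - min(self.ja_start, self.en_start)
        return inter / union if union > 0 else 0.0


def well_aligned(segments, threshold=0.5):
    """Keep only segments whose spans overlap by at least `threshold` IoU (hypothetical filter)."""
    return [s for s in segments if s.temporal_iou() >= threshold]


if __name__ == "__main__":
    seg = AlignedSegment("ep001", 10.0, 14.0, 10.5, 14.5)
    print(round(seg.temporal_iou(), 3))  # overlap 3.5 s over a 4.5 s hull
```

Here the English line starts half a second late and overlaps 3.5 s of a 4.5 s combined span, giving an IoU of about 0.778. IoU is chosen only because it is symmetric and scale-free; a real dubbing system would likely also need to account for lip movement and prosody, which this sketch ignores.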
Related papers
- Dubbing In Practice: A Large Scale Study Of Human Localization With Insights For Automatic Dubbing (2022)
- Large-scale Multilingual Audio Visual Dubbing (2020)
- Funcineforge: A Unified Dataset Toolkit And Model For Zero-shot Movie Dubbing In Diverse Cinematic Scenes (2026)
- Voicecraft-dub: Automated Video Dubbing With Neural Codec Language Models (2025)
- Neural Dubber: Dubbing For Videos According To Scripts (2021)
- Prosody-enhanced Acoustic Pre-training And Acoustic-disentangled Prosody Adapting For Movie Dubbing (2025)
- Auto-acd: A Large-scale Dataset For Audio-language Representation Learning (2023)
- OLKAVS: An Open Large-scale Korean Audio-visual Speech Dataset (2023)