ActivityNet-Caption

Name: ActivityNet-Caption
License: other

Status: Emerging
Papers using it: 6
Hugging Face downloads: 169
Hugging Face likes: 3
First seen: 2017

The ActivityNet Captions dataset connects videos to a series of temporally annotated sentence descriptions. Each sentence covers a unique segment of the video, so that together the sentences describe the multiple events that occur in it. These events may span very long or very short periods of time and are not otherwise constrained, so they can overlap (co-occur). On average, each of the 20k videos contains 3.65 temporally localized sentences, resulting in a total of 100k sentences. We find that the number of sentences per video follows a roughly normal distribution. Furthermore, the number of sentences increases with video duration. Each sentence has an average length of 13.48 words, which is also roughly normally distributed. More details are given in the ActivityNet Captions Dataset section and in the supplementary materials of the paper.
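To make the annotation structure concrete, here is a minimal Python sketch. It assumes the layout of the original ActivityNet Captions JSON release, where each video id maps to a `duration`, a list of `[start, end]` `timestamps` in seconds, and a parallel list of `sentences`; check this against the copy you actually download, since the Hugging Face version may be organized differently. The two videos below are made-up examples, not real annotations, and `summarize` and `segments_overlap` are hypothetical helpers. The sketch computes the same kinds of statistics quoted above (sentences per video, words per sentence) and counts pairs of events that overlap in time.

```python
import json
from itertools import combinations

# Made-up sample in the assumed layout: video id -> duration (seconds),
# one [start, end] timestamp pair per sentence, and the sentences themselves.
SAMPLE = json.loads("""
{
  "v_demo1": {
    "duration": 60.0,
    "timestamps": [[0.0, 20.5], [15.0, 40.0], [41.0, 60.0]],
    "sentences": ["A man walks into a gym.",
                  "He lifts a heavy barbell while people watch.",
                  "He puts the barbell down and leaves."]
  },
  "v_demo2": {
    "duration": 30.0,
    "timestamps": [[2.0, 28.0]],
    "sentences": ["A woman plays the violin on a stage."]
  }
}
""")


def segments_overlap(a, b):
    """Return True if two [start, end] segments share some time (events co-occur)."""
    return max(a[0], b[0]) < min(a[1], b[1])


def summarize(annotations):
    """Compute dataset-level statistics from a {video_id: annotation} dict."""
    n_videos = len(annotations)
    n_sentences = 0
    n_words = 0
    n_overlapping_pairs = 0
    for video in annotations.values():
        stamps, sents = video["timestamps"], video["sentences"]
        # Each sentence is tied to exactly one temporal segment.
        assert len(stamps) == len(sents), "one timestamp pair per sentence"
        n_sentences += len(sents)
        # Whitespace tokenization: punctuation stays attached to words.
        n_words += sum(len(s.split()) for s in sents)
        # Count every pair of segments within the same video that overlap.
        n_overlapping_pairs += sum(
            segments_overlap(a, b) for a, b in combinations(stamps, 2)
        )
    return {
        "videos": n_videos,
        "sentences": n_sentences,
        "sentences_per_video": n_sentences / n_videos,
        "words_per_sentence": n_words / n_sentences,
        "overlapping_pairs": n_overlapping_pairs,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the sample above, the first video's first two events overlap (15.0 s to 20.5 s), which is the kind of co-occurrence the description allows. Word counts here split on whitespace, so the reported 13.48-word average may have been computed with a different tokenizer.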


Papers using ActivityNet-Caption (6)

Attend And Interact: Higher-order Object Interactions For Video Understanding (2017) · 12 citations

Grounded Objects And Interactions For Video Captioning (2017) · 5 citations

SAVCHOI: Detecting Suspicious Activities Using Dense Video Captioning With Human Object Interactions (2022) · 3 citations

Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching (2021) · 1 citation

Live Video Captioning (2024) · 1 citation

ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval (2022)