Something-Something
Canonical
10 papers using it
242 HF downloads
21 HF likes
First seen: 2019
The Something-Something dataset (version 2) is a collection of 220,847 labeled video clips of humans performing pre-defined, basic actions with everyday objects. It is designed to train machine learning models for fine-grained understanding of human hand gestures, such as putting something into something, turning something upside down, and covering something with something.
Hugging Face (other)
Papers using Something-Something (9)
- Few-shot Video Classification Via Temporal Alignment
- Temporal-relational Crosstransformers For Few-shot Action Recognition
- Co-training Transformer With Videos And Images Improves Action Recognition
- Motion-guided Masking For Spatiotemporal Representation Learning
- Motion Guided Attention Fusion To Recognize Interactions From Videos
- On The Surprising Effectiveness Of Transformers In Low-labeled Video Recognition
- Insights from Visual Cognition: Understanding Human Action Dynamics with Overall Glance and Refined Gaze Transformer
- SVT: Supertoken Video Transformer For Efficient Video Understanding
- ViViT: A Video Vision Transformer