Ego-4D
Emerging
9 papers using it
First seen: 2024
Ego4D is a large-scale egocentric dataset of video recorded with wearable cameras during everyday activities. It is used to evaluate proactive procedural assistance systems, including their ability to handle deviations from expected task sequences.
Papers using Ego-4D (9)
- Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance
- Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress
- EgoSound: Benchmarking Sound Understanding in Egocentric Videos
- EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization
- Know-show: Benchmarking Video-language Models On Spatio-temporal Grounded Reasoning
- Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
- HourVideo: 1-Hour Video-Language Understanding
- X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
- VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought