Representation Learning Via Global Temporal Alignment And Cycle-consistency
2021 · Isma Hadji, Konstantinos G. Derpanis, Allan D. Jepson
Abstract
We introduce a weakly supervised method for representation learning based on aligning temporal sequences (e.g., videos) of the same process (e.g., human action). The main idea is to use the global temporal ordering of latent correspondences across sequence pairs as a supervisory signal. In particular, we propose a loss based on scoring the optimal sequence alignment to train an embedding network. Our loss is based on a novel probabilistic path finding view of dynamic time warping (DTW) that contains the following three key features: (i) the local path routing decisions are contrastive and differentiable, (ii) pairwise distances are cast as probabilities that are contrastive as well, and (iii) our formulation naturally admits a global cycle consistency loss that verifies correspondences. For evaluation, we consider the tasks of fine-grained action classification, few-shot learning, and video synchronization. We report significant performance increases over previous methods. In addition,
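The ingredients named in points (i) and (ii) of the abstract can be illustrated with a small NumPy sketch. This is not the authors' formulation: it swaps in a soft-DTW-style smooth minimum (a log-sum-exp relaxation of the hard `min`) for the local routing decision, and it casts pairwise distances as contrastive probabilities by taking a row-wise softmax over negative squared embedding distances. All function names, the `temperature` and `gamma` parameters, and the choice of negative log-probability as the per-cell cost are assumptions for illustration. The sketch only computes the forward value; training an embedding network would require an autodiff framework, and the cycle-consistency term in point (iii) is not shown.

```python
import numpy as np


def contrastive_match_probs(x, y, temperature=0.1):
    """Hypothetical helper: turn pairwise distances into contrastive probabilities.

    Row i is a softmax over negative squared distances from frame x[i] to every
    frame y[j], so the frames of y compete to be x[i]'s match.
    """
    d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # (n, m) distances
    logits = -d / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def softmin(values, gamma):
    """Smooth minimum -gamma * log(sum(exp(-v / gamma))); tends to min(v) as gamma -> 0."""
    v = -np.asarray(values, dtype=float) / gamma
    vmax = v.max()  # log-sum-exp shift; at least one entry is finite below
    return -gamma * (vmax + np.log(np.exp(v - vmax).sum()))


def soft_dtw(cost, gamma=0.1):
    """DTW recursion with the hard min over the three predecessors replaced by softmin."""
    n, m = cost.shape
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0  # every path starts at the top-left corner
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Predecessors: diagonal step, step in x only, step in y only.
            r[i, j] = cost[i - 1, j - 1] + softmin(
                [r[i - 1, j - 1], r[i - 1, j], r[i, j - 1]], gamma
            )
    return r[n, m]


def alignment_loss(x, y, temperature=0.1, gamma=0.1, eps=1e-12):
    """Hypothetical loss: soft-DTW over negative log match probabilities."""
    cost = -np.log(contrastive_match_probs(x, y, temperature) + eps)
    return soft_dtw(cost, gamma)
```

For a sequence of embeddings `x` with shape `(frames, dim)`, `alignment_loss(x, x)` should come out much smaller than `alignment_loss(x, x[::-1])`: a time-reversed copy contains the same frames, but no monotonic alignment path can match them in order, which is the kind of global temporal-ordering signal the abstract describes.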
Related papers
- TCLR: Temporal Contrastive Learning For Video Representation (2021) · 15.78
- Object-centric Representation Learning From Unlabeled Videos (2016) · 7.16
- Cycle-contrast For Self-supervised Video Representation Learning (2020) · 0.00
- Lat: Latent Translation With Cycle-consistency For Video-text Retrieval (2022) · 0.00
- A Multi-level Alignment Training Scheme For Video-and-language Grounding (2022) · 3.58
- Video-language Alignment Via Spatio-temporal Graph Transformer (2024) · 0.00
- T2VLAD: Global-local Sequence Alignment For Text-video Retrieval (2021) · 16.65
- Self-supervised Video Representation Learning With Cross-stream Prototypical Contrasting (2021) · 8.82