Temporal FiLM: Capturing Long-range Sequence Dependencies With Feature-wise Modulations
2019 · Sawyer Birnbaum, Volodymyr Kuleshov, Zayd Enam, et al.
Abstract
Learning representations that accurately capture long-range dependencies in sequential inputs -- including text, audio, and genomic data -- is a key problem in deep learning. Feed-forward convolutional models capture feature interactions only within finite receptive fields, while recurrent architectures can be slow and difficult to train due to vanishing gradients. Here, we propose Temporal Feature-Wise Linear Modulation (TFiLM) -- a novel architectural component inspired by adaptive batch normalization and its extensions -- that uses a recurrent neural network to alter the activations of a convolutional model. This approach expands the receptive field of convolutional sequence models with minimal computational overhead. Empirically, we find that TFiLM significantly improves the learning speed and accuracy of feed-forward neural networks on a range of generative and discriminative learning tasks, including text classification and audio super-resolution.
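To make the core idea concrete, here is a minimal NumPy sketch of a recurrent network modulating convolutional activations feature-wise. The abstract does not spell out the mechanism, so several details are assumptions made for illustration: the activations are split into fixed-size blocks along time, each block is summarized by max-pooling, a plain tanh recurrent cell stands in for whatever recurrent network the authors use, and the modulation is an affine scale-and-shift. The names `tfilm_sketch`, `init_params`, and `block_size` are hypothetical.

```python
import numpy as np


def init_params(channels, hidden, seed=0):
    """Small random weights for the illustrative recurrent cell (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return (
        scale * rng.standard_normal((channels, hidden)),  # W_in: pooled block -> hidden
        scale * rng.standard_normal((hidden, hidden)),    # W_h: hidden -> hidden
        scale * rng.standard_normal((hidden, channels)),  # W_gamma: hidden -> scale
        scale * rng.standard_normal((hidden, channels)),  # W_beta: hidden -> shift
    )


def tfilm_sketch(x, block_size, params):
    """Modulate float activations x of shape (T, C) block by block.

    Assumptions (not taken from the abstract): max-pooling per block,
    a plain tanh recurrent cell, and an affine per-channel modulation.
    """
    T, C = x.shape
    assert T % block_size == 0, "T must be divisible by block_size"
    n_blocks = T // block_size

    blocks = x.reshape(n_blocks, block_size, C)
    pooled = blocks.max(axis=1)  # one summary vector per block: (n_blocks, C)

    W_in, W_h, W_gamma, W_beta = params
    h = np.zeros(W_h.shape[0])   # recurrent state carried across blocks
    out = np.empty_like(blocks)
    for b in range(n_blocks):
        # The state depends on every block seen so far, so the modulation
        # of block b can reflect inputs far outside a local receptive field.
        h = np.tanh(pooled[b] @ W_in + h @ W_h)
        gamma = 1.0 + h @ W_gamma  # per-channel scale, centred on identity
        beta = h @ W_beta          # per-channel shift
        out[b] = gamma * blocks[b] + beta
    return out.reshape(T, C)


# Usage: 16 time steps, 4 channels, blocks of 4 steps.
x = np.random.default_rng(1).standard_normal((16, 4))
y = tfilm_sketch(x, block_size=4, params=init_params(channels=4, hidden=8))
```

In this sketch the recurrent cell runs once per block rather than once per time step, which is one way the added cost could stay small relative to the convolutional layers; whether the paper uses this exact scheme is not stated in the abstract.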
Related papers
- Temporal Working Memory: Query-guided Segment Refinement For Enhanced Multimodal Understanding (2025) 11.33
- Language Modeling With Neural Trans-dimensional Random Fields (2017) 4.52
- Improved Neural Language Model Fusion For Streaming Recurrent Neural Network Transducer (2020) 8.82
- Capturing Long-term Temporal Dependencies With Convolutional Networks For Continuous Emotion Recognition (2017) 10.48
- Multi-resolution Audio-visual Feature Fusion For Temporal Action Localization (2023) 0.00
- Learning Multiscale Features Directly From Waveforms (2016) 0.00
- Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks (2020) 5.84
- Speaking From Coarse To Fine: Improving Neural Codec Language Model Via Multi-scale Speech Coding And Generation (2024) 3.58