Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors
2018 · Yansen Wang, Ying Shen, Zhun Liu, et al.
Abstract
Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model ac
Related papers
- Word Recognition, Competition, And Activation In A Model Of Visually Grounded Speech (2019) · 0.00
- Predict-and-update Network: Audio-visual Speech Recognition Inspired By Human Speech Perception (2022) · 6.34
- Visual Gesture Variability Between Talkers In Continuous Visual Speech (2017) · 0.00
- Neural Representations For Modeling Variation In Speech (2020) · 0.00
- RWEN-TTS: Relation-aware Word Encoding Network For Natural Text-to-speech Synthesis (2022) · 0.00
- How To Teach DNNs To Pay Attention To The Visual Modality In Speech Recognition (2020) · 10.97
- Mingling Or Misalignment? Temporal Shift For Speech Emotion Recognition With Pre-trained Representations (2023) · 13.84
- Dynamic Time-alignment Of Dimensional Annotations Of Emotion Using Recurrent Neural Networks (2022) · 0.00