Exploring Acoustic Similarity In Emotional Speech And Music Via Self-supervised Representations
2024 · Yujia Sun, Zeyu Zhao, Korin Richmond, et al.
Abstract
Emotion recognition from speech and music shares similarities due to their acoustic overlap, which has led to interest in transferring knowledge between these domains. However, the acoustic cues shared by speech and music, particularly those encoded by Self-Supervised Learning (SSL) models, remain largely unexplored, because SSL models for speech and music have rarely been applied in cross-domain research. In this work, we revisit the acoustic similarity between emotional speech and music, starting with an analysis of the layerwise behavior of SSL models for Speech Emotion Recognition (SER) and Music Emotion Recognition (MER). Furthermore, we perform cross-domain adaptation by comparing several approaches in a two-stage fine-tuning process, examining effective ways to utilize music for SER and speech for MER. Lastly, we explore the acoustic similarities between emotional speech and music using Fréchet audio distance for individual emotions, uncovering the issue of emotio
Related papers
- Cross-lingual Speech Emotion Recognition: Humans Vs. Self-supervised Models (2024) · 5.84
- Leveraging Semantic Information For Efficient Self-supervised Emotion Recognition With Audio-textual Distilled Models (2023) · 6.34
- Exploring Self-supervised Multi-view Contrastive Learning For Speech Emotion Recognition With Limited Annotations (2024) · 3.58
- Improving Self-supervised Learning For Audio Representations By Feature Diversity And Decorrelation (2023) · 0.00
- Comparing Self-supervised Learning Models Pre-trained On Human Speech And Animal Vocalizations For Bioacoustics Processing (2025) · 5.24
- The Efficacy Of Self-supervised Speech Models For Audio Representations (2022) · 0.00
- Jointly Fine-tuning "BERT-like" Self Supervised Models To Improve Multimodal Speech Emotion Recognition (2020) · 13.74
- Layer-wise Analysis Of Self-supervised Acoustic Word Embeddings: A Study On Speech Emotion Recognition (2024) · 0.00