Learning Robust Heterogeneous Signal Features From Parallel Neural Network For Audio Sentiment Analysis
2018 · Feiyang Chen, Ziqian Luo
Abstract
Audio sentiment analysis is a popular research area that extends conventional text-based sentiment analysis and depends on the effectiveness of acoustic features extracted from speech. However, current work on audio sentiment analysis mainly focuses on extracting homogeneous acoustic features or does not fuse heterogeneous features effectively. In this paper, we propose an utterance-based deep neural network model that combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) based network in parallel to obtain representative features, termed the Audio Sentiment Vector (ASV), that maximally reflect the sentiment information in an audio signal. Specifically, our model is trained with utterance-level labels, and the ASV is extracted from the two branches and fused. In the CNN branch, spectrum graphs produced from the signals are fed as inputs, while in the LSTM branch, the inputs include spectral features and cepstrum coefficients extracted from dependent ut
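The parallel two-branch idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the layer sizes, the random weights, the function names (`cnn_branch`, `lstm_branch`, `audio_sentiment_vector`), and in particular the fusion by plain concatenation are all assumptions made for illustration, since the abstract does not specify how the two branches are fused. The sketch only shows how a spectrogram-like 2-D input and a frame-level feature sequence might each be reduced to a vector and then combined into one ASV-like vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_branch(spectrogram, kernels):
    """Valid 2-D cross-correlation per kernel, ReLU, then global max pooling."""
    kh, kw = kernels.shape[1:]
    height, width = spectrogram.shape
    feats = []
    for k in kernels:
        out = np.empty((height - kh + 1, width - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(spectrogram[i:i + kh, j:j + kw] * k)
        feats.append(np.maximum(out, 0.0).max())  # one scalar per kernel
    return np.array(feats)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_branch(frames, W, U, b):
    """Single-layer LSTM over frames of shape (T, D); returns the final hidden state."""
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in frames:
        z = W @ x + U @ h + b            # stacked gate pre-activations, shape (4 * hidden,)
        i, f, o, g = np.split(z, 4)      # input, forget, output gates and candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return h

def audio_sentiment_vector(spectrogram, frames, params):
    """Fuse both branches by concatenation (an assumed, simplest-possible fusion)."""
    a = cnn_branch(spectrogram, params["kernels"])
    s = lstm_branch(frames, params["W"], params["U"], params["b"])
    return np.concatenate([a, s])

# Hypothetical sizes: 8 kernels of 3x3, 13 features per frame, 16 hidden units.
n_kernels, feat_dim, hidden = 8, 13, 16
params = {
    "kernels": rng.normal(size=(n_kernels, 3, 3)),
    "W": rng.normal(scale=0.1, size=(4 * hidden, feat_dim)),
    "U": rng.normal(scale=0.1, size=(4 * hidden, hidden)),
    "b": np.zeros(4 * hidden),
}

spectrogram = rng.random((64, 40))        # stand-in for a spectrum graph
frames = rng.normal(size=(40, feat_dim))  # stand-in for per-frame spectral/cepstral features

asv = audio_sentiment_vector(spectrogram, frames, params)
print(asv.shape)
```

In a trained model the weights would be learned from utterance-level labels rather than drawn at random, and the fused vector would feed a classifier; concatenation is used here only because it is the simplest way to combine two heterogeneous feature vectors.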
Related papers
- Enhancing Unsupervised Audio Representation Learning Via Adversarial Sample Generation (2023)
- An Empirical Study Of Visual Features For DNN Based Audio-visual Speech Enhancement In Multi-talker Environments (2020)
- Audio-visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2017)
- Video-based Cross-modal Auxiliary Network For Multimodal Sentiment Analysis (2022)
- Machine Learning Framework For Audio-based Content Evaluation Using MFCC, Chroma, Spectral Contrast, And Temporal Feature Engineering (2024)
- LSTMSE-Net: Long Short Term Speech Enhancement Network For Audio-visual Speech Enhancement (2024)
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017)
- AMFFCN: Attentional Multi-layer Feature Fusion Convolution Network For Audio-visual Speech Enhancement (2021)