Non-linear Frequency Warping Using Constant-Q Transformation For Speech Emotion Recognition
2021 · Premjeet Singh, Goutam Saha, Md Sahidullah
Abstract
In this work, we explore the constant-Q transform (CQT) for speech emotion recognition (SER). CQT-based time-frequency analysis provides variable spectro-temporal resolution, with higher frequency resolution at lower frequencies. Since lower-frequency regions of the speech signal contain more emotion-related information than higher-frequency regions, the increased low-frequency resolution of the CQT makes it more promising for SER than the standard short-time Fourier transform (STFT). We present a comparative analysis of short-term acoustic features based on the STFT and the CQT for SER, with a deep neural network (DNN) as the back-end classifier. We optimize different parameters for both features. The CQT-based features outperform the STFT-based spectral features in SER experiments. Further experiments with cross-corpora evaluation demonstrate that the CQT-based systems generalize better with out-of-domain training data.
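The resolution argument in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' feature pipeline: it uses the textbook CQT definitions (geometrically spaced center frequencies and a constant quality factor Q) and compares the resulting per-bin bandwidth with the uniform bin spacing of an STFT. The sample rate, FFT size, minimum frequency, and bins per octave are illustrative assumptions and are not taken from the paper.

```python
import math

# Illustrative settings (assumptions, not the paper's configuration).
SAMPLE_RATE = 16000      # Hz
N_FFT = 512              # STFT window/FFT length in samples
F_MIN = 32.7             # lowest CQT center frequency in Hz
BINS_PER_OCTAVE = 24
N_BINS = 7 * BINS_PER_OCTAVE  # seven octaves, stays below Nyquist


def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced center frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave):
    """Quality factor Q = f_k / bandwidth_k, the same for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def stft_bin_spacing(sample_rate, n_fft):
    """The STFT places bins uniformly, sample_rate / n_fft apart."""
    return sample_rate / n_fft


freqs = cqt_center_frequencies(F_MIN, BINS_PER_OCTAVE, N_BINS)
q = cqt_q_factor(BINS_PER_OCTAVE)
bandwidths = [f / q for f in freqs]          # grows linearly with f_k
spacing = stft_bin_spacing(SAMPLE_RATE, N_FFT)

# Below this frequency a CQT bin is narrower than an STFT bin.
crossover_hz = q * spacing

if __name__ == "__main__":
    print(f"Q factor: {q:.2f}")
    print(f"STFT bin spacing: {spacing:.2f} Hz at every frequency")
    for k in (0, N_BINS // 2, N_BINS - 1):
        print(f"CQT bin {k}: f = {freqs[k]:.1f} Hz, "
              f"bandwidth = {bandwidths[k]:.2f} Hz")
    print(f"CQT is finer than this STFT below about {crossover_hz:.0f} Hz")
```

Under these assumed settings the CQT bins are narrower than the STFT bins at low frequencies and wider at high frequencies, which is the trade-off the abstract relies on.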
Related papers
- Analysis Of Constant-Q Filterbank Based Representations For Speech Emotion Recognition (2022) · 7.81
- Speech Emotion Recognition Using Quaternion Convolutional Neural Networks (2021) · 12.74
- Pitch-synchronous Single Frequency Filtering Spectrogram For Speech Emotion Recognition (2019) · 11.19
- Speech Emotion Recognition Using Deep Sparse Auto-encoder Extreme Learning Machine With A New Weighting Scheme And Spectro-temporal Features Along With Classical Feature Selection And A New Quantum-inspired Dimension Reduction Method (2021) · 0.00
- SpeechEQ: Speech Emotion Recognition Based On Multi-scale Unified Datasets And Multitask Learning (2022) · 5.84
- SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition With Speaker Embedding And Vision Transformers (2022) · 2.83
- Enhanced Speech Emotion Recognition With Efficient Channel Attention Guided Deep CNN-BiLSTM Framework (2024) · 0.00
- SigWavNet: Learning Multiresolution Signal Wavelet Network For Speech Emotion Recognition (2025) · 8.48