Enhancing Speech Emotion Recognition Through Differentiable Architecture Search
2023 · Thejan Rajapakshe, Rajib Rana, Sara Khalifa, et al.
Abstract
Speech Emotion Recognition (SER) is a critical enabler of emotion-aware communication in human-computer interaction. Recent advances in Deep Learning (DL) have substantially improved the performance of SER models through increased model complexity. However, designing optimal DL architectures requires prior experience and experimental evaluation. Encouragingly, Neural Architecture Search (NAS) offers a promising way to determine an optimal DL model automatically. In particular, Differentiable Architecture Search (DARTS) is an efficient NAS method for searching for optimised models. This paper proposes a DARTS-optimised joint CNN and LSTM architecture to improve SER performance; the choice to couple CNN and LSTM is informed by the literature. While DARTS has previously been applied to CNN and LSTM combinations, our approach introduces a novel mechanism, particularly in selecting CNN operations using DARTS. In contrast to previous studies,
Related papers
- Improved Conformer-based End-to-end Speech Recognition Using Neural Architecture Search (2021) · 0.00
- A Breakthrough In Speech Emotion Recognition Using Deep Retinal Convolution Neural Networks (2017) · 0.00
- DARTS-ASR: Differentiable Architecture Search For Multilingual Speech Recognition And Adaptation (2020) · 8.60
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022) · 2.26
- Speech Emotion Recognition With Dual-sequence LSTM Architecture (2019) · 15.78
- Efficient Neural Architecture Search For End-to-end Speech Recognition Via Straight-through Gradients (2020) · 8.35
- Searching For Effective Preprocessing Method And CNN-based Architecture With Efficient Channel Attention On Speech Emotion Recognition (2024) · 2.26
- Hybrid Data Augmentation And Deep Attention-based Dilated Convolutional-recurrent Neural Networks For Speech Emotion Recognition (2021) · 12.81