Multi-view Multi-task Modeling With Speech Foundation Models For Speech Forensic Tasks
2024 · Orchid Chetia Phukan, Devyani Koshal, Swarup Ranjan Behera, et al.
Abstract
Speech forensic tasks (SFTs), such as automatic speaker recognition (ASR), speech emotion recognition (SER), gender recognition (GR), and age estimation (AE), find use in different security and biometric applications. Previous works have applied various techniques, with recent studies focusing on applying speech foundation models (SFMs) for improved performance. However, most prior efforts have centered on building individual models for each task separately, despite the inherent similarities among these tasks. This isolated approach results in higher computational resource requirements, increased costs, time consumption, and maintenance challenges. In this study, we address these challenges by employing a multi-task learning strategy. Firstly, we explore the various state-of-the-art (SOTA) SFMs by extracting their representations for learning these SFTs and investigating their effectiveness at each task specifically. Secondly, we analyze the performance of the extracted representations
Related papers
- Resource-efficient Adaptation Of Speech Foundation Models For Multi-speaker ASR (2024) 3.58
- Adapting Speech Foundation Models For Unified Multimodal Speech Recognition With Large Language Models (2025) 0.00
- TIMIT Speaker Profiling: A Comparison Of Multi-task Learning And Single-task Learning Approaches (2024) 0.00
- Audio-visual Representation Learning Via Knowledge Distillation From Speech Foundation Models (2025) 7.81
- A Comparative Study On Multichannel Speaker-attributed Automatic Speech Recognition In Multi-party Meetings (2022) 5.24
- E2E-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021) 0.00
- MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, ASR Error Detection, And ASR Error Correction (2024) 0.00
- Frequency Domain Multi-channel Acoustic Modeling For Distant Speech Recognition (2019) 9.92