Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models
2024 · Pavel Stepachev, Pinzhen Chen, Barry Haddow
Abstract
Large language models (LLMs) have started to play a vital role in modelling speech and text. To explore the best use of context and multiple systems' outputs for post-ASR speech emotion prediction, we study LLM prompting on a recent task named GenSEC. Our techniques include ASR transcript ranking, variable conversation context, and system output fusion. We show that the conversation context has diminishing returns and the metric used to select the transcript for prediction is crucial. Finally, our best submission surpasses the provided baseline by 20% in absolute accuracy.
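As a rough illustration of two of the ideas named above, variable conversation context and system output fusion, the following is a minimal sketch. It is not the authors' implementation: the prompt wording, the function names `build_prompt` and `fuse_predictions`, and the use of simple majority voting for fusion are all assumptions made for illustration only.

```python
from collections import Counter


def build_prompt(turns, target_index, context_size, labels):
    """Build a classification prompt with up to `context_size` preceding turns.

    `turns` is a list of (speaker, text) pairs. The prompt wording is
    hypothetical and not taken from the paper.
    """
    start = max(0, target_index - context_size)
    context = turns[start:target_index]
    lines = [
        "Classify the emotion of the final utterance.",
        "Options: " + ", ".join(labels) + ".",
    ]
    if context:
        lines.append("Conversation so far:")
        lines.extend(f"{speaker}: {text}" for speaker, text in context)
    speaker, text = turns[target_index]
    lines.append(f"Utterance to classify: {speaker}: {text}")
    return "\n".join(lines)


def fuse_predictions(predictions):
    """Fuse labels from several systems by majority vote (an assumed scheme).

    Ties are broken in favour of the label that appears first in the input.
    """
    if not predictions:
        raise ValueError("need at least one prediction to fuse")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


if __name__ == "__main__":
    turns = [("A", "hi"), ("B", "I lost my keys"), ("A", "oh no")]
    print(build_prompt(turns, target_index=2, context_size=1, labels=["happy", "sad"]))
    print(fuse_predictions(["sad", "happy", "sad"]))
```

Varying `context_size` in a sketch like this is one way to probe how much preceding conversation helps before the returns diminish.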