Paralinguistics-enhanced Large Language Modeling Of Spoken Dialogue
2023 · Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, et al.
Abstract
Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore propose Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), an LLM that utilizes text and speech modalities to better model the linguistic content and paralinguistic attributes of spoken dialogue. The model takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking multimodal framework. Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditionin
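The serialized task order described in the abstract (current paralinguistic attribute, then response paralinguistic attribute, then response text) can be illustrated with a small text-only sketch. This is not the authors' implementation: the `Turn` class, the `serialize_example` function, the tag strings such as `<current_attr>`, and the use of a sentiment label as the attribute are all hypothetical choices made for illustration, and the speech embeddings mentioned in the abstract are omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One dialogue turn (hypothetical structure, not from the paper)."""
    speaker: str
    text: str
    attribute: str  # assumed paralinguistic label, e.g. a sentiment class


def serialize_example(
    history: List[Turn],
    current: Turn,
    response: Optional[Turn] = None,
) -> str:
    """Flatten a dialogue into one sequence for an autoregressive LM.

    Order of the target fields follows the abstract:
    current attribute -> response attribute -> response text.
    The tag names are assumptions made for this sketch.
    """
    parts = []
    # Context: earlier turns carry both their text and their attribute.
    for turn in history:
        parts.append(f"[{turn.speaker}] {turn.text} <attr:{turn.attribute}>")
    # The current turn's attribute is a prediction target, so only its text
    # appears in the context.
    parts.append(f"[{current.speaker}] {current.text}")

    if response is None:
        # Inference-time prompt: stop at the first field to be predicted.
        parts.append("<current_attr>")
    else:
        # Training-time sequence: each later field is conditioned on the
        # earlier ones because it appears after them in the sequence.
        parts.append(f"<current_attr> {current.attribute}")
        parts.append(f"<response_attr> {response.attribute}")
        parts.append(f"<response_text> {response.text}")
    return "\n".join(parts)


if __name__ == "__main__":
    history = [Turn("A", "I lost my keys again.", "negative")]
    current = Turn("B", "Oh no, where did you last see them?", "neutral")
    response = Turn("A", "Found them, they were in my coat!", "positive")
    print(serialize_example(history, current, response))
```

Because the three target fields sit in a single sequence, a left-to-right model generating the response text has already produced (or been given) both attribute predictions, which is one plausible reading of the autoregressive conditioning the abstract mentions.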
Related papers
- Paralinguistics-aware Speech-empowered Large Language Models For Natural Conversation (2024)
- Frozen Large Language Models Can Perceive Paralinguistic Aspects Of Speech (2024)
- Get Large Language Models Ready To Speak: A Late-fusion Approach For Speech Generation (2024)
- Large Language Model Can Transcribe Speech In Multi-talker Scenarios With Versatile Instructions (2024)
- PSLM: Parallel Generation Of Text And Speech With LLMs For Low-latency Spoken Dialogue Systems (2024)
- X-LLM: Bootstrapping Advanced Large Language Models By Treating Multi-modalities As Foreign Languages (2023)
- Recent Advances In Speech Language Models: A Survey (2024)
- Prompting Large Language Models With Audio For General-purpose Speech Summarization (2024)