Large Generative Model-assisted Talking-face Semantic Communication System
2024 · Feibo Jiang, Siwei Tu, Li Dong, et al.
Abstract
The rapid development of generative Artificial Intelligence (AI) continually unveils the potential of Semantic Communication (SemCom). However, current talking-face SemCom systems still face challenges such as low bandwidth utilization, semantic ambiguity, and diminished Quality of Experience (QoE). This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) System tailored for talking-face video communication. Firstly, we introduce a Generative Semantic Extractor (GSE) at the transmitter, based on the FunASR model, to convert semantically sparse talking-face videos into texts with high information density. Secondly, we establish a private Knowledge Base (KB) based on a Large Language Model (LLM) for semantic disambiguation and correction, complemented by a joint knowledge base-semantic-channel coding scheme. Finally, at the receiver, we propose a Generative Semantic Reconstructor (GSR) that utilizes BERT-VITS2 and SadTalker models to
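The abstract describes a three-stage data flow: video is reduced to text at the transmitter, the text is corrected against a private knowledge base, and the receiver regenerates speech and a talking face from that text. The sketch below traces only that flow under loud assumptions. Every function and field name is hypothetical, no FunASR, LLM, BERT-VITS2 or SadTalker API is called, the LLM correction is replaced by a toy word-lookup table, placing that correction at the transmitter is an assumption, and the joint knowledge base-semantic-channel coding scheme is not modeled (plain UTF-8 is used instead).

```python
from dataclasses import dataclass


@dataclass
class ReconstructionRequest:
    """Hypothetical input to a receiver-side GSR stage."""
    text: str        # corrected text to be re-voiced and animated
    speaker_id: str  # assumed key selecting a voice/face profile


def extract_semantics(asr_transcript: str) -> str:
    """Stand-in for the GSE. A real system would run ASR (FunASR in the paper)
    on the audio track; here the transcript is given and only whitespace-normalized."""
    return " ".join(asr_transcript.split())


def kb_correct(text: str, corrections: dict[str, str]) -> str:
    """Toy stand-in for LLM-based disambiguation: word-level lookup in a
    hypothetical private correction table (the paper uses an LLM instead)."""
    return " ".join(corrections.get(word, word) for word in text.split())


def transmit(text: str) -> bytes:
    """Source-coding stand-in: plain UTF-8; no channel coding is modeled."""
    return text.encode("utf-8")


def receive(payload: bytes, speaker_id: str) -> ReconstructionRequest:
    """Receiver stand-in: decodes the text and packages it for the GSR stage,
    which in the paper would drive TTS (BERT-VITS2) and face synthesis (SadTalker)."""
    return ReconstructionRequest(text=payload.decode("utf-8"), speaker_id=speaker_id)


if __name__ == "__main__":
    kb = {"meating": "meeting"}  # hypothetical ASR misrecognition and its fix
    text = kb_correct(extract_semantics("  the meating  starts at nine "), kb)
    payload = transmit(text)
    request = receive(payload, speaker_id="alice")
    print(request.text, len(payload))
```

The point of the sketch is the payload: only a short text string crosses the channel, which is the source of the bandwidth saving the abstract claims, while all visual and acoustic detail is regenerated at the receiver.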
Related papers
- A Unified Compression Framework For Efficient Speech-driven Talking-face Generation (2023) · 0.00
- Paralinguistics-enhanced Large Language Modeling Of Spoken Dialogue (2023) · 0.00
- SpeechGPT: Empowering Large Language Models With Intrinsic Cross-modal Conversational Abilities (2023) · 16.59
- GenSE: Generative Speech Enhancement Via Language Models Using Hierarchical Modeling (2025) · 0.00
- See The Speaker: Crafting High-resolution Talking Faces From Speech With Prior Guidance And Region Refinement (2025) · 0.00
- Semantically Consistent Video-to-audio Generation Using Multimodal Language Large Model (2024) · 0.00
- TransFace: Unit-based Audio-visual Speech Synthesizer For Talking Head Translation (2023) · 7.16
- Speech Driven Talking Face Generation From A Single Image And An Emotion Condition (2020) · 0.00