GenSE: Generative Speech Enhancement Via Language Models Using Hierarchical Modeling
2025 · Jixun Yao, Hexin Liu, Chen Chen, et al.
Abstract
Semantic information refers to the meaning conveyed through words, phrases, and contextual relationships within a given linguistic structure. Humans can leverage semantic information, such as familiar linguistic patterns and contextual cues, to reconstruct incomplete or masked speech signals in noisy environments. However, existing speech enhancement (SE) approaches often overlook the rich semantic information embedded in speech, which is crucial for improving the intelligibility, speaker consistency, and overall quality of enhanced speech signals. To enrich the SE model with semantic information, we employ language models as efficient semantic learners and propose a comprehensive framework tailored for language model-based speech enhancement, called GenSE. Specifically, we approach SE as a conditional language modeling task rather than as the continuous signal regression problem defined in existing works. This is achieved by tokenizing speech signals into semantic tokens using a p…
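The abstract frames SE as conditional language modeling over discrete tokens rather than regression of a continuous signal. The toy Python sketch below illustrates only that framing, under stated assumptions: `quantize` is a hypothetical nearest-centroid tokenizer standing in for the tokenizer the (truncated) abstract mentions, the codebook and the `copy_model` probabilities are invented for illustration, and the noisy and clean token sequences are assumed to be frame-aligned. It is not GenSE's implementation or its hierarchical architecture; it just shows how the likelihood of clean tokens c given noisy tokens n factorizes as p(c | n) = ∏ₜ p(cₜ | n, c₍<ₜ₎) and is scored with a negative log-likelihood.

```python
import math

def quantize(frames, codebook):
    """Hypothetical tokenizer: map each feature frame to its nearest codebook index."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in frames]

def copy_model(noisy, clean_prefix, token, vocab_size, p_copy=0.7):
    """Toy conditional model (assumption): favor the time-aligned noisy token and
    spread the remaining probability mass uniformly over the other tokens."""
    t = len(clean_prefix)  # index of the clean token being predicted
    if token == noisy[t]:
        return p_copy
    return (1.0 - p_copy) / (vocab_size - 1)

def conditional_nll(noisy, clean, prob_fn):
    """Negative log-likelihood of the clean tokens under the factorization
    p(clean | noisy) = prod_t p(clean[t] | noisy, clean[:t])."""
    return -sum(math.log(prob_fn(noisy, clean[:t], tok))
                for t, tok in enumerate(clean))

codebook = [[0.0], [1.0], [2.0]]  # invented 1-D codebook, vocabulary size 3
noisy_tokens = quantize([[0.1], [1.6], [2.2]], codebook)
clean_tokens = quantize([[0.0], [1.1], [1.9]], codebook)
nll = conditional_nll(noisy_tokens, clean_tokens,
                      lambda n, prefix, tok: copy_model(n, prefix, tok, len(codebook)))
print(noisy_tokens, clean_tokens, round(nll, 4))
```

In this toy run the middle noisy frame is quantized to a different token than its clean counterpart, so that position contributes most of the loss. A trained language model would learn these conditional probabilities from data instead of applying a fixed copy rule.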
Related papers
- SenSE: Semantic-aware High-fidelity Universal Speech Enhancement (2025) · 3.85
- SELM: Speech Enhancement Using Discrete Tokens And Language Models (2023) · 11.19
- Incorporating Symbolic Sequential Modeling For Speech Enhancement (2019) · 0.00
- Hierarchical Multi-grained Generative Model For Expressive Speech Synthesis (2020) · 8.60
- SEGAN: Speech Enhancement Generative Adversarial Network (2017) · 21.85
- Modeling Speech Recognition And Synthesis Simultaneously: Encoding And Decoding Lexical And Sublexical Semantic Information Into Speech With No Direct Access To Speech Data (2022) · 4.52
- Conditional Generative Adversarial Networks For Speech Enhancement And Noise-robust Speaker Verification (2017) · 16.03
- Aligning Generative Speech Enhancement With Perceptual Feedback (2025) · 0.00