Unsupervised Speech Segmentation: A General Approach Using Speech Language Models
2025 · Avishai Elmakies, Omri Abend, Yossi Adi
Abstract
In this paper, we introduce an unsupervised approach for Speech Segmentation, which builds on previously researched approaches, e.g., Speaker Diarization, while being applicable to an inclusive set of acoustic-semantic distinctions, paving a path towards a general Unsupervised Speech Segmentation approach. Unlike traditional speech and audio segmentation, which mainly focuses on spectral changes in the input signal, e.g., phone segmentation, our approach tries to segment the spoken utterance into chunks with differing acoustic-semantic styles, focusing on acoustic-semantic information that does not translate well into text, e.g., emotion or speaker. While most Speech Segmentation tasks only handle one style change, e.g., emotion diarization, our approach tries to handle multiple acoustic-semantic style changes. Leveraging recent advances in Speech Language Models (SLMs), we propose a simple unsupervised method to segment a given speech utterance. We empirically demonstrate the effectiv
Related papers
- Smart Speech Segmentation Using Acousto-linguistic Features With Look-ahead (2022)
- An Embedded Segmental K-means Model For Unsupervised Segmentation And Clustering Of Speech (2017)
- Unsupervised Word Segmentation And Lexicon Discovery Using Acoustic Word Embeddings (2016)
- Disentangling Speech And Non-speech Components For Building Robust Acoustic Models From Found Data (2019)
- SpeakerLM: End-to-end Versatile Speaker Diarization And Recognition With Multimodal Large Language Models (2025)
- Unsupervised Speech Recognition Via Segmental Empirical Output Distribution Matching (2018)
- Unsupervised Speech Segmentation And Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding (2021)
- UniSLU: Unified Spoken Language Understanding From Heterogeneous Cross-task Datasets (2025)