DSVAE: Interpretable Disentangled Representation For Synthetic Speech Detection
2023 · Amit Kumar Singh Yadav, Kratika Bhagtani, Ziyue Xiang, et al.
Abstract
Tools for generating high-quality synthetic speech that is perceptually indistinguishable from speech recorded from human speakers are easily available. Several approaches have been proposed for detecting synthetic speech. Many of these approaches use deep learning methods as black boxes and give no reasoning for the decisions they make, which limits their interpretability. In this paper, we propose the Disentangled Spectrogram Variational Auto Encoder (DSVAE), a variational autoencoder trained in two stages that processes speech spectrograms using disentangled representation learning to generate interpretable representations of a speech signal for detecting synthetic speech. DSVAE also creates an activation map that highlights the spectrogram regions that discriminate synthetic from bona fide human speech signals. We evaluated the representations obtained from DSVAE using the ASVspoof2019 dataset. Our experimental results show high accuracy (>98%) on detecti
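The abstract does not state the DSVAE training objective or architecture. As general background only, the sketch below shows the two ingredients that a standard variational autoencoder relies on: the reparameterized sampling step and the KL term that pulls the latent distribution toward a standard normal. The function names, the squared-error reconstruction term, and the `beta` weight are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) drawn per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term (assumed here; other likelihoods are possible)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Generic VAE loss: reconstruction error plus a weighted KL term.

    `beta` is a hypothetical weighting knob, not a DSVAE hyperparameter.
    """
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.0, 1.0], [0.0, -1.0]   # toy encoder outputs
    z = reparameterize(mu, log_var, rng)    # latent sample fed to a decoder
    print(len(z), kl_to_standard_normal(mu, log_var))
```

In a real model the encoder and decoder would be neural networks operating on spectrogram frames; the lists here only stand in for their outputs so that the arithmetic is easy to follow.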
Related papers
- Deep Generative Variational Autoencoding For Replay Spoof Detection In Automatic Speaker Verification (2020) 9.76
- Learning Disentangled Speech Representations (2023) 0.00
- Representation Selective Self-distillation And Wav2vec 2.0 Feature Exploration For Spoof-aware Speaker Verification (2022) 9.03
- Toward Improving Synthetic Audio Spoofing Detection Robustness Via Meta-learning And Disentangled Training With Adversarial Examples (2024) 6.77
- A Benchmark Of Dynamical Variational Autoencoders Applied To Speech Spectrogram Modeling (2021) 6.77
- A Statistically Principled And Computationally Efficient Approach To Speech Enhancement Using Variational Autoencoders (2019) 9.23
- Deep Residual Neural Networks For Audio Spoofing Detection (2019) 0.00
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022) 10.48