Phase Recovery With Bregman Divergences For Audio Source Separation
2020 · Paul Magron, Pierre-Hugo Vial, Thomas Oberlin, et al.
Abstract
Time-frequency audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source, and then applying a phase recovery algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has shown good performance in several recent works. This algorithm minimizes a quadratic reconstruction error between magnitude spectrograms. However, this loss does not properly account for some perceptual properties of audio, and alternative discrepancy measures such as beta-divergences have been preferred in many settings. In this paper, we propose to reformulate phase recovery in audio source separation as a minimization problem involving Bregman divergences. To optimize the resulting objective, we derive a projected gradient descent algorithm. Experiments conducted on a speech enhancement task show that this approach outperforms MISI for several alternative losses, which highlights their relevanc
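To make the link between the quadratic loss and the alternative losses concrete, here is a minimal sketch of how beta-divergences arise as Bregman divergences. It is not the paper's projected gradient algorithm. It only evaluates the divergence between two magnitude values, and the function names (`bregman`, `beta_generator`, `beta_divergence`, `spectrogram_divergence`) are hypothetical, chosen for illustration. It assumes strictly positive inputs and the standard generating functions for the beta-divergence family.

```python
import math


def bregman(phi, dphi, y, x):
    """Bregman divergence D_phi(y | x) = phi(y) - phi(x) - phi'(x) * (y - x)."""
    return phi(y) - phi(x) - dphi(x) * (y - x)


def beta_generator(beta):
    """Return (phi, phi') whose Bregman divergence is the beta-divergence."""
    if beta == 0:  # Itakura-Saito divergence
        return (lambda x: -math.log(x)), (lambda x: -1.0 / x)
    if beta == 1:  # generalized Kullback-Leibler divergence
        return (lambda x: x * math.log(x) - x), (lambda x: math.log(x))
    c = beta * (beta - 1)  # beta = 2 gives half the squared error
    return (lambda x: x ** beta / c), (lambda x: x ** (beta - 1) / (beta - 1))


def beta_divergence(y, x, beta):
    """Beta-divergence between two positive scalars y and x."""
    phi, dphi = beta_generator(beta)
    return bregman(phi, dphi, y, x)


def spectrogram_divergence(Y, X, beta):
    """Sum of entrywise beta-divergences over two flat lists of magnitudes."""
    return sum(beta_divergence(y, x, beta) for y, x in zip(Y, X))
```

With `beta = 2` the divergence reduces to half the squared difference, which corresponds (up to the factor 1/2) to a quadratic error of the kind the abstract attributes to MISI. The values `beta = 1` and `beta = 0` give the generalized Kullback-Leibler and Itakura-Saito divergences. Note that for `beta` different from 2 the divergence is not symmetric in its two arguments, so the order in which target and estimate are passed matters.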
Related papers
- Spectrogram Inversion For Audio Source Separation Via Consistency, Mixing, And Magnitude Constraints (2023)
- Deep Learning Based Phase Reconstruction For Speaker Separation: A Trigonometric Perspective (2018)
- Sparse Gaussian Process Audio Source Separation Using Spectrum Priors In The Time-domain (2018)
- Complex NMF Under Phase Constraints Based On Signal Modeling: Application To Audio Source Separation (2016)
- End-to-end Speech Separation With Unfolded Iterative Phase Reconstruction (2018)
- Data-driven Source Separation Based On Simplex Analysis (2018)
- An Explicit Consistency-preserving Loss Function For Phase Reconstruction And Speech Enhancement (2024)
- Diffphase: Generative Diffusion-based STFT Phase Retrieval (2022)