D2Former: A Fully Complex Dual-path Dual-decoder Conformer Network Using Joint Complex Masking And Complex Spectral Mapping For Monaural Speech Enhancement

Abstract

Monaural speech enhancement has been widely studied using real-valued networks in the time-frequency (TF) domain. However, because the input and the target are naturally complex-valued in the TF domain, a fully complex network is highly desirable for effectively learning the feature representation and modelling the sequence in the complex domain. Moreover, phase, an important factor in the perceptual quality of speech, has been shown to be learnable together with magnitude from noisy speech using complex masking or complex spectral mapping. Many recent studies focus on either complex masking or complex spectral mapping, ignoring their performance boundaries. To address the above issues, we propose a fully complex dual-path dual-decoder conformer network (D2Former) using joint complex masking and complex spectral mapping for monaural speech enhancement. In D2Former, we extend the conformer network into the complex domain and form a dual-path complex TF self-attention architecture for effectively modelling the com
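
The two output strategies named in the abstract can be illustrated with a minimal NumPy sketch. It is not the D2Former implementation: the toy spectrograms, the helper names `apply_complex_mask` and `from_complex_mapping`, and the use of oracle targets in place of network outputs are all assumptions made for illustration. How D2Former combines the two decoder outputs is not stated in this excerpt, so the sketch keeps them separate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy complex TF representations (frequency bins x frames); stand-ins for STFT outputs.
shape = (4, 6)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise


def apply_complex_mask(noisy_spec, mask):
    """Complex masking: element-wise complex product of a mask and the noisy spectrum."""
    return mask * noisy_spec


def from_complex_mapping(real_part, imag_part):
    """Complex spectral mapping: real and imaginary parts are predicted directly."""
    return real_part + 1j * imag_part


# Ideal complex ratio mask, used here in place of a masking decoder's output (assumption).
ideal_mask = clean / noisy
masked_estimate = apply_complex_mask(noisy, ideal_mask)

# Oracle real/imaginary targets, used here in place of a mapping decoder's output (assumption).
mapped_estimate = from_complex_mapping(clean.real, clean.imag)
```

Because the mask is complex-valued, multiplying by it changes both the magnitude and the phase of each TF bin, which is why either strategy can in principle recover phase along with magnitude.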
