Lightweight Dual-channel Target Speaker Separation For Mobile Voice Communication
2021 · Yuanyuan Bao, Yanze Xu, Na Xu, et al.
Abstract
There is a strong need to deploy target speaker separation (TSS) models on mobile devices under constraints on model size and computational complexity. To better perform TSS for mobile voice communication, we first build LibriPhone, a dual-channel dataset for a specific scenario. To better mimic real-world conditions, LibriPhone is not simulated from a single-channel dataset; instead, pairs of utterances from LibriSpeech are replayed simultaneously through two professional artificial heads and recorded with the two built-in microphones of a mobile phone. We then propose a lightweight time-frequency domain separation model, LSTM-Former, which is based on the LSTM framework and trained with a scale-invariant signal-to-noise ratio (SI-SNR) loss. In the experiments on LibriPhone, we compare the dual-channel LSTM-Former model with a single-channel version that uses a randomly selected single channel of LibriPhone. Experimental results show that the dual-channel LSTM-Former outperforms the single-channel LSTM
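The SI-SNR loss named in the abstract can be illustrated with a minimal NumPy sketch. It uses the standard SI-SNR definition (project the zero-mean estimate onto the zero-mean target, then compare the energy of the projection with the energy of the residual). The function names `si_snr` and `si_snr_loss`, the `eps` stabilizer, and the single-utterance 1-D input are assumptions made for illustration; the abstract does not describe the authors' exact implementation.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for two 1-D waveforms.

    Illustrative helper (hypothetical name); follows the standard definition,
    not necessarily the exact variant used by the paper.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target to get the "clean" component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target

    # Everything left over is treated as noise/distortion.
    e_noise = estimate - s_target

    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)


def si_snr_loss(estimate, target):
    """Training loss: negative SI-SNR, so that minimizing it raises SI-SNR."""
    return -si_snr(estimate, target)


# Toy usage: a target corrupted by a small and a large amount of noise.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
good_estimate = target + 0.1 * noise
poor_estimate = target + 1.0 * noise
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the measure is called scale-invariant.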
Related papers
- Voicefilter-lite: Streaming Targeted Voice Separation For On-device Speech Recognition (2020)
- Speaker-conditioned Target Speaker Extraction Based On Customized LSTM Cells (2021)
- Tensor-train Long Short-term Memory For Monaural Speech Enhancement (2018)
- 3S-TSE: Efficient Three-stage Target Speaker Extraction For Real-time And Low-resource Applications (2023)
- Libri2vox Dataset: Target Speaker Extraction With Diverse Speaker Conditions And Synthetic Data (2024)
- Conformer-based Target-speaker Automatic Speech Recognition For Single-channel Audio (2023)
- Individualized Conditioning And Negative Distances For Speaker Separation (2022)
- Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation In Complex Domain (2021)