Improved Lite Audio-visual Speech Enhancement
2020 · Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao
Abstract
Numerous studies have investigated the effectiveness of audio-visual multimodal learning for speech enhancement (AVSE) tasks, seeking a solution that uses visual data as auxiliary and complementary input to reduce the noise of noisy speech signals. Recently, we proposed a lite audio-visual speech enhancement (LAVSE) algorithm for a car-driving scenario. Compared to conventional AVSE systems, LAVSE requires less online computation and to some extent solves the user privacy problem on facial data. In this study, we extend LAVSE to improve its ability to address three practical issues often encountered in implementing AVSE systems, namely, the additional cost of processing visual data, audio-visual asynchronization, and low-quality visual data. The proposed system is termed improved LAVSE (iLAVSE), which uses a convolutional recurrent neural network architecture as the core AVSE model. We evaluate iLAVSE on the Taiwan Mandarin speech with video dataset. Experimental results confirm that c
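Two of the practical issues named above, audio-visual asynchronization and low-quality visual data, can be simulated by perturbing the visual stream before it is fused with the audio. The sketch below shows one plausible way to do this in pure Python. It is a minimal, hypothetical illustration and not the iLAVSE implementation. `shift_visual`, `drop_visual_frames`, and `fuse_frames` are names introduced here. Frames are assumed to be fixed-length feature vectors at a common frame rate, and the convolutional recurrent network that would consume the fused features is omitted.

```python
import random
from typing import List, Sequence

Frame = List[float]  # assumption: one fixed-length feature vector per frame


def shift_visual(visual: Sequence[Frame], offset: int) -> List[Frame]:
    """Delay (offset > 0) or advance (offset < 0) the visual stream by
    `offset` frames to mimic audio-visual asynchronization. Frames that
    fall outside the stream are replaced by zero vectors, so the output
    length equals the input length."""
    n = len(visual)
    if n == 0:
        return []
    zero = [0.0] * len(visual[0])
    out: List[Frame] = []
    for t in range(n):
        src = t - offset  # index of the visual frame that lands at time t
        out.append(list(visual[src]) if 0 <= src < n else list(zero))
    return out


def drop_visual_frames(visual: Sequence[Frame], drop_prob: float,
                       rng: random.Random) -> List[Frame]:
    """Zero out each visual frame independently with probability
    `drop_prob`, a crude stand-in for missing or low-quality frames."""
    return [[0.0] * len(f) if rng.random() < drop_prob else list(f)
            for f in visual]


def fuse_frames(audio: Sequence[Frame], visual: Sequence[Frame]) -> List[Frame]:
    """Frame-level early fusion: concatenate audio and visual features
    for each time step."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


if __name__ == "__main__":
    audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    visual = [[1.0], [2.0], [3.0]]
    rng = random.Random(0)
    perturbed = drop_visual_frames(shift_visual(visual, 1), 0.3, rng)
    print(fuse_frames(audio, perturbed))
```

For example, `shift_visual([[1.0], [2.0], [3.0]], 1)` yields `[[0.0], [1.0], [2.0]]`: the lips now lag the audio by one frame. Applying such perturbations at random during training is one way to expose a fusion model to misaligned or degraded video, but the abstract does not say which mechanism iLAVSE itself uses.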
Related papers
- LSTMSE-Net: Long Short Term Speech Enhancement Network For Audio-visual Speech Enhancement (2024) · 8.57
- Audio-visual Speech Separation In Noisy Environments With A Lightweight Iterative Model (2023) · 0.00
- Audio-visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2017) · 17.39
- Incorporating Ultrasound Tongue Images For Audio-visual Speech Enhancement (2023) · 0.00
- Improving Audio-visual Speech Recognition By Lip-subword Correlation Based Visual Pre-training And Cross-modal Fusion Encoder (2023) · 6.34
- XLAVS-R: Cross-lingual Audio-visual Speech Representation Learning For Noise-robust Speech Perception (2024) · 7.50
- LA-VocE: Low-SNR Audio-visual Speech Enhancement Using Neural Vocoders (2022) · 0.00
- Multi-layer Feature Fusion Convolution Network For Audio-visual Speech Enhancement (2021) · 0.00