Automatic Quality Assessment For Audio-visual Verification Systems. The LOVe Submission To NIST SRE Challenge 2019
2020 · Grigory Antipov, Nicolas Gengembre, Olivier Le Blouch, et al.
Abstract
Score fusion is a cornerstone of multimodal biometric systems composed of independent unimodal parts. In this work, we focus on quality-dependent fusion for speaker-face verification. To this end, we propose a universal model which can be trained for automatic quality assessment of both the face and speaker modalities. This model estimates the quality of the representations produced by the unimodal systems, and these quality estimates are then used to enhance the score-level fusion of the speaker and face verification modules. We demonstrate the improvements brought by this quality-dependent fusion on the recent NIST SRE19 Audio-Visual Challenge dataset.
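To make the idea of quality-dependent score-level fusion concrete, here is a minimal sketch. It is not the model proposed in the paper: it assumes that the speaker and face scores are already calibrated to a comparable scale, that each modality comes with a quality estimate in [0, 1], and that a simple quality-weighted average is an acceptable stand-in for a learned fusion. The function name `fuse_scores` and its parameters are hypothetical.

```python
def fuse_scores(speaker_score: float,
                face_score: float,
                speaker_quality: float,
                face_quality: float,
                eps: float = 1e-8) -> float:
    """Quality-weighted average of two verification scores.

    Illustrative assumption: qualities lie in [0, 1] and the two scores
    are on a comparable (calibrated) scale. This is a hand-written rule,
    not a trained fusion model.
    """
    for q in (speaker_quality, face_quality):
        if not 0.0 <= q <= 1.0:
            raise ValueError("quality estimates must lie in [0, 1]")

    total = speaker_quality + face_quality
    if total < eps:
        # Neither modality is trusted: fall back to a plain average.
        return 0.5 * (speaker_score + face_score)

    # The more reliable modality receives the larger weight.
    speaker_weight = speaker_quality / total
    return speaker_weight * speaker_score + (1.0 - speaker_weight) * face_score


if __name__ == "__main__":
    # Noisy audio (low speaker quality) and a clear face image.
    print(fuse_scores(speaker_score=-0.4, face_score=2.1,
                      speaker_quality=0.2, face_quality=0.9))
```

With equal qualities the rule reduces to a plain average of the two scores. As the quality of one modality drops toward zero, the fused score approaches the score of the other modality.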
Related papers
- HLT-NUS Submission For NIST 2019 Multimedia Speaker Recognition Evaluation (2020)
- Attention-based Audio-visual Fusion For Robust Automatic Speech Recognition (2018)
- Comparative Analysis Of Modality Fusion Approaches For Audio-visual Person Identification And Verification (2024)
- Active Speaker Detection As A Multi-objective Optimization With Uncertainty-based Multimodal Fusion (2021)
- Robust Audio-visual Target Speaker Extraction With Emotion-aware Multiple Enrollment Fusion (2025)
- Attentive Fusion Enhanced Audio-visual Encoding For Transformer Based Robust Speech Recognition (2020)
- Joint Optimization Of Speaker And Spoof Detectors For Spoofing-robust Automatic Speaker Verification (2025)
- Quality Measures For Speaker Verification With Short Utterances (2019)