Abstract

Forensic audio analysis for speaker verification offers unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings. The lack of real naturalistic forensic audio corpora with ground-truth speaker identity represents a major challenge in this field. It is also difficult to directly employ small-scale domain-specific data to train complex neural network architectures due to domain mismatch and loss in performance. Alternatively, cross-domain speaker verification for multiple acoustic environments is a challenging task which could advance research in audio forensics. In this study, we introduce a CRSS-Forensics audio dataset collected in multiple acoustic environments. We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics. Based on this fine-tuned model, we align domain-specific distributions in the e

Authors

(none)

Tags

  • Speaker Analysis

Stats

  • citations13
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score8.60
  • arxiv keywang2020cross

Related papers