Deep Speaker Verification: Do We Need End To End?
2017 · Dong Wang, Lantian Li, Zhiyuan Tang, et al.
Abstract
End-to-end learning treats the entire system as a single adaptable black box, which, if sufficient data are available, may learn to perform very well on the target task. This principle has recently been applied in several prototype studies on speaker verification (SV), where the feature extractor and the classifier are learned jointly with an objective function that is consistent with the evaluation metric. The opposite approach is feature learning, which first trains a feature learning model and then constructs a back-end classifier separately to perform SV. Recently, both approaches have achieved significant performance gains on SV, mainly attributed to the effective use of deep neural networks. However, the two approaches have not been carefully compared, and their respective advantages have not been well discussed. In this paper, we compare the end-to-end and feature learning approaches on a text-independent SV task. Our experiments on a dataset sampled from th
Related papers
- Joint Speaker Encoder And Neural Back-end Model For Fully End-to-end Automatic Speaker Verification With Multiple Enrollment Utterances (2022) 0.00
- End-to-end Attention Based Text-dependent Speaker Verification (2017) 14.87
- Speaker Verification In Multi-speaker Environments Using Temporal Feature Fusion (2022) 0.00
- Cross-lingual Speaker Verification With Deep Feature Learning (2017) 8.35
- How To Leverage DNN-based Speech Enhancement For Multi-channel Speaker Verification? (2022) 0.00
- End-to-end Trainable Self-attentive Shallow Network For Text-independent Speaker Verification (2020) 0.00
- Deep Speaker Feature Learning For Text-independent Speaker Verification (2017) 12.54
- A Unified Deep Learning Framework For Short-duration Speaker Verification In Adverse Environments (2020) 9.41