Uncategorized
50 papers tagged Uncategorized (ordered by heat_score)
Papers
- Robust Wav2vec 2.0: Analyzing Domain Shift In Self-supervised Pre-training (2021). Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, et al. Heat score: 25.07
- Natural TTS Synthesis By Conditioning WaveNet On Mel Spectrogram Predictions (2017). Jonathan Shen, Ruoming Pang, Ron J. Weiss, et al. Heat score: 24.07
- DiffSinger: Singing Voice Synthesis Via Shallow Diffusion Mechanism (2021). Jinglin Liu, Chengxi Li, Yi Ren, et al. Heat score: 23.76
- VoxCeleb: A Large-scale Speaker Identification Dataset (2017). Arsha Nagrani, Joon Son Chung, Andrew Zisserman. Heat score: 23.55
- ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification (2020). Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck. Heat score: 23.07
- Efficient Training Of Audio Transformers With Patchout (2021). Khaled Koutini, Jan Schlüter, Hamid Eghbal-Zadeh, et al. Heat score: 22.11
- End-to-end Neural Speaker Diarization With Permutation-free Objectives (2019). Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, et al. Heat score: 21.98
- Is Someone Speaking? Exploring Long-term Temporal Features For Audio-visual Active Speaker Detection (2021). Ruijie Tao, Zexu Pan, Rohan Kumar Das, et al. Heat score: 21.12
- WaveGlow: A Flow-based Generative Network For Speech Synthesis (2018). Ryan Prenger, Rafael Valle, Bryan Catanzaro. Heat score: 20.65
- Generalized End-to-end Loss For Speaker Verification (2017). Li Wan, Quan Wang, Alan Papir, et al. Heat score: 20.58
- An Enhanced Res2Net With Local And Global Feature Fusion For Speaker Verification (2023). Yafeng Chen, Siqi Zheng, Hui Wang, et al. Heat score: 19.74
- Convolutional Recurrent Neural Networks For Music Classification (2016). Keunwoo Choi, George Fazekas, Mark Sandler, et al. Heat score: 18.98
- Attentive Statistics Pooling For Deep Speaker Embedding (2018). Koji Okabe, Takafumi Koshinaka, Koichi Shinoda. Heat score: 18.88
- Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors (2018). Yansen Wang, Ying Shen, Zhun Liu, et al. Heat score: 18.79
- Broadcasted Residual Learning For Efficient Keyword Spotting (2021). Byeonggeun Kim, Simyung Chang, Jinkyu Lee, et al. Heat score: 18.60
- Convolutional RNN: An Enhanced Model For Extracting Features From Sequential Data (2016). Gil Keren, Björn Schuller. Heat score: 18.20
- StarGAN-VC: Non-parallel Many-to-many Voice Conversion With Star Generative Adversarial Networks (2018). Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, et al. Heat score: 18.09
- DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards (2026). Kaiyi Zhang et al. Heat score: 17.70
- Efficient Large-scale Audio Tagging Via Transformer-to-CNN Knowledge Distillation (2022). Florian Schmid, Khaled Koutini, Gerhard Widmer. Heat score: 17.68
- SSAST: Self-supervised Audio Spectrogram Transformer (2021). Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, et al. Heat score: 17.61
- Lipreading Using Temporal Convolutional Networks (2020). Brais Martinez, Pingchuan Ma, Stavros Petridis, et al. Heat score: 17.61
- Pyannote.audio: Neural Building Blocks For Speaker Diarization (2019). Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, et al. Heat score: 17.58
- Speaker Diarization With LSTM (2017). Quan Wang, Carlton Downey, Li Wan, et al. Heat score: 17.48
- Automatic Speaker Verification Spoofing And Deepfake Detection Using Wav2vec 2.0 And Data Augmentation (2022). Hemlata Tak, Massimiliano Todisco, Xin Wang, et al. Heat score: 17.35
- Unsupervised Speech Representation Learning Using WaveNet Autoencoders (2019). Jan Chorowski, Ron J. Weiss, Samy Bengio, et al. Heat score: 17.21
- Federated Learning For Keyword Spotting (2018). David Leroy, Alice Coucke, Thibaut Lavril, et al. Heat score: 17.09
- Pushing The Limits Of Self-supervised Speaker Verification Using Regularized Distillation Framework (2022). Yafeng Chen, Siqi Zheng, Hui Wang, et al. Heat score: 17.00
- Detection Of Glottal Closure Instants From Speech Signals: A Quantitative Review (2019). Thomas Drugman, Mark Thomas, Jon Gudnason, et al. Heat score: 16.88
- Phone-to-audio Alignment Without Text: A Semi-supervised Approach (2021). Jian Zhu, Cong Zhang, David Jurgens. Heat score: 16.74
- SpeechGPT: Empowering Large Language Models With Intrinsic Cross-modal Conversational Abilities (2023). Dong Zhang, Shimin Li, Xin Zhang, et al. Heat score: 16.59
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021). Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara. Heat score: 16.48
- Voice Conversion From Non-parallel Corpora Using Variational Auto-encoder (2016). Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, et al. Heat score: 16.36
- Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks (2017). Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari. Heat score: 16.21
- Deep Residual Learning For Small-footprint Keyword Spotting (2017). Raphael Tang, Jimmy Lin. Heat score: 16.21
- Target-speaker Voice Activity Detection: A Novel Approach For Multi-speaker Diarization In A Dinner Party Scenario (2020). Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, et al. Heat score: 16.19
- PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, And Aggregation (2021). Yuan Gong, Yu-An Chung, James Glass. Heat score: 15.85
- Fully Supervised Speaker Diarization (2018). Aonan Zhang, Quan Wang, Zhenyao Zhu, et al. Heat score: 15.80
- Temporal Convolution For Real-time Keyword Spotting On Mobile Devices (2019). Seungwoo Choi, Seokjun Seo, Beomjun Shin, et al. Heat score: 15.67
- Towards Better Decoding And Language Model Integration In Sequence To Sequence Models (2016). Jan Chorowski, Navdeep Jaitly. Heat score: 15.67
- MuQ: Self-supervised Music Representation Learning With Mel Residual Vector Quantization (2025). Haina Zhu, Yizhi Zhou, Hangting Chen, et al. Heat score: 15.66
- Quality-Net: An End-to-end Non-intrusive Speech Quality Assessment Model Based On BLSTM (2018). Szu-Wei Fu, Yu Tsao, Hsin-Te Hwang, et al. Heat score: 15.62
- CAM++: A Fast And Efficient Network For Speaker Verification Using Context-aware Masking (2023). Hui Wang, Siqi Zheng, Yafeng Chen, et al. Heat score: 15.57
- Audio ALBERT: A Lite BERT For Self-supervised Learning Of Audio Representation (2020). Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, et al. Heat score: 15.54
- Singing Voice Data Scaling-up: An Introduction To ACE-Opencpop And ACE-KiSing (2024). Jiatong Shi, Yueqian Lin, Xinyi Bai, et al. Heat score: 15.48
- Multi-level And Multi-scale Feature Aggregation Using Pre-trained Convolutional Neural Networks For Music Auto-tagging (2017). Jongpil Lee, Juhan Nam. Heat score: 15.43
- MMDenseLSTM: An Efficient Combination Of Convolutional And Recurrent Neural Networks For Audio Source Separation (2018). Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji. Heat score: 15.28
- BYOL For Audio: Self-supervised Learning For General-purpose Audio Representation (2021). Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, et al. Heat score: 15.22
- Automatic Detection Of Depression In Speech Using Ensemble Convolutional Neural Networks (2024). Adrián Vázquez-Romero, Ascensión Gallardo-Antolín. Heat score: 15.06
- The Second DIHARD Diarization Challenge: Dataset, Task, And Baselines (2019). Neville Ryant, Kenneth Church, Christopher Cieri, et al. Heat score: 15.00
- A Streaming On-device End-to-end Model Surpassing Server-side Conventional Model Quality And Latency (2020). Tara N. Sainath, Yanzhang He, Bo Li, et al. Heat score: 15.00