cluster #6
50 papers in this cluster (ordered by heat_score)
Papers
- Efficient Training Of Audio Transformers With Patchout (2021). Khaled Koutini, Jan Schlüter, Hamid Eghbal-Zadeh, et al. Heat score: 22.11
- Is Someone Speaking? Exploring Long-term Temporal Features For Audio-visual Active Speaker Detection (2021). Ruijie Tao, Zexu Pan, Rohan Kumar Das, et al. Heat score: 21.12
- WaveGlow: A Flow-based Generative Network For Speech Synthesis (2018). Ryan Prenger, Rafael Valle, Bryan Catanzaro. Heat score: 20.65
- AudioLM: A Language Modeling Approach To Audio Generation (2022). Zalán Borsos, Raphaël Marinier, Damien Vincent, et al. Heat score: 18.91
- Convolutional RNN: An Enhanced Model For Extracting Features From Sequential Data (2016). Gil Keren, Björn Schuller. Heat score: 18.20
- Deep Learning Enabled Semantic Communications With Speech Recognition And Synthesis (2022). Zhenzi Weng, Zhijin Qin, Xiaoming Tao, et al. Heat score: 17.85
- Efficient Large-scale Audio Tagging Via Transformer-to-CNN Knowledge Distillation (2022). Florian Schmid, Khaled Koutini, Gerhard Widmer. Heat score: 17.68
- SSAST: Self-supervised Audio Spectrogram Transformer (2021). Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, et al. Heat score: 17.61
- Lipreading Using Temporal Convolutional Networks (2020). Brais Martinez, Pingchuan Ma, Stavros Petridis, et al. Heat score: 17.61
- FunCodec: A Fundamental, Reproducible And Integrable Open-source Toolkit For Neural Speech Codec (2023). Zhihao Du, Shiliang Zhang, Kai Hu, et al. Heat score: 17.47
- Unsupervised Speech Representation Learning Using WaveNet Autoencoders (2019). Jan Chorowski, Ron J. Weiss, Samy Bengio, et al. Heat score: 17.21
- The Voice Conversion Challenge 2018: Promoting Development Of Parallel And Nonparallel Methods (2018). Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, et al. Heat score: 17.06
- MOSNet: Deep Learning Based Objective Assessment For Voice Conversion (2019). Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, et al. Heat score: 16.90
- Detection Of Glottal Closure Instants From Speech Signals: A Quantitative Review (2019). Thomas Drugman, Mark Thomas, Jon Gudnason, et al. Heat score: 16.88
- Attention-based Audio-visual Fusion For Robust Automatic Speech Recognition (2018). George Sterpu, Christian Saam, Naomi Harte. Heat score: 16.67
- Voice Conversion From Non-parallel Corpora Using Variational Auto-encoder (2016). Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, et al. Heat score: 16.36
- Speech Resynthesis From Discrete Disentangled Self-supervised Representations (2021). Adam Polyak, Yossi Adi, Jade Copet, et al. Heat score: 16.25
- PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, And Aggregation (2021). Yuan Gong, Yu-An Chung, James Glass. Heat score: 15.85
- ViSQOL v3: An Open Source Production Ready Objective Speech And Audio Metric (2020). Michael Chinen, Felicia S. C. Lim, Jan Skoglund, et al. Heat score: 15.83
- Quality-Net: An End-to-end Non-intrusive Speech Quality Assessment Model Based On BLSTM (2018). Szu-Wei Fu, Yu Tsao, Hsin-Te Hwang, et al. Heat score: 15.62
- BYOL For Audio: Self-supervised Learning For General-purpose Audio Representation (2021). Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, et al. Heat score: 15.22
- Codec Does Matter: Exploring The Semantic Shortcoming Of Codec For Audio Language Model (2024). Zhen Ye, Peiwen Sun, Jiahe Lei, et al. Heat score: 15.02
- Sequence-to-sequence Acoustic Modeling For Voice Conversion (2018). Jing-Xuan Zhang, Zhen-Hua Ling, Li-Juan Liu, et al. Heat score: 14.97
- Low Bit-rate Speech Coding With VQ-VAE And A WaveNet Decoder (2019). Cristina Gârbacea, Aäron van den Oord, Yazhe Li, et al. Heat score: 14.80
- Large-scale Visual Speech Recognition (2018). Brendan Shillingford, Yannis Assael, Matthew W. Hoffman, et al. Heat score: 14.43
- A Comparative Study Of Glottal Source Estimation Techniques (2019). Thomas Drugman, Baris Bozkurt, Thierry Dutoit. Heat score: 14.35
- Semantic Communications For Speech Signals (2020). Zhenzi Weng, Zhijin Qin, Geoffrey Ye Li. Heat score: 14.35
- Neural Source-filter-based Waveform Model For Statistical Parametric Speech Synthesis (2018). Xin Wang, Shinji Takaki, Junichi Yamagishi. Heat score: 13.97
- Non-intrusive Speech Quality Assessment Using Neural Networks (2019). Anderson R. Avila, Hannes Gamper, Chandan Reddy, et al. Heat score: 13.74
- AudioMNIST: Exploring Explainable Artificial Intelligence For Audio Analysis On A Simple Benchmark (2018). Sören Becker, Johanna Vielhaben, Marcel Ackermann, et al. Heat score: 13.50
- Vector-quantized Neural Networks For Acoustic Unit Discovery In The ZeroSpeech 2020 Challenge (2020). Benjamin van Niekerk, Leanne Nortje, Herman Kamper. Heat score: 13.50
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017). Yong Xu, Qiuqiang Kong, Qiang Huang, et al. Heat score: 13.23
- The Deterministic Plus Stochastic Model Of The Residual Signal And Its Applications (2019). Thomas Drugman, Thierry Dutoit. Heat score: 13.17
- Fast, Compact, And High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers For Mobile Devices (2016). Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, et al. Heat score: 13.05
- Waveform Modeling And Generation Using Hierarchical Recurrent Neural Networks For Speech Bandwidth Extension (2018). Zhen-Hua Ling, Yang Ai, Yu Gu, et al. Heat score: 12.99
- LDNet: Unified Listener Dependent Modeling In MOS Prediction For Synthetic Speech (2021). Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, et al. Heat score: 12.74
- Voice Conversion Challenge 2020: Intra-lingual Semi-parallel And Cross-lingual Voice Conversion (2020). Yi Zhao, Wen-Chin Huang, Xiaohai Tian, et al. Heat score: 12.74
- A Real-time Wideband Neural Vocoder At 1.6 kb/s Using LPCNet (2019). Jean-Marc Valin, Jan Skoglund. Heat score: 12.61
- Causal-anticausal Decomposition Of Speech Using Complex Cepstrum For Glottal Source Estimation (2019). Thomas Drugman, Baris Bozkurt, Thierry Dutoit. Heat score: 12.40
- Face Landmark-based Speaker-independent Audio-visual Speech Enhancement In Multi-talker Environments (2018). Giovanni Morrone, Luca Pasa, Vadim Tikhanoff, et al. Heat score: 12.40
- Pseudo-convolutional Policy Gradient For Sequence-to-sequence Lip-reading (2020). Mingshuang Luo, Shuang Yang, Shiguang Shan, et al. Heat score: 12.17
- Latent-domain Predictive Neural Speech Coding (2022). Xue Jiang, Xiulian Peng, Huaying Xue, et al. Heat score: 12.15
- How To Design A Three-stage Architecture For Audio-visual Active Speaker Detection In The Wild (2021). Okan Köpüklü, Maja Taseska, Gerhard Rigoll. Heat score: 12.10
- Non-parallel Voice Conversion With Cyclic Variational Autoencoder (2019). Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, et al. Heat score: 12.10
- Semantic Communications For Speech Recognition (2021). Zhenzi Weng, Zhijin Qin, Geoffrey Ye Li. Heat score: 11.93
- Investigating Multi-feature Selection And Ensembling For Audio Classification (2022). Muhammad Turab, Teerath Kumar, Malika Bendechache, et al. Heat score: 11.85
- Real-time Target Sound Extraction (2022). Bandhav Veluri, Justin Chan, Malek Itani, et al. Heat score: 11.76
- High-quality Speech Coding With SampleRNN (2018). Janusz Klejsa, Per Hedelin, Cong Zhou, et al. Heat score: 11.67
- APCodec: A Neural Audio Codec With Parallel Amplitude And Phase Spectrum Encoding And Decoding (2024). Yang Ai, Xiao-Hang Jiang, Ye-Xin Lu, et al. Heat score: 11.58
- Audio Mamba: Bidirectional State Space Model For Audio Representation Learning (2024). Mehmet Hamza Erol, Arda Senocak, Jiu Feng, et al. Heat score: 11.58