A Pilot Study Of GSLM-based Simulation Of Foreign Accentuation Only Using Native Speech Corpora
2024 · Kentaro Onda, Joonyong Park, Nobuaki Minematsu, et al.
Abstract
We propose a method for simulating the human process of foreign accentuation using a Generative Spoken Language Model (GSLM) and only native speech corpora. When a listener hears spoken words of a foreign language and repeats them, the repeated speech often carries the accent of the listener's L1. This is said to be because the spoken words are mentally represented as a sequence of phonological units of the L1, and those units are then used for oral reproduction. We simulate this process by feeding speech of language A into a GSLM of language B, which adds B's accent to the input speech. Running L1 ASR on foreign input speech and passing the ASR result to L1 TTS can be viewed as a naive implementation of this approach. Our experimental results show that the synthesized accent of the output speech is highly natural when compared with real samples of A produced by speakers whose L1 is B, and that the degree of accentuation is controllable.
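The idea of re-encoding foreign speech with L1 units can be illustrated with a toy sketch. It is not the authors' implementation: it assumes, as is common in GSLM-style systems, that speech frames are mapped to discrete units by nearest-centroid lookup in a codebook learned from language B, and that consecutive repeated units are merged. The names `quantize_to_units`, `collapse_repeats`, `codebook_b`, and `frames_a` are hypothetical, and the 2-D vectors stand in for real acoustic features.

```python
import numpy as np


def quantize_to_units(features, codebook):
    """Assign each frame to the index of its nearest codebook centroid."""
    # Squared Euclidean distances, shape (num_frames, num_units).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def collapse_repeats(units):
    """Merge runs of identical consecutive units into a single unit."""
    units = np.asarray(units)
    if units.size == 0:
        return units
    keep = np.concatenate(([True], units[1:] != units[:-1]))
    return units[keep]


# Hypothetical codebook learned from language B only (3 units, 2-D features).
codebook_b = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Hypothetical frames from a language-A utterance. The frame [0.6, 0.1] lies
# between B's units, so it is forced onto the closest one: this snapping is
# the toy analogue of hearing a foreign sound as an L1 category.
frames_a = np.array(
    [[0.1, 0.1], [0.2, -0.1], [0.9, 0.2], [0.6, 0.1], [0.1, 0.8]]
)

units = quantize_to_units(frames_a, codebook_b)
deduped = collapse_repeats(units)
print(units.tolist())    # [0, 0, 1, 1, 2]
print(deduped.tolist())  # [0, 1, 2]
```

In a full system, the resulting unit sequence would be passed to a unit-to-speech model trained on language B. That stage is omitted here because the abstract does not describe it.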
Related papers
- Synthetic Cross-accent Data Augmentation For Automatic Speech Recognition (2023)
- MacST: Multi-accent Speech Synthesis Via Text Transliteration For Accent Conversion (2024)
- Exploring Retraining-free Speech Recognition For Intra-sentential Code-switching (2021)
- Enhancing Code-switched Text-to-speech Synthesis Capability In Large Language Models With Only Monolingual Corpora (2024)
- Leveraging Native Language Speech For Accent Identification Using Deep Siamese Networks (2017)
- Simulating Native Speaker Shadowing For Nonnative Speech Assessment With Latent Speech Representations (2024)
- Text-free Prosody-aware Generative Spoken Language Modeling (2021)
- Accent Conversion Using Discrete Units With Parallel Data Synthesized From Controllable Accented TTS (2024)