EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
2024 · Xin Qi, Ruibo Fu, Zhengqi Wen, et al.
Abstract
In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into the pre-introduced conditional parts of the speech models. This fixes the position of LoRA, limiting the flexibility and scalability of its application. Therefore, we propose the Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech (EELE) method. Starting from a general neutral speech model, we do not pre-introduce emotional information but instead use the LoRA plugin to design a flexible adaptive scheme that endows the model with emotional generation capabilities. Specifically, we initially train the model using only neutral speech data. After training is complete, w
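The plug-in behaviour described in the abstract can be illustrated with a minimal, dependency-free sketch of a LoRA-augmented linear layer. It is not the authors' implementation: the class name `LoRALinear`, the `enabled` flag, the 4x4 identity weight standing in for a trained neutral model, and the rank and scaling values are all assumptions chosen for illustration. The sketch only shows the general LoRA idea, a frozen weight W plus a low-rank update B·A that can be switched on or off.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical layer: frozen weight W (out x in) plus an optional low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # frozen base weights (stand-in for the neutral model)
        self.scale = alpha / rank     # common LoRA scaling convention
        # A starts small and random, B starts at zero, so the adapter is a no-op at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.enabled = True           # plug the adapter in (True) or out (False)

    def effective_weight(self):
        if not self.enabled:
            return self.weight
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the layer to a vector given as a flat list of length in_dim."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Only A and B would be trained; the base weight stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Illustrative usage with a 4x4 identity as the "neutral" weight.
base = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(base, rank=1)
x = [1.0, 2.0, 3.0, 4.0]

neutral = layer.forward(x)                    # B is zero, so this equals the base output
layer.B = [[1.0], [0.0], [0.0], [0.0]]        # pretend emotional fine-tuning updated B
emotional = layer.forward(x)                  # adapter plugged in
layer.enabled = False
unplugged = layer.forward(x)                  # adapter plugged out: neutral behaviour again
```

In this toy setting the adapter holds 8 trainable values against 16 in the full weight matrix, and disabling it restores the base output exactly, which is the flexibility the abstract attributes to the plugin approach.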
Related papers
- Exploring Transfer Learning For Low Resource Emotional TTS (2019) · 0.00
- EMORL-TTS: Reinforcement Learning For Fine-grained Emotion Control In LLM-based TTS (2025) · 0.00
- A Methodology For Controlling The Emotional Expressiveness In Synthetic Speech -- A Deep Learning Approach (2019) · 5.84
- Reinforcement Learning For Emotional Text-to-speech Synthesis With Improved Emotion Discriminability (2021) · 0.00
- Limited Data Emotional Voice Conversion Leveraging Text-to-speech: Two-stage Sequence-to-sequence Training (2021) · 10.35
- RLAIF-SPA: Structured AI Feedback For Semantic-prosodic Alignment In Speech Synthesis (2025) · 0.00
- RALL-E: Robust Codec Language Modeling With Chain-of-thought Prompting For Text-to-speech Synthesis (2024) · 0.00
- Behind The Scenes: Mechanistic Interpretability Of LoRA-adapted Whisper For Speech Emotion Recognition (2025) · 1.81