Abstract
This paper studies the crucial impact of initialization in Low-Rank Adaptation (LoRA). Through theoretical analysis, we demonstrate that the fine-tuned result of LoRA is highly sensitive to initialization, which is likely to lead suboptimal low-rank results. While this issue can be mitigated by adjusting the initial direction towards the main singular vectors of the target $\Delta W$, which is, however, typically unknown in real-world scenarios. To approximate this initial direction, we propose High-Rank Preheating (HRP), which first trains LoRA with a higher preheating rank for a few steps, then uses the main singular vectors of the derived $BA^\top$ as initialization for the main fine-tuning process. With only a modification in the initial direction, we prove that HRP makes LoRA achieve better fine-tuned results than random initialization in expectation, and the enhancement grows with the preheating rank. We validate our theoretical findings through extensive experiments in various models and tasks, where HRP significantly enhances LoRA's effectiveness and outperforms other initialization strategies and other LoRA variants.