← all papers Β· overview

Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement

Β·2026

Abstract

arXiv:2604.18239v2 Announce Type: replace-cross Abstract: Preference optimization is widely used to align large language models (LLMs) with human preferences. However, many margin-based objectives suppress the chosen response along with the rejected one, a phenomenon known as likelihood displacement, and no general mechanism currently prevents this across objectives. We bridge this gap by presenting a unified *incentive-score decomposition* of preference optimization, revealing that diverse objectives share identical local update directions and differ only in their scalar weighting coefficients. Building on this decomposition, by analyzing the dynamics of the chosen/rejected likelihoods, we identify the *disentanglement band* (DB), a simple, testable condition that characterizes when training can avoid likelihood displacement by realizing the preferred pathway: suppressing the loser while maintaining the winner, possibly after an initial transient. Leveraging the DB, we propose

Related papers