Abstract
arXiv:2601.10085v2
Therapeutic dialogue is not a sequence of isolated responses: client goals, motivation, resistance, and therapeutic alliance evolve over time. Yet current LLM-based mental health dialogue systems often lack explicit mechanisms for tracking these dynamics across extended interactions, which can lead to poorly timed interventions or premature goal resolution. We introduce CALM-IT, a framework for generating and evaluating long-form Motivational Interviewing dialogues. CALM-IT explicitly models evolving client and counselor states, and these states guide both counseling strategy selection and utterance generation. We evaluate CALM-IT on a large-scale corpus of 8,232 synthetic dialogues spanning multiple dialogue lengths and frameworks. Compared with all baselines, CALM-IT achieves the best performance on most MITI 4.2 global ratings, including Empathy, Partnership, and Softening Sustain Talk, as well as on other key performance metrics, and its performance degrades only minimally as dialogue length increases. Notably, although CALM-IT initiates fewer change-directed prompts, it achieves the highest client acceptance rate (64.3% on average across length conditions). We release a reproducible generation framework, a MITI-grounded process-level evaluation protocol, and a large-scale synthetic corpus for studying therapeutic LLMs under realistic long-form interaction conditions.