Two-Sided Time-Independent Regret for Matching Markets with Limited Interviews

Abstract

arXiv:2602.12224v2 Announce Type: replace-cross Abstract: Two-sided matching platforms rely on preferences from both sides, yet participants can evaluate only a small fraction of potential partners. In practice, they use low-cost pre-match screening, e.g., interviews, profile views, or trial tasks, to form noisy impressions before committing to applications and offers. We study bandit learning in matching markets with interviews, modeling these interactions as queried \emph{hints}~\citep{DBLP:conf/innovations/BhaskaraGIKM23} that reveal partial preference information to both sides while constraining subsequent applications. Our framework also allows firm-side uncertainty: firms, like agents, learn their preferences and may make early hiring mistakes. To address this, we introduce strategic deferral, a firm-side action that permits temporary vacancy, corrects premature commitments, and enables decentralized learning under coarse anonymous feedback. We design algorithms for centralized and decentralized markets and show that a constant number of interviews per round suffices for horizon-independent regret, improving over the $O(\log T)$ guarantees known without interviews. Our bounds are near-optimal: the centralized guarantee is within a factor $m$ of an information-theoretic lower bound, while decentralized algorithms match it up to polynomial factors in structured markets and remain horizon-independent in general markets.

Abstract

Related papers