The learning objective plays a fundamental role in building a recommender
system. Most methods routinely adopt either pointwise or pairwise loss to train
the model parameters, while rarely paying attention to softmax loss, due to its
high computational cost on large datasets and its intractability
for streaming data. The sampled softmax (SSM) loss emerges as an efficient
for streaming data. The sampled softmax (SSM) loss emerges as an efficient
substitute for softmax loss. Its special case, InfoNCE loss, has been widely
used in self-supervised learning and has exhibited remarkable performance in
contrastive learning. Nonetheless, limited recommendation work uses the SSM
loss as the learning objective. Worse still, to the best of our knowledge, none of it explores SSM's
properties thoroughly or answers "Does SSM loss suit item
recommendation?" and "What are the conceptual advantages of SSM loss, as
compared with the prevalent losses?".
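For concreteness, the SSM loss over one positive item and N sampled negatives can be sketched as follows. This PyTorch snippet is only an illustrative minimal implementation under our own assumptions (the function name `ssm_loss`, the tensor shapes, and the temperature `tau` are hypothetical, not the released code), using cosine similarity as the default affinity function.

```python
import torch
import torch.nn.functional as F

def ssm_loss(user_emb, pos_item_emb, neg_item_emb, tau=0.1):
    """Sampled softmax (InfoNCE-style) loss with cosine similarity.

    user_emb:     [B, d]    user representations
    pos_item_emb: [B, d]    positive item representations
    neg_item_emb: [B, N, d] N sampled negative items per user
    """
    # Cosine similarity = dot product of L2-normalized vectors,
    # which discards the magnitude of the representations.
    u = F.normalize(user_emb, dim=-1)
    i = F.normalize(pos_item_emb, dim=-1)
    j = F.normalize(neg_item_emb, dim=-1)

    pos_logit = (u * i).sum(dim=-1, keepdim=True) / tau           # [B, 1]
    neg_logits = torch.bmm(j, u.unsqueeze(-1)).squeeze(-1) / tau  # [B, N]

    # Softmax over the positive and the sampled negatives;
    # the positive item sits at index 0 of each row.
    logits = torch.cat([pos_logit, neg_logits], dim=1)            # [B, 1+N]
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```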
In this work, we aim to offer a better understanding of SSM for item
recommendation. Specifically, we first theoretically reveal three
model-agnostic advantages: (1) mitigating popularity bias; (2) mining hard
negative samples; and (3) maximizing the ranking metric. However, based on our
empirical studies, we recognize that the default choice of cosine similarity
function in SSM limits its ability to learn the magnitudes of representation
vectors. As such, combining SSM with models that also fall short
in adjusting magnitudes may result in poor representations. Going a step further,
we provide a mathematical proof that the message passing scheme in graph convolutional
networks can adjust the representation magnitude according to node degree, which
naturally compensates for the shortcoming of SSM. Extensive experiments on four
benchmark datasets justify our analyses, demonstrating the superiority of SSM
for item recommendation. Our implementations are available in both TensorFlow
and PyTorch.
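As a point of reference for the magnitude-adjustment claim, a LightGCN-style propagation rule (shown here purely for illustration; the exact scheme analyzed in this work may differ) normalizes each aggregated message by the node degrees:

$$\mathbf{e}_u^{(k+1)} = \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|\,|\mathcal{N}_i|}}\, \mathbf{e}_i^{(k)},$$

so the magnitude of $\mathbf{e}_u^{(k+1)}$ depends on the degrees $|\mathcal{N}_u|$ and $|\mathcal{N}_i|$ through the normalization term, rather than being left unconstrained by a purely cosine-based objective.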