Abstract
Classic retrieval methods use simple bag-of-words representations for queries and documents, which fail to capture their full semantic richness. More recent retrieval models have tried to overcome this deficiency by incorporating dependencies between query terms, using bigram representations of documents, applying proximity heuristics, and performing passage retrieval. While some of these works implicitly account for term order, to the best of our knowledge, term order has not been the primary focus of any prior research. In this paper, we focus solely on the effect of term order in information retrieval. We show that documents containing two query terms in the same order as in the query have a higher probability of being relevant than documents containing the two terms in reverse order. Using the axiomatic framework for information retrieval, we introduce a constraint that retrieval models must adhere to in order to effecti