public class SequenceOptimizationTransformer
extends NodeVisitor
Optimizes tight sequences that contain stopwords.
The query "statue of liberty" is much more efficient when rewritten as (statue BEFORE/2 liberty) AND "statue of liberty",
because the inverted list for the stopword "of" then has to be scanned only for documents that match the BEFORE part.
Since the inverted lists of stopwords are very long and retrieve almost all documents, the speed gain is far from negligible on big corpora.
Stopwords at the beginning or end of the query can be ignored entirely in the BEFORE part:
"the statue of liberty" => (statue BEFORE/2 liberty) AND "the statue of liberty"
The BEFORE subtree is "neutralized" with respect to scoring, so the optimization has no influence on document ranking.
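
The rewrite described above can be illustrated with a minimal sketch that works on plain strings. It is not the transformer's actual implementation, which visits a query node tree. The class name SequenceRewriteSketch, the method rewrite, and the stopword list are all hypothetical. Chaining several BEFORE clauses when the phrase has more than two non-stopword terms is an assumption, as is leaving stopword-free phrases untouched. The scoring neutralization of the BEFORE subtree is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SequenceRewriteSketch {
    // Hypothetical stopword list, for illustration only.
    static final Set<String> STOPWORDS = Set.of("the", "of", "a", "an", "in");

    /**
     * Rewrites a phrase query into "(w1 BEFORE/d w2) AND "phrase"",
     * where d is the position gap between two non-stopword terms.
     */
    static String rewrite(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        String exact = "\"" + String.join(" ", words) + "\"";

        // Record positions of non-stopword terms. Leading and trailing
        // stopwords never appear in this list, so they are ignored
        // in the BEFORE part automatically.
        List<Integer> content = new ArrayList<>();
        boolean hasStopword = false;
        for (int i = 0; i < words.length; i++) {
            if (STOPWORDS.contains(words[i].toLowerCase(Locale.ROOT))) {
                hasStopword = true;
            } else {
                content.add(i);
            }
        }

        // Assumption: without a stopword, or without two content terms
        // to anchor on, the phrase is left unchanged.
        if (!hasStopword || content.size() < 2) {
            return exact;
        }

        // One BEFORE clause per pair of consecutive content terms.
        StringBuilder before = new StringBuilder();
        for (int k = 0; k + 1 < content.size(); k++) {
            int left = content.get(k);
            int right = content.get(k + 1);
            if (k > 0) {
                before.append(" AND ");
            }
            before.append('(').append(words[left])
                  .append(" BEFORE/").append(right - left)
                  .append(' ').append(words[right]).append(')');
        }
        return before + " AND " + exact;
    }
}
```

Under these assumptions, rewrite("the statue of liberty") returns (statue BEFORE/2 liberty) AND "the statue of liberty", matching the example above, while rewrite("new york") returns "new york" unchanged.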