A form indexing configuration defines pairs of semantic annotations and matching modes (also called index levels). The values of each semantic annotation are indexed at the matching mode paired with it. The matching mode is an arbitrary integer that, together with a word, identifies an inverted list (word, level). That list gives the positions of the word in all documents.
Three matching modes have a predefined meaning: 0 is exact, 1 is lowercase, and 2 is normalized. All other values are user-defined.
For example, a hidden form indexing configuration (NORMALIZE, 2) specifies that the normalizer's NORMALIZE annotations are indexed at level 2. At query time, if the prefix handler's matching mode is normalized, the requested words are looked up in the index through these annotations.
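The (word, level) addressing described above can be illustrated with a toy model. This is a minimal sketch, not the product's implementation: the ToyInvertedIndex class, the index_document helper, and the toy_normalize function are hypothetical names introduced here, and the normalization rule (lowercasing plus accent stripping) is an assumption about what "normalized" means.

```python
import unicodedata
from collections import defaultdict

# Predefined matching modes named in the text; other integers are user-defined.
EXACT, LOWERCASE, NORMALIZED = 0, 1, 2


def toy_normalize(token):
    """Assumed normalization: lowercase, then drop combining accents."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class ToyInvertedIndex:
    """Toy model: one inverted list per (word, level) key."""

    def __init__(self):
        # (word, level) -> list of (doc_id, position)
        self.lists = defaultdict(list)

    def add(self, word, level, doc_id, position):
        self.lists[(word, level)].append((doc_id, position))

    def lookup(self, word, level):
        # Return a copy so callers cannot mutate the stored list.
        return list(self.lists.get((word, level), []))


def index_document(index, doc_id, tokens):
    """Index each token at the three predefined levels."""
    for position, token in enumerate(tokens):
        index.add(token, EXACT, doc_id, position)
        index.add(token.lower(), LOWERCASE, doc_id, position)
        index.add(toy_normalize(token), NORMALIZED, doc_id, position)


index = ToyInvertedIndex()
index_document(index, "doc1", ["Caf\u00e9", "Paris"])
print(index.lookup("cafe", NORMALIZED))  # found only at the normalized level
print(index.lookup("cafe", EXACT))       # no inverted list at the exact level
```

The same word can therefore have a different inverted list at each level, which is why a query must state the matching mode it targets.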
Use Form indexing for Over-Indexing Acronyms
Customizing form indexing lets you over-index terms. For example, suppose the query NASA must match occurrences of both NASA and N.A.S.A. in documents. To achieve this, each time N.A.S.A. appears in a document, it must be over-indexed with NASA.
1. Add an acronym detector to the analysis pipeline.
a. Go to Data Processing > Semantic Processors.
b. Drag the Acronym Detector into the analysis pipeline.
2. Add a form indexing configuration (acronym, 2) so that the acronym detector's annotations are indexed at level 2.
a. Go to Index > Linguistics > Tokenizations > Advanced.
b. Click Add form.
c. For Tag, enter acronym and for Matching Mode, enter 2 (normalized).
Because the prefix handler targets matching mode 2, any query word can match any over-indexed value produced by the acronym detector.
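The effect of the (acronym, 2) form can be sketched with a small simulation. This is an illustration under stated assumptions, not the product's behavior: detect_acronym is a hypothetical stand-in for the Acronym Detector (it only recognizes dotted acronyms), and build_index and search are hypothetical helpers that lowercase words as a simplified form of normalization.

```python
import re
from collections import defaultdict

NORMALIZED = 2  # matching mode targeted by the prefix handler in the text

# Hypothetical detector rule: two or more single letters, each followed by a dot.
DOTTED_ACRONYM = re.compile(r"^(?:[A-Za-z]\.){2,}$")


def detect_acronym(token):
    """Return the undotted form of a dotted acronym, or None."""
    if DOTTED_ACRONYM.match(token):
        return token.replace(".", "")
    return None


def build_index(docs, forms):
    """docs: {doc_id: [tokens]}; forms: {annotation tag: matching mode}."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos, token in enumerate(tokens):
            # Regular indexing of the token at the normalized level.
            index[(token.lower(), NORMALIZED)].append((doc_id, pos))
            # Over-indexing: add the annotation value at the configured level.
            acronym = detect_acronym(token)
            if acronym is not None and "acronym" in forms:
                index[(acronym.lower(), forms["acronym"])].append((doc_id, pos))
    return index


def search(index, query_word, level=NORMALIZED):
    """Look up one query word in the inverted list for the given level."""
    return sorted(index.get((query_word.lower(), level), []))


docs = {"d1": ["NASA", "launched"], "d2": ["N.A.S.A.", "budget"]}
with_form = build_index(docs, {"acronym": 2})
without_form = build_index(docs, {})
print(search(with_form, "NASA"))     # matches d1 and d2
print(search(without_form, "NASA"))  # matches d1 only
```

In this sketch, the over-indexed value shares the position of the original token, so the query NASA finds N.A.S.A. at the place where it occurs in the document.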
Set Weight
To set a distance (or weight) in a form indexing configuration, specify an additional Trust level in Index > Linguistics > Tokenizations > Advanced. This attribute ranges from 1 to 100; 100 is both the highest weight and the default. The query expander uses it to compute the weight of an expansion.
1. Let us say that in the Linguistic.xml file, the trustLevel parameter corresponds to a weight of 50.