Proximity
 
The Proximity processor is a MOT processor that spots pieces of text where a number of annotations appear close to each other.
Its behavior is similar to that of the NEAR operator in the query language, with more options for expressing distance constraints.
How Is the Best Match Selected?
When several candidate matches overlap, the best match is selected using the following criteria:
The shortest match, in tokens, is preferred. For example, searching for A NEAR B in the text A A B B selects A [A B] B (the brackets mark the selected match).
The greater the sum of the lengths of the element annotations, the better. For example, if more than one annotation A appears on a token, the longest one is chosen.
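To make the first criterion concrete, here is a hypothetical configuration that plays the role of A NEAR B. It is a sketch only: the annotation tags A and B, the element names a and b, and the output tag AnearB are placeholders, and value="/.*/" is assumed to accept any display form.

```xml
<!-- Hypothetical "A NEAR B": tags A and B are placeholders, and
     value="/.*/" is assumed to match any display form. -->
<Proximity xmlns="exa:com.exalead.linguistic.v10" annotation="AnearB" ordered="false">
  <ProximityElement annotation="A" value="/.*/" name="a" />
  <ProximityElement annotation="B" value="/.*/" name="b" />
  <!-- On the tokens "A A B B", the overlapping candidates span 2, 3 or 4 tokens.
       The 2-token match formed by the second A and the first B is kept. -->
</Proximity>
```

Because no displayForm is set, the resulting AnearB annotation has no value.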
Configure the Proximity Processor
The configuration includes:
A list of <ProximityElement> elements defining the set of annotations to search for. These elements have the following attributes:
annotation: the annotation tag
value: the annotation display form to match. Regular expressions can be used by enclosing the string in slashes.
mandatory (default true): when set to false, the processor can report matches where this annotation is missing.
name: a name that can be referenced in the output annotation format using $name. Names may contain only the characters [0-9a-zA-Z].
An output annotation, defined by an annotation attribute giving its tag and an optional displayForm string: a format in which named elements can be referenced (as $name) to build the output display form. If displayForm is undefined, the output annotation has no value.
An ordered Boolean attribute. When true, the annotations must be matched in the order in which they are defined; when false, any permutation of the annotation list can match.
An allowElementOverlap Boolean attribute (default false). When true, the element annotations are allowed to overlap.
Distance constraints, which can be combined:
sentenceScope (default false) – if true, each match must be contained in a single sentence; no match spans several sentences.
paragraphScope (default false) – if true, each match must be contained in a single paragraph; no match spans several paragraphs.
windowSize (default 2048, max 8192) – maximum size of a match, in tokens.
minDistance (default none, max 4096) – minimum distance between two elements, in tokens.
maxDistance (default none, max 4096) – maximum distance between two elements, in tokens.
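As a further illustration, the following sketch combines several of the options listed above. It is hypothetical: the annotation tags NE.person, role and NE.organization, their values, and the output tag employment are placeholders, and paragraphScope, minDistance and maxDistance are assumed to be attributes of <Proximity>, like windowSize and sentenceScope in the sample configuration that follows.

```xml
<!-- Hypothetical example: tags, values and the output tag are placeholders.
     paragraphScope, minDistance and maxDistance are assumed to be attributes
     of <Proximity>, like windowSize and sentenceScope. -->
<Proximity xmlns="exa:com.exalead.linguistic.v10" annotation="employment"
           displayForm="$person works for $company"
           ordered="true" allowElementOverlap="false"
           paragraphScope="true" minDistance="1" maxDistance="10">
  <!-- Must come first in the text, since ordered="true" -->
  <ProximityElement annotation="NE.person" value="/.+/" name="person" />
  <!-- Optional: matches can be reported even when this annotation is missing -->
  <ProximityElement annotation="role" value="/(engineer|manager)/" mandatory="false" name="role" />
  <ProximityElement annotation="NE.organization" value="/.+/" name="company" />
</Proximity>
```

With ordered="true", the person must precede the company in the text, and because role is optional, a person and a company that satisfy the paragraph and distance constraints can match even without a role annotation between them. allowElementOverlap="false" is the default and is shown only for clarity.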
Sample Proximity processor XML configuration file
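<!-- Matches of at most 100 tokens, within one sentence, elements in any order -->
<Proximity xmlns="exa:com.exalead.linguistic.v10" annotation="output" displayForm="when:$date who:$people."
           ordered="false"
           windowSize="100" sentenceScope="true" >
  <!-- NE.date whose display form matches: a digit 1-5, a space, "201", any character -->
  <ProximityElement annotation="NE.date" value="/[1-5] 201./" mandatory="true" name="date" />
  <!-- mandatory is omitted, so it takes its default value, true -->
  <ProximityElement annotation="NE.famouspeople" value="Olivier Panis" name="people" />
</Proximity>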