OntologyMatcher
com.exalead.indexing.analysis.v10.OntologyMatcher
An OntologyMatcher detects the concepts defined in an ontology in the textual content of Document Chunks. Typically, an ontology contains a list of business terms to detect. The resulting Annotations are mapped to enable navigation by business concept. Annotations generated:
These depend on the resource (see Pkg).
Parent elements:
com.exalead.mercury.mami.search.v20.SemanticProcessorModule (as SemanticProcessorModule)
com.exalead.mercury.mami.search.v20.SemanticQueryAnalysisConfig (as SemanticQueryAnalysisConfig)
Attributes:
Name
Type
Default value
Description
name
string
Name of the Semantic Processor. This name is only used for tracing and debugging purposes.
contexts
string
Comma-separated list of the ContextNames of the Document Chunks to which this processor is applied. If this list is empty, all Document Chunks are processed.
dataModelState
string
Indicates whether this semantic processor is managed by the data model. Possible values: null, "auto", "customized", "error". If null, this semantic processor is not related to the data model. If "auto", this semantic processor was auto-generated by the data model.
dataModelClass
string
If dataModelState is "auto" or "customized", this attribute holds the name of the DataModelClass that generated this processor.
dataModelProperty
string
If dataModelState is "auto" or "customized", this attribute holds the name of the DataModelProperty that generated this processor.
disabled
boolean
Disables this semantic processor.
enableApproxMatching
boolean
Enables approximate matching against the ontology. Approximate matching uses the Damerau-Levenshtein edit distance.
minWordSizeForDist1
int
3
Minimum number of characters a token must have for a Damerau-Levenshtein distance of 1 to be tolerated.
minWordSizeForDist2
int
8
Minimum number of characters a token must have for a Damerau-Levenshtein distance of 2 to be tolerated.
resourceDir
string
URL of the directory containing the ontology (data://, file:// or resource://).
restrictLanguage
boolean
True
Keeps only the expressions added with language == Language.XX or with the document's language. For example, if the ontology contains an expression added with language=En, that expression is extracted only from English documents when restrictLanguage is set to true.
keepLongestMatch
boolean
True
Keeps only the longest match. For example, if you have 5 tokens ('a', 'b', 'c', 'd', 'e') and 4 annotations 'a', 'a-c', 'b-c-d' and 'd-e', this option will only keep 'b-c-d' and remove all other annotations.
keepLongestMatchInterTag
boolean
Keeps only the longest match, regardless of the annotation tag (tag-independent). For example, if you have 5 tokens ('a', 'b', 'c', 'd', 'e') and 4 annotations 'a', 'a-c', 'b-c-d' and 'd-e', this option will only keep 'b-c-d' and remove all other annotations.
tokenizeAnnotations
boolean
If you have multi-token annotations (for example, a "super market" annotation on the token "supermarket"), this option automatically subtokenizes "supermarket" into "super" and "market" and keeps the original annotations. Enabling this option also sets keepLongestMatch and keepLongestMatchInterTag to true.
annotationsToIgnore
string
Comma-separated list of annotations to ignore. This lets you define words or expressions that are skipped when this ontology is recognized. For example, if you add:
the expressions "of" and "the" with the tag "toIgnore" in ontology A,
and the expression "website embassy" in ontology B, with annotationsToIgnore="toIgnore" set on the OntologyMatcher for ontology B,
... you will be able to match "website of the embassy", "website of embassy" and "website embassy".
ignoreSpaces
boolean
If your ontology was compiled with matchOnSeparators=false, this allows 'lemonde' to retrieve 'le monde' or 'le monde' to retrieve 'lemonde'. If your ontology was compiled with matchOnSeparators=true, this allows 'le monde' to retrieve 'le monde'.
annotationPrefix
string
A prefix to add to each annotation tag. For example, if the package of the entry matched in the ontology is "exalead.location.country" and the annotationPrefix is "myOntology_", an annotation will be added with the tag "myOntology_exalead.location.country".
trustLevelBasedDedup
boolean
Keeps only the annotation with the highest trust level when several entries from a package match the same text chunk.
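The enableApproxMatching, minWordSizeForDist1 and minWordSizeForDist2 attributes combine so that longer tokens tolerate larger edit distances. The Python sketch that follows illustrates this idea under stated assumptions: it uses the optimal string alignment variant of Damerau-Levenshtein (a common restricted form, not necessarily the one CloudView implements), it applies the length thresholds to the document token rather than to the ontology entry, and all function names are hypothetical. It is not the product's implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: a restricted Damerau-Levenshtein variant
    counting insertions, deletions, substitutions and adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def max_allowed_distance(token: str, min_word_size_for_dist1: int = 3,
                         min_word_size_for_dist2: int = 8) -> int:
    """Edit distance tolerated for a token; defaults mirror the documented 3 and 8."""
    if len(token) >= min_word_size_for_dist2:
        return 2
    if len(token) >= min_word_size_for_dist1:
        return 1
    return 0


def approx_match(token: str, entry: str) -> bool:
    """True if the document token is within the tolerated distance of an ontology entry."""
    return osa_distance(token, entry) <= max_allowed_distance(token)


if __name__ == "__main__":
    print(approx_match("makret", "market"))  # 6 chars allow 1 edit; one transposition
    print(approx_match("ab", "ac"))          # 2 chars allow no edit at all
```

With these thresholds, approx_match("supermarkte", "supermarket") also returns True: the 11-character token tolerates up to 2 edits, and a single transposition separates the two strings.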
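The keepLongestMatch example (tokens 'a' to 'e', annotations 'a', 'a-c', 'b-c-d' and 'd-e', where only 'b-c-d' survives) is consistent with one reading: annotations that overlap, directly or through a chain, form a group, and only the longest annotation of each group is kept. The following Python sketch implements that reading. It is an assumption, not CloudView's documented algorithm. In particular, 'a-c' and 'b-c-d' both span three tokens, so the sketch breaks ties in favor of the later-starting annotation only so that it reproduces the documented result. Spans are inclusive token indices, and the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotation: a label over an inclusive token span."""
    label: str
    start: int  # index of the first covered token
    end: int    # index of the last covered token (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def keep_longest_match(annotations: List[Annotation]) -> List[Annotation]:
    """Keep the longest annotation in each chain of overlapping annotations."""
    ordered = sorted(annotations, key=lambda a: (a.start, a.end))
    groups: List[List[Annotation]] = []
    group_end = -1
    for ann in ordered:
        if groups and ann.start <= group_end:
            # Overlaps the current chain: extend it.
            groups[-1].append(ann)
            group_end = max(group_end, ann.end)
        else:
            # No overlap with the previous chain: start a new one.
            groups.append([ann])
            group_end = ann.end
    # Longest wins; ties go to the later start (assumption chosen to match the example).
    return [max(group, key=lambda a: (a.length, a.start)) for group in groups]


if __name__ == "__main__":
    # Tokens: a=0, b=1, c=2, d=3, e=4
    anns = [
        Annotation("a", 0, 0),
        Annotation("a-c", 0, 2),
        Annotation("b-c-d", 1, 3),
        Annotation("d-e", 3, 4),
    ]
    print([a.label for a in keep_longest_match(anns)])
```

Annotations that do not overlap anything, such as spans (0, 1) and (3, 3), each form their own group and are both kept.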
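The annotationsToIgnore example ("website embassy" matching "website of the embassy") can be pictured as a matcher that, once it has started on an expression, steps over tokens carrying an ignored annotation. In this Python sketch, the set `ignored` is a hypothetical stand-in for the tokens that ontology A annotated with the "toIgnore" tag. The function name and the rule that a match must begin on the expression's first word are assumptions, not documented CloudView behavior.

```python
from typing import List, Sequence, Set, Tuple


def find_matches(expression: Sequence[str], tokens: Sequence[str],
                 ignored: Set[str]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) token spans where `expression` matches
    `tokens`, stepping over ignored tokens inside a match."""
    if not expression:
        return []
    matches = []
    for start, token in enumerate(tokens):
        if token != expression[0]:
            continue  # a match must begin on the expression's first word
        i, j = start + 1, 1  # i walks the text, j walks the expression
        while j < len(expression) and i < len(tokens):
            if tokens[i] == expression[j]:
                j += 1
            elif tokens[i] not in ignored:
                break  # a non-ignored token that does not fit: no match here
            i += 1
        if j == len(expression):
            matches.append((start, i - 1))
    return matches


if __name__ == "__main__":
    ignored = {"of", "the"}  # stand-in for tokens tagged "toIgnore" by ontology A
    for text in ["website of the embassy", "website of embassy",
                 "website embassy", "website near the embassy"]:
        print(text, "->", find_matches(["website", "embassy"], text.split(), ignored))
```

The last input finds no match because "near" is neither part of the expression nor an ignored token.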
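trustLevelBasedDedup can be sketched as keeping, for every (package, text span) pair, the single match with the highest trust level. This Python sketch assumes trust levels are comparable integers and that "the same text chunk" means identical start and end offsets; the `Match` class, its fields and the package names in the usage are hypothetical, and ties keep the first match seen.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Match:
    """Hypothetical ontology match over a character span."""
    entry: str
    package: str
    start: int
    end: int
    trust: int


def dedup_by_trust(matches: List[Match]) -> List[Match]:
    """For each (package, start, end), keep only the match with the highest trust."""
    best: Dict[Tuple[str, int, int], Match] = {}
    for m in matches:
        key = (m.package, m.start, m.end)
        # Strict '>' keeps the first match seen when trust levels are equal.
        if key not in best or m.trust > best[key].trust:
            best[key] = m
    return list(best.values())


if __name__ == "__main__":
    matches = [
        Match("entry_a", "pkg.one", 0, 5, 1),
        Match("entry_b", "pkg.one", 0, 5, 3),  # same package and span, higher trust
        Match("entry_c", "pkg.two", 0, 5, 2),  # other package: not deduplicated
    ]
    print([m.entry for m in dedup_by_trust(matches)])
```

Matches from different packages on the same span are left alone here, since the attribute description only mentions several entries from one package.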
Nested elements:
Name
Type
Description
fromDataModel
com.exalead.indexing.analysis.v10.SemanticProcessor
If dataModelState is "customized", this element contains the original semantic processor generated by the data model. Use it to revert easily from the "customized" state to "auto".