Ontology Matcher (Resource-Based)
 
Dependencies
Rules for Ontology Matching
Sample Ontology Matcher XML File
Ontology Rules Syntax
Multilevel Ontology Example
Create the Ontology Matcher Resource File
Map an Annotation to a Category Facet
The Ontology Matcher is a semantic processor that detects expressions, or text forms, in documents.
Detected text forms are then tagged with an annotation name and a corresponding display form. If you map this annotation to a category field, its display forms appear as a new category in your list of facets. This category gathers all the documents containing the text forms you linked to the display form.
Dependencies
If the matching rules for this processor depend on phonetic, stem, or lemma matching, you must add the corresponding processor above this one in the pipeline.
For example, if your rules require phonetic forms, place the Phonetizer processor above this processor in the analysis pipeline.
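As an illustration, a rule that depends on phonetic matching could look like the following sketch. The package path my.phonetic.annotation and the entry are hypothetical; only the element names, the attributes, and the phonetic level come from this page. Because the form uses level="phonetic", this rule can only match if the Phonetizer sits above the Ontology Matcher in the pipeline.

```xml
<Ontology xmlns="exa:com.exalead.mot.components.ontology">
  <!-- Hypothetical package: the path is illustrative only -->
  <Pkg path="my.phonetic.annotation">
    <Entry display="Schwarzenegger">
      <!-- No value attribute: the display form is used as the text form -->
      <!-- level="phonetic" relies on the Phonetizer running earlier in the pipeline -->
      <Form level="phonetic" />
    </Entry>
  </Pkg>
</Ontology>
```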
Rules for Ontology Matching
The Ontology Matcher detects expressions. Each expression belongs to an annotation package, which can be seen as a namespace.
Results are tagged by annotations spanning the range of tokens that has been matched.
Each annotation package creates an annotation tag.
Each tagged expression can have several text forms.
The rules for the Ontology Matcher are defined in an XML file saved in a resource directory. They are specified during the configuration of the Ontology Matcher semantic processor in the analysis configuration.
Sample Ontology Matcher XML File
In this example, the Ontology Matcher tags a document with the annotation my.annotation when the document contains a reference to the brand Coca-Cola.
<Ontology xmlns="exa:com.exalead.mot.components.ontology">
  <Pkg path="my.annotation">
    <Entry display="Coca-Cola">
      <Form level="exact" />
    </Entry>
    <Pkg path="subannotation">
      <Entry display="Albert Einstein" lang="en">
        <Form value="Albert E." level="normalized" />
      </Entry>
    </Pkg>
  </Pkg>
  <Pkg path="my.second.annotation">
    <Entry display="Recherche et Développement" lang="fr">
      <Form level="exact" distance="0" />
      <Form level="lowercase" distance="1" />
      <Form level="normalized" distance="2" />
      <Form value="R&amp;D" level="normalized" distance="3" />
      <Form value="R &amp; D" level="exact" distance="4" />
    </Entry>
  </Pkg>
</Ontology>
These Ontology rules would create the following annotations:
Input: "Always Coca-Cola..." Result: The annotation created is displayForm="Coca-Cola" tag="my.annotation" level="exact" distance="0"
Input: "always coca-cola..." Result: No annotation is created since the match level is set to exact and "Coca-Cola" != "coca-cola"
Input: "the famous albert e." (lang=en) Result: The annotation created is displayForm="Albert Einstein" tag="my.annotation.subannotation" level="normalized" distance="0"
Input: "le célèbre albert e." (lang=fr) Result: No annotation is created since the entry is restricted to English (lang="en") and the tokens are tagged as French.
Input: "recherche et développement" Result: The annotation created is displayForm="Recherche et Développement" tag="my.second.annotation" level="lowercase" distance="1"
Input: "R&D" Result: The annotation created is displayForm="Recherche et Développement" tag="my.second.annotation" level="normalized" distance="3"
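If the French input above should be annotated as well, one option is to drop the lang attribute, since the language restriction on an entry is optional. The fragment below is a sketch of that variant of the sample's subannotation package and is not part of the original sample. With it, "le célèbre albert e." would be expected to match in the same way as the English input.

```xml
<!-- Variant of the nested package from the sample; it still belongs inside <Pkg path="my.annotation"> -->
<Pkg path="subannotation">
  <!-- No lang attribute: detection is not restricted to tokens of one language -->
  <Entry display="Albert Einstein">
    <Form value="Albert E." level="normalized" />
  </Entry>
</Pkg>
```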
Ontology Rules Syntax
An annotation package is characterized by a path and can contain:
Subannotations, whose paths are concatenated using the "." separator.
A set of expressions to detect.
For example:
<Pkg path="subannotation">
  <Entry display="Albert Einstein" lang="en">
    <Form value="Albert E." level="normalized" />
  </Entry>
</Pkg>
Display Forms
An expression, or display form, is characterized by a value and an optional language. When a language is specified, only tokens of this language are used for detection. For example:
<Entry display="Coca-Cola">
  <Form level="exact" />
</Entry>
A display form can be matched by one or more text forms. A text form contains a value, a normalization level, and an optional distance. For example:
<Form value="Albert E." level="normalized" />
The distance attribute can be used for scoring the annotation depending on which alternative text form has matched. When the text form’s value is not specified, the display form for the associated expression is used instead.
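The two rules above can be combined in a single entry, as in the following sketch. The package path my.third.annotation, the display form, and the text form values are hypothetical; the attributes are those used in the sample file. By analogy with the sample results, each annotation would be expected to report the level and distance of the form that matched.

```xml
<Ontology xmlns="exa:com.exalead.mot.components.ontology">
  <!-- Hypothetical package and entry, for illustration only -->
  <Pkg path="my.third.annotation">
    <Entry display="United Nations">
      <!-- value omitted: the display form "United Nations" serves as the text form -->
      <Form level="exact" distance="0" />
      <!-- alternative text forms, each with its own distance for scoring -->
      <Form value="UN" level="exact" distance="1" />
      <Form value="United Nations Organization" level="lowercase" distance="2" />
    </Entry>
  </Pkg>
</Ontology>
```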
Available Matching Normalization Levels
The level attribute specifies which form must be matched. Certain levels rely on other semantic processors, which you must place above the Ontology Matcher in the analysis pipeline.
Ontology Matcher Level Attribute and Possible Values
Level | Description
exact | Matches using the exact form (the token).
lowercase | Matches using the lowercase form. Requires a Normalizer.
normalized | Matches using the normalized form. Requires a Normalizer.
lemmaSingularMasculine | Matches using the singular/masculine/normalized form. Requires a Lemmatizer.
stem | Matches using the stemmed form. Requires a Snowball Stemmer.
phonetic | Matches using the phonetic form. Requires a Phonetizer.
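Several levels can be mixed within one entry. The sketch below uses a hypothetical package path and entry to show which processor each form depends on. How the form value itself is normalized when the resource is compiled is not described here, so verify the resulting matches on your own data.

```xml
<Ontology xmlns="exa:com.exalead.mot.components.ontology">
  <!-- Hypothetical package and entry, for illustration only -->
  <Pkg path="my.linguistic.annotation">
    <Entry display="Printer" lang="en">
      <!-- exact: matches the token as is, no additional processor needed -->
      <Form level="exact" distance="0" />
      <!-- lowercase: needs a Normalizer above the Ontology Matcher -->
      <Form level="lowercase" distance="1" />
      <!-- lemmaSingularMasculine: needs a Lemmatizer above the Ontology Matcher -->
      <Form level="lemmaSingularMasculine" distance="2" />
      <!-- stem: needs a Snowball Stemmer above the Ontology Matcher -->
      <Form level="stem" distance="3" />
    </Entry>
  </Pkg>
</Ontology>
```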
Multilevel Ontology Example
<Ontology xmlns="exa:com.exalead.mot.components.ontology">
  <Pkg path="organization">
    <Entry display="company/computer/maker/Lenovo Group">
      <Form value="Lenovo Group" level="exact" />
      <Form value="Lenovo" level="exact" />
    </Entry>
    <Entry display="company/computer/maker/Dell Inc">
      <Form value="Dell Inc" level="exact" />
      <Form value="Dell" level="exact" />
    </Entry>
    <Entry display="company/computer/maker/Hewlett Packard Co">
      <Form value="Hewlett Packard Co" level="exact" />
      <Form value="Hewlett Packard" level="exact" />
      <Form value="HP" level="exact" />
    </Entry>
  </Pkg>
  <Pkg path="IT">
    <Entry display="company/computer/IT/IBM Corp">
      <Form value="IBM" level="exact" />
      <Form value="International Business Machines" level="lowercase" />
      <Form value="IBM Corp" level="exact" />
      <Form value="International Business Machines Corporation" level="lowercase" />
    </Entry>
  </Pkg>
</Ontology>
Create the Ontology Matcher Resource File
Create a Resource File from the Administration Console
The most convenient method is to create an empty resource file in the Administration Console and define its content in the Business Console. See Create a Resource File from the Administration Console.
To Compile a Resource File from the Command Line
This procedure describes how to manually create and compile your resources from the command line.
1. Create a rule XML file and save it in a temporary directory. For an example, see Sample Ontology Matcher XML File.
2. Compile the ontology rules XML file:
a. Go to <DATADIR>/bin/
b. Open the cvadmin command tool and run the following command:
cvconsole
cvadmin> linguistic compile-ontology input="<PATH TO ONTOLOGY XML FILE>" output="<PATH TO OUTPUT DIR>"
3. In the Administration Console, select Index > Data processing > Pipeline name > Semantic Processors.
4. Drag the Ontology Matcher to the required position in the Current Processors list, expand it and:
a. For Resource directory, enter the path to the compiled ontology file.
b. Select the parameters Restrict language and Keep longest match.
For more information about available parameters, see the Exalead CloudView XML Configuration Reference Guide.
Map an Annotation to a Category Facet
Once your Ontology Matcher resource file is defined, you can map an annotation to a category field. Its display forms then appear as a new category in your list of facets. This category gathers all the documents containing the text forms you linked to the display form.
1. In the Administration Console, select Index > Data processing > Pipeline name > Semantic Processors.
2. On the Mappings tab, click Add mapping source.
a. Name: Enter the annotation name that you created in the rules file, for example, my.annotation for the sample file above.
b. Type: select Annotation.
3. (Optional) In the Input from field of the mapping, restrict the mapping so that it applies only to a subset of the metas (also known as contexts) associated with this annotation. Enter the metas as a comma-separated list.
4. Click Add mapping target and add a category target.
5. Modify the category-mapping properties.
For example, set the Create categories under this root property to Top/Product so that it contains a Coca-Cola category (corresponding to the Coca-Cola display form).
6. Go to Search > Search Logics > Your_Search_Logic > Facets and add a category group.
a. Click Add facet and enter the name to display in the Mashup UI Refinements panel.
b. For Root, enter the value you have entered for Create categories under this root in step 5, for example, Top/Product.
7. Click Apply.