Managing Semantic Annotations
 
This section explains how to manage semantic annotations with the Annotation Manager or with custom code.
Manage Annotations with the Annotation Manager
The Annotation Manager lets you copy, remove, and select annotations, optionally under conditions that you specify.
The Annotation Manager configuration consists of a list of operations.
Important: The execution order of operations is not defined. If the order of operations matters, you must add several Annotation Managers to the semantic pipe.
Copy Annotation
You can copy a source annotation along with its display form, display kind, and trust level to a target annotation.
Option
Example
Copy without condition
For example, to ignore the distinction between famous and nonfamous people:
<Copy annotation="NE.famousperson" target="NE.person"/>
Copy with a condition
For example, to copy famous people to the nonfamous people annotation unless they are block listed:
<Copy annotation="NE.famousperson" target="NE.person" unless="blocklisted"/>
Remove Annotation
You can remove the occurrences of an annotation under the right conditions.
Option
Example
Remove without condition
For example, to remove all end-of-sentence (sbreak) annotations:
<Remove annotation="sbreak" />
Remove an annotation if it overlaps with another one
For example, to remove a person's name annotation when it spreads over two sentences:
<Remove annotation="NE.person" ifOverlapWith="sbreak" />
Remove an annotation if the annotated text span matches that of another one
For example, to remove a person's name annotation when the text is block listed:
<Remove annotation="NE.person" ifMatchWith="blocklist.person" />
An ontology matcher upstream or any other semantic processor can set the annotation blocklist.person. Both annotations must start and end exactly on the same tokens.
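The difference between the two conditions can be illustrated with a small sketch. It is not the product's implementation: it assumes, for illustration only, that each annotation covers a half-open range of token indexes [start, end), and the class and method names are hypothetical.

```java
// Illustrative only: token spans are modeled as half-open [start, end) index ranges.
final class SpanConditionsSketch {

    /** The ifOverlapWith idea: the two spans share at least one token. */
    static boolean overlaps(int startA, int endA, int startB, int endB) {
        return startA < endB && startB < endA;
    }

    /** The ifMatchWith idea: both spans start and end on exactly the same tokens. */
    static boolean matches(int startA, int endA, int startB, int endB) {
        return startA == startB && endA == endB;
    }

    public static void main(String[] args) {
        // NE.person on tokens 3..5 and blocklist.person on tokens 3..5: exact match.
        System.out.println(matches(3, 6, 3, 6));   // true
        // NE.person on tokens 3..5 and sbreak on token 4: overlap, but no exact match.
        System.out.println(overlaps(3, 6, 4, 5));  // true
        System.out.println(matches(3, 6, 4, 5));   // false
    }
}
```

In this model, a person name that contains a sentence break satisfies the overlap condition, whereas the match condition requires identical boundaries.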
Remove an annotation if the annotated text span and display form match those of another one
For example, to implement a fine-grained block list:
<Remove annotation="title.approx" ifMatchWith="blocklist.title" displayFormsMustMatch="true"/>
If an ontology containing a title package matches the form "professor" on the text "processor" using approximation:
<Pkg path="title">
    <Entry>
        <Form value="professor" />
    </Entry>
</Pkg>
... the title.approx annotation is removed only if the annotation (blocklist.title, "professor") occurs at the very same place, which block lists that specific approximation.
Keep the first occurrence of an annotation and remove all others
For example, to keep only the first organization occurrence in title and text:
<KeepFirst annotation="NE.organization" contexts="title,text"/>
Keep the longest leftmost of a set of overlapping annotations and remove all others
<KeepLongestLeftMost annotations="NE.person,NE.place,NE.organization" interTags="false"/>
With interTags set to false, one annotation per tag is kept.
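The following sketch shows one possible reading of this selection. It is not the product's algorithm: it assumes that "longest leftmost" means preferring the smallest start position and, for equal starts, the longest span, and that with interTags set to false only annotations carrying the same tag compete with each other. The Ann class and the token-index model are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LongestLeftMostSketch {

    /** Hypothetical annotation model: a tag plus a half-open [start, end) token span. */
    static final class Ann {
        final String tag;
        final int start;
        final int end;

        Ann(String tag, int start, int end) {
            this.tag = tag;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return tag + "[" + start + "," + end + ")";
        }
    }

    /** Greedily keeps annotations that do not overlap an already kept competitor. */
    static List<Ann> keep(List<Ann> input, boolean interTags) {
        List<Ann> sorted = new ArrayList<>(input);
        // Leftmost first; for equal starts, longest first.
        sorted.sort((a, b) -> {
            if (a.start != b.start) {
                return Integer.compare(a.start, b.start);
            }
            return Integer.compare(b.end - b.start, a.end - a.start);
        });
        List<Ann> kept = new ArrayList<>();
        for (Ann candidate : sorted) {
            boolean blocked = false;
            for (Ann k : kept) {
                // Assumption: without interTags, only same-tag annotations compete.
                boolean competes = interTags || k.tag.equals(candidate.tag);
                if (competes && candidate.start < k.end && k.start < candidate.end) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Ann> anns = Arrays.asList(
                new Ann("NE.person", 0, 2),
                new Ann("NE.person", 0, 3),
                new Ann("NE.place", 1, 3),
                new Ann("NE.organization", 5, 6));
        // prints [NE.person[0,3), NE.place[1,3), NE.organization[5,6)]
        System.out.println(keep(anns, false));
        // prints [NE.person[0,3), NE.organization[5,6)]
        System.out.println(keep(anns, true));
    }
}
```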
Select Annotation
You can select the most frequent annotations and store the results as document annotations.
Option
Example
Select the most frequent values in a document for a given annotation
For example, to select the 5 places that occur most often in a document and store them as selectedPlace document annotations:
<SelectMostFrequentValue annotation="NE.place" documentAnnotation="selectedPlace" howMany="5" truncate="true"/>
If ties mean that more than 5 places qualify as most frequent, the resulting list is truncated arbitrarily, because truncate="true" guarantees that no more than 5 annotations are ever reported.
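The effect of howMany and truncate can be sketched as follows. This is an illustration, not the product's code: in particular, the behavior with truncate set to false (keeping every value tied with the last selected one) is an assumption inferred from the description above, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MostFrequentValuesSketch {

    /** Returns the most frequent values, at most howMany of them when truncate is true. */
    static List<String> select(List<String> values, int howMany, boolean truncate) {
        List<String> selected = new ArrayList<>();
        if (howMany <= 0) {
            return selected;
        }
        // Count occurrences, remembering first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        // Rank by descending frequency; the sort is stable, so ties keep first-seen order.
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        for (int i = 0; i < ranked.size(); i++) {
            boolean withinLimit = i < howMany;
            // Assumption: without truncation, values tied with the last selected one are kept.
            boolean tiedWithLast = !truncate && i >= howMany
                    && ranked.get(i).getValue().equals(ranked.get(howMany - 1).getValue());
            if (!withinLimit && !tiedWithLast) {
                break;
            }
            selected.add(ranked.get(i).getKey());
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> places = Arrays.asList(
                "Paris", "Lyon", "Paris", "Nice", "Lyon", "Paris", "Nice");
        System.out.println(select(places, 2, true));   // prints [Paris, Lyon]
        System.out.println(select(places, 2, false));  // prints [Paris, Lyon, Nice]
    }
}
```

Here Lyon and Nice both occur twice, so with truncation only one of them survives; which one is an artifact of this sketch's ordering.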
Select the most frequent annotation in a document among a list
<SelectMostFrequentAnnotation annotations="NE.organization,NE.place,NE.person" documentAnnotation="selectedAnnotation"/>
The most frequent annotation is used to output a selectedAnnotation document annotation whose value is one of the annotations from the list.
Select annotations depending on an index field (context) priority
For example, we want to select an annotation from the "title, text" contexts, by first looking within the title context and then, if the annotation is not found, looking within the text context:
<SelectByContexts annotation="NE.person" contexts="title,text" documentAnnotation="selectedAnnotation" firstOnly="false"/>
With firstOnly set to false, all occurrences of NE.person annotations in the specified contexts are reported.
Use Regular Expressions
You can use regular expressions for all annotation parameters.
To enable them, set enableRegexp to true in the <AnnotationManager> object (the default is false).
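As an illustration of what a regular expression in an annotation parameter could select, the sketch below filters a list of annotation names with java.util.regex. The regular expression dialect and the matching mode (full match versus partial match) that the Annotation Manager actually uses are not stated here, so both are assumptions, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class AnnotationNameRegexSketch {

    /** Returns the annotation names that the expression matches in full. */
    static List<String> matching(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String name : names) {
            // Assumption: the whole annotation name must match the expression.
            if (pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "NE.person", "NE.place", "NE.organization", "sbreak");
        // The dot is escaped so that it matches a literal "." only.
        System.out.println(matching("NE\\.(person|place)", names)); // prints [NE.person, NE.place]
    }
}
```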
Example of Annotation Manager XML Configuration File
<AnnotationManager xmlns="exa:com.exalead.linguistic.v10">
    <Copy annotation="NE.famousperson" target="NE.person"/>
    <Copy annotation="NE.famousperson" target="NE.person" unless="blocklist"/>
    <Remove annotation="NE.famousperson" ifOverlapWith="sbreak" />
    <Remove annotation="NE.person" ifOverlapWith="sbreak" />
    <Remove annotation="NE.person" ifMatchWith="blocklist.person" />
    <Remove annotation="title.approx" ifMatchWith="blocklist.title" displayFormsMustMatch="true"/>
</AnnotationManager>
Manage Annotations with Custom Code
In most semantic-oriented projects, you need to manipulate (filter, combine, replace, etc.) the semantic annotations set by your semantic processors before sending them to the index.
The easiest way to do this is to add semantic processors into the Document Processor pipeline, transforming the annotations into metas (also known as chunks). Then you can manipulate them using either:
Custom Java code via the JavaDocumentProcessor
Standard document processors such as ReplaceValues or ConcatenateValues
Example: Index Term Occurrences in a Document
Suppose you want to send to the index the number of times that terms from an existing list are matched in a document.
Recommendation: Use the OntologyMatcher to detect all terms, and run it inside a SemanticPipeDocumentProcessor. Convert the semantic annotations into metas (or chunks) and use custom Java code to count them.
Create a List of Terms
This ontology annotates each term of the list with the "myterms" annotation.
<Ontology xmlns="exa:com.exalead.mot.components.ontology">
    <Pkg path="myterms">
        <Entry display="Term 1">
            <Form level="normalized" />
        </Entry>
        <Entry display="Term 2">
            <Form level="normalized" />
        </Entry>
        <Entry display="Term 3">
            <Form level="normalized" />
        </Entry>
        <!-- [...] -->
    </Pkg>
</Ontology>
Modify the Analysis Pipeline
Add the following configuration to the end of your document processor Analysis Pipeline.
Each "myterms" semantic annotation is converted into a meta (or chunk), which the Document Processors can then manipulate.
The XML representation of the SemanticPipeDocumentProcessor configuration looks like:
<!-- annotations: comma-separated semantic annotations that will be converted into metas -->
<!-- topLevelAnnotationsOnly: whether to convert only document annotations -->
<SemanticPipeDocumentProcessor
    annotations="myterms"
    topLevelAnnotationsOnly="false"
    disabled="false" name="SemanticPipeDocumentProcessor.0">
    <OntologyMatcher resourceDir="/path/to/myterms.bin" disabled="false" name="OntologyMatcher.0"/>
</SemanticPipeDocumentProcessor>
In the JavaDocumentProcessor, count the number of metas (or chunks) named "myterms". Add the count to a new "nbTerms" meta (the mapping of this "nbTerms" meta to the index is not detailed here).
import com.exalead.pdoc.ProcessableDocument;
import com.exalead.pdoc.analysis.DocumentProcessingContext;
import com.exalead.pdoc.analysis.StandardDocumentProcessor;
import com.exalead.pdoc.Meta;

public class JavaDocumentProcessorTemplate extends StandardDocumentProcessor {
    @Override
    public void process(DocumentProcessingContext context, ProcessableDocument document)
            throws Exception {
        // Count the metas produced from the "myterms" semantic annotations.
        int count = 0;
        for (Meta m : document.getMetas("myterms")) {
            ++count;
        }
        // Store the total in a new "nbTerms" meta.
        document.addMeta("nbTerms", String.valueOf(count));
    }
}