Managing Semantic Annotations

Option	Example
Copy without condition	For example, to ignore the distinction between famous and nonfamous people: <Copy annotation="NE.famousperson" target="NE.person"/>
Copy with a condition	For example, to copy famous people to the generic people annotation, unless they are block listed: <Copy annotation="NE.famouspeople" target="NE.people" unless="blocklisted"/>
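To make the copy semantics concrete, here is a minimal Java sketch over a hypothetical in-memory model. The Annotation record, its token offsets, and the copy method are illustrative only and are not the CloudView API. The sketch assumes that a copy keeps the source annotation, and that unless="blocklisted" means "skip annotations covered on exactly the same tokens by a blocklisted annotation"; both are assumptions, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CopyAnnotationsSketch {
    // Hypothetical annotation model (not the CloudView API): a tag over tokens [start, end).
    record Annotation(String tag, int start, int end) {}

    // Models <Copy annotation="source" target="target" unless="..."/>: every source
    // annotation is duplicated under the target tag unless the predicate rejects it.
    // The source annotations themselves are kept.
    static List<Annotation> copy(List<Annotation> annotations, String source, String target,
                                 Predicate<Annotation> unless) {
        List<Annotation> result = new ArrayList<>(annotations);
        for (Annotation a : annotations) {
            if (a.tag().equals(source) && !unless.test(a)) {
                result.add(new Annotation(target, a.start(), a.end()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Annotation> doc = List.of(
            new Annotation("NE.famousperson", 0, 2),
            new Annotation("NE.famousperson", 5, 6),
            new Annotation("blocklisted", 5, 6));

        // Copy without condition: the predicate never rejects anything.
        System.out.println(copy(doc, "NE.famousperson", "NE.person", a -> false));

        // Copy with a condition. Assumption: an annotation counts as "blocklisted"
        // when a blocklisted annotation covers exactly the same tokens.
        Predicate<Annotation> blocklisted = a -> doc.stream().anyMatch(b ->
            b.tag().equals("blocklisted") && b.start() == a.start() && b.end() == a.end());
        System.out.println(copy(doc, "NE.famousperson", "NE.person", blocklisted));
    }
}
```

Modeling the condition as a predicate keeps the copy logic independent of how the product actually evaluates unless.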

Option	Example
Remove without condition	For example, to remove all end-of-sentence annotations: <Remove annotation="sbreak" />
Remove an annotation if it overlaps with another one	For example, to remove a person's name annotation when it spans a sentence break, that is, when it extends over two sentences: <Remove annotation="NE.person" ifOverlapWith="sbreak" />
Remove an annotation if the annotated text span matches that of another one	For example, to remove a person's name annotation when the text is block listed: <Remove annotation="NE.person" ifMatchWith="blocklist.person" /> The blocklist.person annotation can be set by an upstream ontology matcher or by any other semantic processor. Both annotations must start and end on exactly the same tokens.
Remove an annotation if the annotated text span and display form match those of another one	For example, to implement a fine-grained block list: <Remove annotation="title.approx" ifMatchWith="blocklist.title" displayFormsMustMatch="true"/> Suppose the text processor matches an ontology containing the following title package against the text using approximation, producing title.approx annotations: <pkg path="title"> <Entry> <Form value="professor" /> </Entry> </pkg> ... A title.approx annotation is then removed only if the annotation (blocklist.title, "professor") occurs at exactly the same place, which block lists that specific approximation.
Keep the first occurrence of an annotation and remove all others	For example, to keep only the first occurrence of an organization in the title and text contexts: <KeepFirst annotation="NE.organization" contexts="title,text"/>
Keep the longest, leftmost annotation among a set of overlapping annotations and remove all others	For example: <KeepLongestLeftMost annotations="NE.person,NE.place,NE.organization" interTags="false"/> With interTags set to false, overlapping annotations are only compared with annotations of the same tag, so one annotation per tag is kept.
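The following Java sketch models three of the removal rules above over a hypothetical annotation record with token offsets [start, end) and a display form; none of these names come from the CloudView API. Overlap and exact-span matching follow the descriptions in the table. For KeepLongestLeftMost, the sketch assumes a greedy strategy (longest annotation first, ties broken by the leftmost start), which is one plausible reading of the option name rather than the documented algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class RemoveAnnotationsSketch {
    // Hypothetical annotation model (not the CloudView API):
    // a tag over tokens [start, end) with a display form.
    record Annotation(String tag, int start, int end, String displayForm) {
        boolean overlaps(Annotation o) { return start < o.end && o.start < end; }
        boolean sameSpan(Annotation o) { return start == o.start && end == o.end; }
        int length() { return end - start; }
    }

    // Models <Remove annotation="tag" ifOverlapWith="other"/>.
    static List<Annotation> removeIfOverlap(List<Annotation> in, String tag, String other) {
        return in.stream()
            .filter(a -> !a.tag().equals(tag)
                || in.stream().noneMatch(b -> b.tag().equals(other) && a.overlaps(b)))
            .collect(Collectors.toList());
    }

    // Models <Remove annotation="tag" ifMatchWith="other" displayFormsMustMatch="..."/>:
    // both annotations must start and end on the same tokens.
    static List<Annotation> removeIfMatch(List<Annotation> in, String tag, String other,
                                          boolean displayFormsMustMatch) {
        return in.stream()
            .filter(a -> !a.tag().equals(tag) || in.stream().noneMatch(b ->
                b.tag().equals(other) && a.sameSpan(b)
                    && (!displayFormsMustMatch || a.displayForm().equals(b.displayForm()))))
            .collect(Collectors.toList());
    }

    // Models <KeepLongestLeftMost annotations="..." interTags="..."/>.
    // Assumption: greedy selection, longest first, ties broken by the leftmost start.
    static List<Annotation> keepLongestLeftMost(List<Annotation> in, Set<String> tags,
                                                boolean interTags) {
        List<Annotation> candidates = in.stream()
            .filter(a -> tags.contains(a.tag()))
            .sorted(Comparator.comparingInt(Annotation::length).reversed()
                .thenComparingInt(Annotation::start))
            .collect(Collectors.toList());
        List<Annotation> kept = new ArrayList<>();
        for (Annotation a : candidates) {
            // With interTags=false, an annotation only competes with annotations of its own tag.
            boolean blocked = kept.stream()
                .anyMatch(k -> (interTags || k.tag().equals(a.tag())) && k.overlaps(a));
            if (!blocked) kept.add(a);
        }
        List<Annotation> result = new ArrayList<>(in);
        // Drop the losing candidates, comparing by identity and preserving the original order.
        result.removeIf(a -> tags.contains(a.tag()) && kept.stream().noneMatch(k -> k == a));
        return result;
    }

    public static void main(String[] args) {
        List<Annotation> doc = List.of(
            new Annotation("NE.person", 3, 6, "John Smith"),
            new Annotation("sbreak", 4, 5, "."),
            new Annotation("title.approx", 8, 9, "professor"),
            new Annotation("blocklist.title", 8, 9, "professor"));
        // The person annotation crosses the sentence break, so it is removed.
        System.out.println(removeIfOverlap(doc, "NE.person", "sbreak"));
        // Same span and same display form, so title.approx is removed.
        System.out.println(removeIfMatch(doc, "title.approx", "blocklist.title", true));

        List<Annotation> overlapping = List.of(
            new Annotation("NE.person", 0, 2, "Paris Hilton"),
            new Annotation("NE.place", 0, 1, "Paris"));
        Set<String> tags = Set.of("NE.person", "NE.place");
        System.out.println(keepLongestLeftMost(overlapping, tags, true));  // NE.person only
        System.out.println(keepLongestLeftMost(overlapping, tags, false)); // both kept
    }
}
```

The last two calls show the effect of interTags: when tags compete with each other, the shorter place annotation loses to the overlapping person annotation; when they do not, each tag keeps its own annotation.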

Option	Example
Select the most frequent values in a document for a given annotation	For example, to select the 5 places that occur most often in a document and store them in selectedPlace document annotations: <SelectMostFrequentValue annotation="NE.place" documentAnnotation="selectedPlace" howMany="5" truncate="true"/> If ties mean that more than 5 places qualify as the most frequent, truncate="true" arbitrarily truncates the list so that no more than 5 annotations are ever reported.
Select the most frequent annotation in a document from a list	For example: <SelectMostFrequentAnnotation annotations="NE.organization,NE.place,NE.person" documentAnnotation="selectedAnnotation"/> The annotation that occurs most often is output as a selectedAnnotation document annotation whose value is one of the annotations from the list.
Select annotations depending on an index field (context) priority	For example, to select an annotation from the title and text contexts, looking first in the title context and then, if the annotation is not found there, in the text context: <SelectByContexts annotation="NE.person" contexts="title,text" documentAnnotation="selectedAnnotation" firstOnly="false"/> With firstOnly set to false, all occurrences of NE.person annotations in the specified contexts are reported, not only the first one.
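The two frequency-based selections can be modeled with a short Java sketch. The Annotation record and both methods are hypothetical and are not the CloudView API. Where the product truncates ties arbitrarily, this sketch breaks them deterministically by first occurrence; it also assumes that with truncate="false", values tied with the last selected value are reported as well, which is inferred from the description of truncate="true" rather than documented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelectAnnotationsSketch {
    // Hypothetical annotation model (not the CloudView API): a tag and its value.
    record Annotation(String tag, String value) {}

    // Models <SelectMostFrequentValue annotation="tag" howMany="n" truncate="..."/>.
    static List<String> mostFrequentValues(List<Annotation> in, String tag, int howMany,
                                           boolean truncate) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // preserves first-seen order
        for (Annotation a : in) {
            if (a.tag().equals(tag)) counts.merge(a.value(), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so tied values stay in first-seen order.
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < ranked.size(); i++) {
            if (i < howMany) {
                selected.add(ranked.get(i).getKey());
            } else if (!truncate && howMany > 0
                    && ranked.get(i).getValue().equals(ranked.get(howMany - 1).getValue())) {
                // Assumption: without truncation, values tied with the last selected one are kept.
                selected.add(ranked.get(i).getKey());
            } else {
                break; // counts are sorted in descending order, so no later value qualifies
            }
        }
        return selected;
    }

    // Models <SelectMostFrequentAnnotation annotations="t1,t2,..."/>: returns the tag that
    // occurs most often, or null if none occurs. Ties go to the earlier tag in the list.
    static String mostFrequentAnnotation(List<Annotation> in, List<String> tags) {
        String best = null;
        long bestCount = 0;
        for (String tag : tags) {
            long n = in.stream().filter(a -> a.tag().equals(tag)).count();
            if (n > bestCount) {
                best = tag;
                bestCount = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Annotation> doc = List.of(
            new Annotation("NE.place", "Paris"),
            new Annotation("NE.place", "Lyon"),
            new Annotation("NE.place", "Paris"),
            new Annotation("NE.person", "Ada"));
        System.out.println(mostFrequentValues(doc, "NE.place", 1, true)); // prints [Paris]
        System.out.println(mostFrequentAnnotation(doc,
            List.of("NE.organization", "NE.place", "NE.person")));       // prints NE.place
    }
}
```

SelectByContexts is left out of the sketch because the table does not say how context priority and firstOnly interact in every case.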

import com.exalead.pdoc.ProcessableDocument;
import com.exalead.pdoc.analysis.DocumentProcessingContext;
import com.exalead.pdoc.analysis.StandardDocumentProcessor;
import com.exalead.pdoc.Meta;

// Counts the "myterms" metas of each document and stores the total in an "nbTerms" meta.
public class JavaDocumentProcessorTemplate extends StandardDocumentProcessor {
    @Override
    public void process(DocumentProcessingContext context, ProcessableDocument document)
            throws Exception {
        int count = 0;
        // Only the number of "myterms" metas matters; their values are not read.
        for (Meta m : document.getMetas("myterms")) {
            ++count;
        }
        // The count is passed to addMeta as a string.
        document.addMeta("nbTerms", String.valueOf(count));
    }
}