Interface | Description |
---|---|
SemanticProcessor.Transformer<T> | |
SemanticProcessor.Visitor | |
VocabularyClassifier.Transformer<T> | |
VocabularyClassifier.Visitor | |
Class | Description |
---|---|
AcronymDetector |
Detects acronyms such as 'o.n.u' and extracts 'onu'.
|
Anchorer |
Adds an annotation on the first and last tokens of either a processed sequence (first/last) or a range defined by an annotation a (first_a/last_a).
|
AnnotationManager |
Implements basic operations on semantic annotations.
|
AnnotationManagerOperation |
Element of an AnnotationManager configuration
|
AnnotationProcessed |
Alternative way to specify the list of annotations to be processed by the operation KeepLongestLeftMost
|
BasisTechTokenizationCompatibility |
No documentation for this element.
|
BasisTechTokenizer |
No documentation for this element.
|
BlackList |
No documentation for this element.
|
Categorizer |
A Categorizer classifies a whole document per the existing annotations on
selected Document Chunks. |
ChineseTokenizer |
When set in the configuration, tokenizes Chinese documents.
|
ChineseWordFinder |
This class performs word detection for Chinese.
Use with a Standard tokenizer. |
Chunker |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
CompoundWordSplitter |
Processor that can split CamelCase words (words with inner capital letters) and words with inner underscores.
|
Copy |
Copies a source annotation along with its display form and display kind to a target annotation
|
CustomSemanticProcessor |
A custom semantic processor that allows custom code to be plugged into the semantic pipeline.
|
CustomTokenizer |
No documentation for this element.
|
DutchDisagglutiner |
This class performs disagglutinations for Dutch.
Use with a Standard tokenizer. |
Entry |
No documentation for this element.
|
FarTextAnnotator |
Annotates alphanumeric tokens starting from 'startOffset'.
|
FastRulesMatcher |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
FeaturesExtractor |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
Form |
No documentation for this element.
|
FormIndexingConfig |
No documentation for this element.
|
FrequencyAnalyzer |
Frequency-related toolbox
|
GermanDisagglutiner |
This class performs disagglutinations for German.
Use with a Standard tokenizer. |
GetLinguisticConfig |
No documentation for this element.
|
GetLinguisticConfigByVersion |
Interface for every action specific to a given applied configuration version
|
HierarchicalVocabularyClassifier |
A VocabularyClassifier classifies a whole document per the existing annotations
on selected Document Chunks; it uses a Bayesian classifier. |
IdentityMatcher |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
JapaneseTokenizer |
When set in the configuration, tokenizes Japanese documents.
|
JapaneseWordFinder |
Performs word detection for Japanese.
Use with a Standard tokenizer. |
KeepFirst |
Selects the first N occurrences or values of an annotation and removes all others.
|
KeepLeftMostLongest |
When several annotations overlap, keeps the leftmost (removes all others); if there are several leftmost annotations, keeps the longest ones.
For example, for 5 tokens "tow truck driver license requirements" and 3 annotations on "tow truck driver", "truck driver license requirements" and "license requirements" with the same tag, keeps the annotations on "tow truck driver" and "license requirements". |
KeepLongestLeftMost |
When several annotations overlap, keeps the longest (removes all others); if there are several longest annotations, keeps the leftmost ones.
For example, for 5 tokens "tow truck driver license requirements" and 3 annotations on "tow truck driver", "truck driver license requirements" and "license requirements" with the same tag, keeps the annotation on "truck driver license requirements" and removes the other two. |
LangDetectMapping |
Maps a Unicode range to a default language for automatic attribution.
|
LanguageDetector |
Detects the languages of a document; this processor can detect the language of short
sentences such as "this is a small test" and handles multi-language documents. |
Lemmatizer |
Produces lemmas for each noun and adjective in the document contexts.
|
LinguisticConfig |
No documentation for this element.
|
MLNamedEntities |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
Modifiable |
No documentation for this element.
|
MOTConfig |
No documentation for this element.
|
NamedEntitiesEntry |
No documentation for this element.
|
NamedEntitiesMatcher |
A NamedEntitiesMatcher detects 'named entities' (People, Organizations, Places)
in the textual content of the DocumentChunks. |
NamedEntitiesWhiteList |
No documentation for this element.
|
Negation |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
NGram |
The NGram processor annotates each sequence of minLength-maxLength words with a 'ngram' annotation.
|
Normalizer |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
NormalizerCharOverride |
Introduces a custom normalization rule.
|
NormalizerConfig |
No documentation for this element.
|
NormalizerIndexLower |
The list of words to index in lowercase instead of normalized form.
|
NorwegianDisagglutiner |
This class performs disagglutinations for Norwegian.
Use with a Standard tokenizer. |
ObjectFactory | |
OntologyMatcher |
An OntologyMatcher detects concepts defined in an ontology
in the textual content of the Document Chunks. Typically, an ontology contains a list of business terms to be detected. |
PartOfSpeechTagger |
A PartOfSpeechTagger detects part of speech for each word
in the text of Document Chunks. |
Phonetizer |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
PrettyPrinter |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
Proximity |
Annotates pieces of text where a number of annotations appear close to each other
|
ProximityElement |
Proximity processor configuration element
|
RegexpMatcher |
Matches a set of Perl-5 regular expressions, possibly capturing subparts of matches.
Use it for information extraction based on the text form like identifiers, emails or dates. |
RelatedTerms |
Extracts all possible related terms.
Only one instance of this processor may exist per input context. |
RelatedTermsEntry |
No documentation for this element.
|
RelatedTermsWhiteList |
No documentation for this element.
|
Remove |
Removes the specified annotations, possibly when some condition is met
|
RulesMatcher |
A RulesMatcher applies a rule engine
on the textual content of the DocumentChunks. The rules are defined in a separate XML 'resourceFile', and are a combination of regular expressions, word matching and Boolean operators over content. Please refer to the CloudView Configuration Reference for details. |
SelectByContexts |
Selects annotations appearing in the first context of a list sorted by decreasing priority.
For example, selecting an annotation from (title, text) looks up the title context first and then, if the annotation is not found there, the text context. |
SelectMostFrequentAnnotation |
Selects the most frequent annotation and annotates the document with it
|
SelectMostFrequentValue |
Selects the N most frequent values of a given annotation and annotates the document with them
|
SemanticExtractor |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
SemanticProcessor |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
SentenceFinder |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
SentimentAnalyzer |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
SetLinguisticConfig |
No documentation for this element.
|
SimHash |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
SnowballStemmer |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
SpellCheckBlackList |
No documentation for this element.
|
SpellCheckEntry |
No documentation for this element.
|
SpellChecker |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
SpellCheckParameters |
No documentation for this element.
|
SpellCheckWhiteList |
No documentation for this element.
|
SQI |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
StandardTokenizer |
Sets a specific configuration for the standard tokenizer.
|
StandardTokenizer.CharOverrides | |
StandardTokenizer.PatternOverrides | |
StandardTokenizerOverride |
No documentation for this element.
|
StandardTokenizerPatternOverride |
No documentation for this element.
|
SuggestBuildConfig |
Suggest build options
|
TokenizationConfig |
How to tokenize documents, i.e. how to split the input strings into tokens.
|
Tokenizer |
No documentation for this element.
|
TokenizerPlugin |
No documentation for this element.
|
URLRemover |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
VocabularyClassifier |
A VocabularyClassifier classifies a whole document per the existing annotations
on selected Document Chunks; it uses a Bayesian classifier. |
WhiteList |
No documentation for this element.
|
WordDictionary |
A SemanticProcessor applies semantic processing on the textual content
of the DocumentChunks. A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
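
The difference between KeepLeftMostLongest and KeepLongestLeftMost is easiest to see on the "tow truck driver license requirements" example from their descriptions. The sketch below is not the CloudView implementation; it is one plausible reading of those descriptions, modeled as a greedy selection over token ranges. The names `OverlapResolution`, `Span`, `resolve`, `keepLeftMostLongest` and `keepLongestLeftMost` are hypothetical and do not belong to this package's API, and the half-open token-index ranges are an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not the CloudView API. */
final class OverlapResolution {

    /** Hypothetical annotation range over token indexes, half-open: [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start;
        }

        boolean overlaps(Span other) {
            return start < other.end && other.start < end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    static final Comparator<Span> BY_START = Comparator.comparingInt(s -> s.start);
    static final Comparator<Span> BY_LENGTH_DESC =
            (a, b) -> Integer.compare(b.length(), a.length());

    /** Visits spans in priority order and keeps each one that overlaps nothing kept so far. */
    static List<Span> resolve(List<Span> spans, Comparator<Span> priority) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(priority);
        List<Span> kept = new ArrayList<>();
        for (Span candidate : sorted) {
            boolean free = true;
            for (Span k : kept) {
                if (candidate.overlaps(k)) {
                    free = false;
                    break;
                }
            }
            if (free) {
                kept.add(candidate);
            }
        }
        kept.sort(BY_START); // report survivors in document order
        return kept;
    }

    /** Leftmost first; among spans starting at the same token, the longest wins. */
    static List<Span> keepLeftMostLongest(List<Span> spans) {
        return resolve(spans, BY_START.thenComparing(BY_LENGTH_DESC));
    }

    /** Longest first; among spans of equal length, the leftmost wins. */
    static List<Span> keepLongestLeftMost(List<Span> spans) {
        return resolve(spans, BY_LENGTH_DESC.thenComparing(BY_START));
    }

    public static void main(String[] args) {
        // Tokens: tow(0) truck(1) driver(2) license(3) requirements(4)
        List<Span> spans = Arrays.asList(
                new Span(0, 3),   // "tow truck driver"
                new Span(1, 5),   // "truck driver license requirements"
                new Span(3, 5));  // "license requirements"
        System.out.println(keepLeftMostLongest(spans)); // prints [[0,3), [3,5)]
        System.out.println(keepLongestLeftMost(spans)); // prints [[1,5)]
    }
}
```

Under these assumptions the leftmost-first strategy keeps "tow truck driver" and "license requirements", while the longest-first strategy keeps only "truck driver license requirements", which matches the two examples given in the table.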
Copyright © 2013 Dassault Systèmes, All Rights Reserved.