Available Processors
The following list describes the main processors that are available in the Semantic Factory, with their dependencies and supported languages.
| Name | Dependencies | Description | Supported Languages |
|---|---|---|---|
| AcronymDetector | | Detects acronyms. | |
| AnnotationManager | | Provides basic operations on annotations (copying, removal, and so on). | |
| Categorizer | | Machine-learning classifier that categorizes the whole document according to a learning resource. | |
| Chunker | PartOfSpeechTagger | Detects subjects and verbs in a sentence. | en, fr, it |
| FastRulesMatcher | | Matches documents against rules. | |
| LanguageDetector | | Detects the language of tokens. It can detect the language of short sentences and handle multilingual documents. | Over 100 languages |
| Lemmatizer | | Identifies the lemma of each word using a language dictionary (no disambiguation). | de, en, es, fr, it, pt |
| NamedEntitiesMatcher | RelatedTerms | Detects named entities (people, organizations, places, events, email addresses, dates, currencies, French addresses, URLs, French phone numbers, French/English opening hours). | |
| NGram | | N-gram extractor. | |
| OntologyMatcher | | Extracts words and expressions defined in an ontology. | Depends on the ontology content |
| PartOfSpeechTagger | | Detects the part of speech (noun, verb, adjective, and so on) of each token, with disambiguation. | en, fr, it |
| Phonetizer | | Phonetizes tokens. | ca, cs, da, de, en, es, et, fa, fi, fr, it, nl, no, pl, pt, ro, ru, sk, sl, sv |
| PrettyPrinter | | Pretty-prints tokens. | |
| Proximity | | Annotates pieces of text where several annotations appear close to each other. | |
| RelatedTerms | PartOfSpeechTagger (only if withPartOfSpeech=true, the default) | Extracts noun phrases from the token stream. | ar, ca, cs, da, de, en, es, et, fa, fi, fr, he, it, ja, nl, no, pl, pt, ro, ru, sk, sl, sv, zh |
| RulesMatcher | | Extracts patterns from the token stream. | |
| SemanticExtractor | | Extracts semantic features (numbers, strings). | |
| SentenceFinder | | Detects sentence breaks. | |
| SentimentAnalyzer | Lemmatizer and Chunker | Extracts positive and negative sentiments using a domain-specific resource, which must be customized for each domain. | en, fr, it |
| SnowballStemmer | | Rule-based stemmer. | da, du, en, es, fi, fr, de, hu, it, no, pt, ro, ru, sv, tu |
| SpellChecker | | Performs spell checking. | |
| URLRemover | | Removes URLs from token streams. | |
| WordDictionary | | Matches words from a dictionary. | |
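The Dependencies column constrains how processors can be combined. Assuming each dependency must run earlier in the same pipeline, a valid order can be derived with a depth-first topological sort. The following sketch is hypothetical: `PipelineOrder`, `order`, and `DEPENDENCIES` are illustrative names, not part of the CloudView SDK, and the map copies only the dependencies listed in the table (treating RelatedTerms with its default withPartOfSpeech=true).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not a CloudView SDK class): puts dependencies before the processors that need them. */
final class PipelineOrder {

    /** Dependencies copied from the processor table; RelatedTerms assumes withPartOfSpeech=true. */
    static final Map<String, List<String>> DEPENDENCIES = Map.of(
            "Chunker", List.of("PartOfSpeechTagger"),
            "NamedEntitiesMatcher", List.of("RelatedTerms"),
            "RelatedTerms", List.of("PartOfSpeechTagger"),
            "SentimentAnalyzer", List.of("Lemmatizer", "Chunker"));

    /** Returns the requested processors plus their transitive dependencies, dependencies first. */
    static List<String> order(List<String> requested, Map<String, List<String>> deps) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String name : requested) {
            visit(name, deps, done, inProgress, result);
        }
        return result;
    }

    // Depth-first post-order: a processor is appended only after all of its dependencies.
    private static void visit(String name, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress, List<String> result) {
        if (done.contains(name)) {
            return; // already placed earlier in the pipeline
        }
        if (!inProgress.add(name)) {
            throw new IllegalArgumentException("Dependency cycle involving " + name);
        }
        for (String dep : deps.getOrDefault(name, List.of())) {
            visit(dep, deps, done, inProgress, result);
        }
        inProgress.remove(name);
        done.add(name);
        result.add(name);
    }

    public static void main(String[] args) {
        System.out.println(order(List.of("SentimentAnalyzer", "NamedEntitiesMatcher"), DEPENDENCIES));
    }
}
```

For this request, `main` prints `[Lemmatizer, PartOfSpeechTagger, Chunker, SentimentAnalyzer, RelatedTerms, NamedEntitiesMatcher]`: a shared dependency such as PartOfSpeechTagger is added once, and a dependency cycle raises an exception instead of recursing forever.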
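The NGram processor is described only as an n-gram extractor. As a generic illustration of word n-grams (not the processor's actual behavior or configuration), the following sketch slides a window of n tokens over a token list; `NGrams` and `extract` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical word n-gram extractor, for illustration only. */
final class NGrams {

    /** Returns every contiguous sequence of n tokens, each joined with a single space. */
    static List<String> extract(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of width n; no n-gram is produced if there are fewer than n tokens.
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(extract(List.of("semantic", "factory", "sdk", "processors"), 2));
    }
}
```

With n = 2, `main` prints `[semantic factory, factory sdk, sdk processors]`.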
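The Proximity processor annotates regions where several annotations occur close together. One simple way to model "close" is a maximum gap between consecutive annotations. The following sketch assumes annotations are given as token offsets; `ProximitySketch`, `maxGap`, and `minCount` are hypothetical and may not match the processor's real parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical gap-based proximity clustering, for illustration only. */
final class ProximitySketch {

    /**
     * Groups annotation offsets into clusters where each annotation is at most maxGap tokens
     * after the previous one, and returns {first, last} offsets of clusters with at least minCount annotations.
     */
    static List<int[]> findClusters(int[] positions, int maxGap, int minCount) {
        int[] sorted = positions.clone();
        Arrays.sort(sorted);
        List<int[]> spans = new ArrayList<>();
        int start = 0; // index in sorted of the current cluster's first annotation
        for (int i = 1; i <= sorted.length; i++) {
            // A cluster ends at the end of the input or when the next gap is too large.
            boolean clusterEnds = i == sorted.length || sorted[i] - sorted[i - 1] > maxGap;
            if (clusterEnds) {
                if (i - start >= minCount) {
                    spans.add(new int[] {sorted[start], sorted[i - 1]});
                }
                start = i;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        for (int[] span : findClusters(new int[] {3, 5, 20, 6, 40, 42}, 3, 3)) {
            System.out.println(Arrays.toString(span));
        }
    }
}
```

For offsets 3, 5, 6, 20, 40, and 42 with maxGap = 3 and minCount = 3, only the cluster spanning tokens 3 to 6 qualifies, so `main` prints `[3, 6]`. Offsets 40 and 42 are close to each other, but two annotations are too few.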
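URLRemover removes URLs from token streams. The following minimal regex-based sketch makes two deliberate simplifications. It assumes each URL arrives as a single token, and it treats any token starting with `http://`, `https://`, or `www.` as a URL. `UrlRemoverSketch` is a hypothetical class, not the SDK's processor.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical token filter, for illustration only. */
final class UrlRemoverSketch {

    // Simplifying assumption: a URL token starts with http://, https:// or www. (case-insensitive).
    private static final Pattern URL =
            Pattern.compile("(?:https?://|www\\.)\\S+", Pattern.CASE_INSENSITIVE);

    /** Returns a new token list without the tokens that look like URLs. */
    static List<String> removeUrls(List<String> tokens) {
        return tokens.stream()
                .filter(token -> !URL.matcher(token).matches()) // matches() tests the whole token
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(removeUrls(List.of("see", "https://example.com/docs", "or", "WWW.example.org", "now")));
    }
}
```

`main` prints `[see, or, now]`. If a tokenizer splits a URL into several tokens, this per-token test no longer applies, and a span-based approach would be needed instead.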