Name | Dependencies | Description | Supported Languages |
---|---|---|---|
AcronymDetector | | Detects acronyms. | |
AnnotationManager | | Provides basic operations on annotations (copy, removal, and so on). | |
Categorizer | | Machine learning classifier that categorizes the whole document according to a learning resource. | |
Chunker | PartOfSpeechTagger | Detects subject/verb in a sentence. | en, fr, it |
FastRulesMatcher | | Matches documents against rules. | |
LanguageDetector | | Detects the language of tokens. This processor can detect the language of short sentences and handle multilingual documents. | over 100 languages |
Lemmatizer | | Identifies the lemma of each word using a language dictionary (no disambiguation). | de, en, es, fr, it, pt |
NamedEntitiesMatcher | RelatedTerms | Detects named entities (people, organizations, places, events, emails, dates, currencies, French addresses, URLs, French phone numbers, French/English opening hours). | |
NGram | | NGram extractor. | |
OntologyMatcher | | Extracts words/expressions defined in an ontology. | Depends on the ontology content. |
PartOfSpeechTagger | | Detects part of speech (noun/verb/adjective/...) for each token with disambiguation. | fr, it, en |
Phonetizer | | Phonetizes tokens. | ca, cs, da, de, en, es, et, fa, fi, fr, it, nl, no, pl, pt, ro, ru, sk, sl, sv |
PrettyPrinter | | Prints pretty tokens. | |
Proximity | | Annotates pieces of text where a number of annotations appear close to each other. | |
RelatedTerms | PartOfSpeechTagger, only if withPartOfSpeech=true (default value) | Extracts noun phrases from the token stream. | ar, ca, cs, da, de, en, es, et, fa, fi, fr, he, it, ja, nl, no, pl, pt, ro, ru, sk, sl, sv, zh |
RulesMatcher | | Extracts 'patterns' from the token stream. | |
SemanticExtractor | | Extracts semantic features (numbers, strings). | |
SentenceFinder | | Detects sentence breaks. | |
SentimentAnalyzer | Lemmatizer + Chunker | Extracts positive/negative sentiments using a domain-specific resource (need customization for a specific domain). | en, fr, it |
SnowballStemmer | | Rule-based stemmer. | da, du, en, es, fi, fr, de, hu, it, no, pt, ro, ru, sv, tu |
SpellChecker | | Performs spell check. | |
URLRemover | | Removes URLs from token streams. | |
WordDictionary | | Matches from a dictionary. | |
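
The Dependencies column can be read as an ordering constraint between processors. The following is a minimal sketch in Python, assuming that a dependency must run earlier in the pipeline than the processor that needs it; the `DEPENDENCIES` mapping is transcribed from the table, while `processing_order` is a hypothetical helper for illustration and not part of any product API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Processor -> prerequisites, transcribed from the Dependencies column.
DEPENDENCIES = {
    "Chunker": {"PartOfSpeechTagger"},
    "NamedEntitiesMatcher": {"RelatedTerms"},
    # Per the table, only required when withPartOfSpeech=true (the default).
    "RelatedTerms": {"PartOfSpeechTagger"},
    "SentimentAnalyzer": {"Lemmatizer", "Chunker"},
}


def processing_order(wanted):
    """Return the wanted processors plus their transitive
    dependencies, with every prerequisite listed first."""
    needed, stack = set(), list(wanted)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(DEPENDENCIES.get(name, ()))
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {name: DEPENDENCIES.get(name, set()) for name in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(processing_order(["SentimentAnalyzer"]))
```

Requesting `SentimentAnalyzer` pulls in `Lemmatizer`, `Chunker`, and, transitively, `PartOfSpeechTagger`. The relative order of processors that do not depend on each other is not fixed by this sketch.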