Available Processors

Name	Dependencies	Description	Supported Languages
AcronymDetector		Detects acronyms.
AnnotationManager		Provides basic operations on annotations (copy, removal, and so on).
Categorizer		Machine learning classifier that categorizes the whole document according to a learning resource.
Chunker	PartOfSpeechTagger	Detects subject/verb in a sentence.	en, fr, it
FastRulesMatcher		Matches documents against rules.
LanguageDetector		Detects the language of tokens. It can detect the language of short sentences and handle multilingual documents.	over 100 languages
Lemmatizer		Identifies the lemma of each word using a language dictionary (no disambiguation).	de, en, es, fr, it, pt
NamedEntitiesMatcher	RelatedTerms	Detects named entities (people, organizations, places, events, email addresses, dates, currency amounts, French addresses, URLs, French phone numbers, French/English opening hours).
NGram		Extracts n-grams.
OntologyMatcher		Extracts words/expressions defined in an ontology.	Depends on the ontology content.
PartOfSpeechTagger		Detects the part of speech (noun/verb/adjective/...) of each token, with disambiguation.	en, fr, it
Phonetizer		Phonetizes tokens.	ca, cs, da, de, en, es, et, fa, fi, fr, it, nl, no, pl, pt, ro, ru, sk, sl, sv
PrettyPrinter		Pretty-prints tokens.
Proximity		Annotates pieces of text where a number of annotations appear close to each other.
RelatedTerms	PartOfSpeechTagger (only if withPartOfSpeech=true, the default)	Extracts noun phrases from the token stream.	ar, ca, cs, da, de, en, es, et, fa, fi, fr, he, it, ja, nl, no, pl, pt, ro, ru, sk, sl, sv, zh
RulesMatcher		Extracts 'patterns' from the token stream.
SemanticExtractor		Extracts semantic features (numbers, strings).
SentenceFinder		Detects sentence breaks.
SentimentAnalyzer	Lemmatizer + Chunker	Extracts positive/negative sentiment using a domain-specific resource (requires customization for each domain).	en, fr, it
SnowballStemmer		Rule-based stemmer.	da, du, en, es, fi, fr, de, hu, it, no, pt, ro, ru, sv, tu
SpellChecker		Performs spell checking.
URLRemover		Removes URLs from token streams.
WordDictionary		Matches tokens against a dictionary.
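The Dependencies column implies an ordering constraint: a processor presumably needs its dependencies to run earlier in the analysis pipeline, and dependencies can be transitive (NamedEntitiesMatcher needs RelatedTerms, which in turn needs PartOfSpeechTagger when withPartOfSpeech=true). The minimal sketch below, assuming that interpretation, computes a valid order for a set of requested processors with a depth-first topological sort. The helpers `dependencies_of` and `resolve_order` are hypothetical illustrations, not part of the product's API; only the dependency data comes from the table above.

```python
from typing import Dict, List, Set

# Dependencies copied from the "Dependencies" column of the table above.
# RelatedTerms is handled in dependencies_of() because its dependency is conditional.
STATIC_DEPENDENCIES: Dict[str, List[str]] = {
    "Chunker": ["PartOfSpeechTagger"],
    "NamedEntitiesMatcher": ["RelatedTerms"],
    "SentimentAnalyzer": ["Lemmatizer", "Chunker"],
}


def dependencies_of(name: str, with_part_of_speech: bool = True) -> List[str]:
    """Return the direct dependencies of a processor (hypothetical helper)."""
    if name == "RelatedTerms":
        # Needs PartOfSpeechTagger only when withPartOfSpeech=true (the default).
        return ["PartOfSpeechTagger"] if with_part_of_speech else []
    return STATIC_DEPENDENCIES.get(name, [])


def resolve_order(requested: List[str], with_part_of_speech: bool = True) -> List[str]:
    """Return the requested processors plus their transitive dependencies,
    ordered so that every dependency appears before the processors needing it."""
    order: List[str] = []
    visiting: Set[str] = set()  # processors on the current DFS path (cycle check)
    done: Set[str] = set()      # processors already placed in `order`

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name}")
        visiting.add(name)
        for dep in dependencies_of(name, with_part_of_speech):
            visit(dep)
        visiting.remove(name)
        done.add(name)
        order.append(name)  # post-order: all dependencies were appended first

    for name in requested:
        visit(name)
    return order


print(resolve_order(["SentimentAnalyzer"]))
# ['Lemmatizer', 'PartOfSpeechTagger', 'Chunker', 'SentimentAnalyzer']
print(resolve_order(["NamedEntitiesMatcher"], with_part_of_speech=False))
# ['RelatedTerms', 'NamedEntitiesMatcher']
```

Because each processor is placed only after all of its dependencies, a shared dependency such as PartOfSpeechTagger appears exactly once even when several requested processors (for example Chunker and RelatedTerms) need it.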