Class | Description |
---|---|
AcronymDetector | Acronym detector. |
AcronymExtractor | |
AlphanumHandler | |
Anchorer | Adds an annotation to the first and last tokens processed |
AnnotationManager | The AnnotationManager is intended to gather all basic annotation-manipulation operations, such as copying, deduping, and selecting leftmost-longest annotations. |
AnnotationManagerResource | Resource for annotation management |
ApproxMatcher | The ApproxMatcher is used to fuzzy-match expressions (at word level) |
ApproxMatcherBuilder | Builds a resource for the ApproxMatcher |
ApproxMatcherEntry | An ApproxMatcher entry |
ApproxMatcherResource | An ApproxMatcher resource |
BayesCategorizer | A Bayes-based categorizer |
BayesCategorizerBuilder | Bayes categorizer resource builder |
BayesCategorizerResource | A Bayes categorizer resource |
Categorizer | |
CategorizerBuilder | A categorizer resource builder |
CharKindOverwrite | |
ChineseTokenizer | Chinese tokenizer |
ChineseTokenizerResource | Chinese tokenizer resource |
ChineseWordFinder | Chinese word finder. |
ChineseWordFinderResource | Chinese word finder resource |
Chunker | A Chunker annotates grammatical chunks. |
ChunkerResource | A chunker resource |
CJKProcessor | The CJKProcessor splits each Chinese/Japanese/Korean token into several tokens of size 1 |
CompoundWordSplitter | |
CoOccurrence | A co-occurrence |
CoOccurrenceIterator | A CoOccurrenceIterator iterates over a co-occurrence resource |
CoOccurrenceLookup | A CoOccurrenceLookup is used to look up entries in a co-occurrence resource |
CoOccurrenceProcessor | Extracts co-occurrences |
CoOccurrenceResource | A CoOccurrence resource |
CRFCPGenerator | The CRF content predicate generator. |
CRFResource | The conditional random field (CRF) resource |
CRFTagger | A CRF-based POS tagger |
CRFTrainer | Trains a CRF resource. |
Distance | Provides Damerau-Levenshtein distance computation |
DutchTokenizer | Dutch tokenizer |
DynamicLemmatizer | A DynamicLemmatizer truncates word endings and provides suffix expansions according to static rules |
DynamicLemmatizerBuilder | Dynamic lemmatizer resource builder |
DynamicLemmatizerResource | A dynamic lemmatizer resource |
FarTextAnnotator | |
FastRules | A processor performing rule-based categorization |
FastRulesBuilder | Rule compiler for the FastRules processor |
FastRulesResource | FastRules matcher resource |
FastTokenizer | Tokenizer |
FeaturesExtractorBuilder | Features extractor resource builder |
FeaturesExtractorResource | Helpers for features extractor resources |
FreqDForm | Frequency and display form associated with a WordDictionary entry |
FrequencyAnalyzer | |
FSMMatcher | |
GermanTokenizer | German tokenizer |
HierarchicalBayesCategorizer | A hierarchical Bayes categorizer |
HierarchicalBayesCategorizerBuilder | Hierarchical Bayes categorizer builder |
HierarchicalVocabularyBuilder | |
HierarchicalVocabularyClassifier | |
HierarchicalVocabularyResource | |
HMMResource | Hidden Markov Model resource |
HMMTagger | An HMM-based part-of-speech tagger. |
IdentityMatcher | The IdentityMatcher is a fuzzy expression matcher that allows terms to be missing, added, transposed, or modified. It only works on a small amount of text, such as a query, not on a full document. |
IdentityMatcher2 | |
IdentityMatcherAnnotationAtom | An atom represented by an annotation |
IdentityMatcherAtom | The abstract representation of an atom |
IdentityMatcherBuilder | IdentityMatcher resource builder |
IdentityMatcherEntry | An abstract IdentityMatcher entry |
IdentityMatcherIterator | Iterates over an IdentityMatcher resource |
IdentityMatcherMatch | An IdentityMatcher match |
IdentityMatcherResource | The IdentityMatcher resource |
IdentityMatcherResult | An IdentityMatcher result |
IdentityMatcherRule | An IdentityMatcher rule |
IdentityMatcherRules | A list of IdentityMatcher rules |
IdentityMatcherStringRule | A string-based IdentityMatcher rule |
IdentityMatcherWordAtom | An IdentityMatcher word atom |
Ignore | |
JapaneseCharDetector | |
JapaneseQueryTokenExpand | The JapaneseQueryTokenExpand processor generates one annotation per available written form (katakana, romaji, hiragana, kanji) |
JapaneseTokenizer | Japanese tokenizer |
JapaneseTokenizerResource | Japanese tokenizer resource |
JapaneseWordFinder | Japanese word finder. |
JapaneseWordFinderResource | Japanese word finder resource |
JaZhBuilder | Builds a new Japanese vs. Chinese disambiguation resource |
JaZhDisambiguisator | A Japanese vs. Chinese disambiguator |
JaZhResource | A Japanese vs. Chinese disambiguation resource |
LangRange | |
LanguageDetector | Language detection processor |
LanguageDetectorBuilder | Builds a resource for the LanguageDetector |
LanguageDetectorResource | Language detector resource |
LanguagesHelper | |
LemmaGender | Gender of a lemma |
LemmaInformation | A decoded lemmatizer annotation |
LemmaNumber | Number of a lemma |
LemmaPoS | PoS of a lemma |
Lemmatizer | Lemmatizes nouns and adjectives using a static linguistic resource. In query mode, it annotates the current token with all available forms. |
LemmatizerBuilder | Builds a new lemmatizer resource |
LemmatizerIterator | Iterates over a lemmatizer resource |
LemmatizerResource | A lemmatizer resource |
NamedEntitiesFilter | A filter to plug in after the named entities matcher |
NamedEntitiesProcessor | Named entities processor using machine learning |
NamedEntitiesResourceBuilder | |
NativeFeaturesExtractor | The native features extractor processor (normalizes previously extracted features) |
NativeFeaturesExtractorResource | Features extractor resource |
NegationMatcher | Negation matcher |
NGram | |
NGramForm | An n-gram |
Node | |
Normalizer | A normalizer normalizes tokens by lowercasing them and removing accents |
NormalizerResource | A normalizer resource |
NormalizerResourceConfig | A normalizer configuration |
NorwegianTokenizer | Norwegian tokenizer |
ObjectFactory | |
OntologyBuilder | Ontology builder |
OntologyElement | Ontology element |
OntologyIterator | Ontology iterator |
OntologyLookup | Ontology lookup object |
OntologyMatcher | Ontology matcher; matches expressions from an ontology |
OntologyResource | Ontology resource |
PhonemeDForm | |
PhonetizerCompiler | Phonetizer resource builder |
PhonetizerProcessor | The Phonetizer adds the phonetic representation of each token |
PhonetizerResource | A Phonetizer resource |
PrettyPrinter | |
Proximity | A processor implementing a NEAR search |
ProximityResource | Proximity matcher resource |
RegexpMatcherProcessor | A processor matching a set of Perl 5 regular expressions. |
RegexpMatcherResource | Regular expressions matcher resource |
RelatedTerm | A related term |
RelatedTerms | Extracts RelatedTerms using a previously added RelatedTermsPreprocessor |
RelatedTermsDict | A RelatedTermsDict extracts related terms using an existing related terms dictionary |
RelatedTermsDictionaryBuilder | Builds a RelatedTermsDict resource |
RelatedTermsDictionaryIterator | Iterates over a RelatedTermsDict resource |
RelatedTermsDictionaryLookup | Looks up entries in a RelatedTermsDict |
RelatedTermsDictionaryMigrator | |
RelatedTermsDictResource | A RelatedTermsDict resource |
RelatedTermsEntry | A RelatedTerms entry |
SemanticExtractor | A processor matching typed entities and interpreting these matches with rewrite rules |
SemanticExtractorBuilder | Compiler for the SemanticExtractor rules |
SentenceFinder | An end-of-sentence detection processor. |
SentenceFinderResource | Resource for sentence detection |
SentimentAnalyzer | |
SentimentSuggestion | |
SimHash | Implementation of the M. |
SimHashEntry | A simhash entry for the repository. |
SimHashNearestNeighbor | Online detection of near-duplicates. |
SimHashProcessor | Annotates documents with their simhash value. |
SortedWordDictionaryBuilder | |
SpellChecker | Spell checker |
SpellCheckerBuilder | Spell checker resource builder |
SpellCheckerResource | Spell checker resource |
SQI | The Semantic Query Interpreter extracts information and interprets it with rewrite rules |
SQIBuilder | Compiler for the Semantic Query Interpreter rules |
SQIResource | Resource for the Semantic Query Interpreter |
StandAloneFastRules | |
StandAloneFastRulesResource | |
Stemmer | Annotates words with their respective stems using rules |
SubTokenizer | Base subtokenizer |
SubTokenizerResource | Dutch, German, Norwegian subtokenizer resource |
SubTokenizerResourceBuilder | Dutch, German, Norwegian subtokenizer resource builder |
SyntacticAnalyzer | The syntactic analyzer detects grammatical structures with respect to a given formal grammar |
SyntacticAnalyzerResource | |
Tagger | Defines the tag IDs for PoS taggers. |
TermFrequencies | |
TokenizationConfig | |
TokenizationException | |
TokenizeAnnotation | |
TokenizerResource | Tokenizer resource |
TokensToAnnotation | For a list of annotation tags, this processor annotates each token with the annotation's matching tokens. |
Transducer | A processor that matches patterns against a token stream. |
TransducerResource | Resource for the transducer |
UrlRemover | This processor removes or annotates text chunks that match a standard URL pattern |
VlHMMResource | Variable-length HMM resource. |
VocabularyBuilder | The vocabulary is built in three steps: (1) sort the (token, classId) pairs; (2) compute the global frequency and the score for each token and each class; (3) sort the vocabulary by best discriminator (largest class score). The score of a token W for a given class C is computed as follows: score(W,C) = (frequency of W in class C) / sqrt(total number of documents in class C * total frequency of W). |
VocabularyClassifier | VocabularyClassifier. |
VocabularyIterator | VocabularyIterator. |
VocabularyResource | A resource for document classification. |
WordDictionary | A processor that matches a set of words |
WordDictionaryBuilder | Dictionary compiler for the WordDictionary |
WordDictionaryEntry | An entry in a WordDictionary |
WordDictionaryIterator | Iterator over entries of a WordDictionary. |
WordDictionaryLookup | Accessor for a WordDictionary |
WordDictionaryResource | Resource for WordDictionary |
WordDictionaryWithMultipleDisplayForms | Iterator over a WordDictionary with multiple display forms associated with each word |
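
The Distance row only names the metric. As a rough illustration of what a Damerau-Levenshtein computation involves, here is a minimal Java sketch of the restricted variant (optimal string alignment), which counts single-character insertions, deletions, substitutions, and adjacent transpositions. `DamerauLevenshteinSketch` and `distance` are hypothetical names chosen for illustration; this is not the API of the `Distance` class, and whether that class uses the restricted or the unrestricted variant is not stated here.

```java
// Hypothetical sketch: not the API of the Distance class listed in the table.
final class DamerauLevenshteinSketch {
    private DamerauLevenshteinSketch() {}

    /**
     * Restricted Damerau-Levenshtein (optimal string alignment) distance over
     * UTF-16 chars: minimum number of insertions, deletions, substitutions and
     * adjacent transpositions turning a into b, editing no substring twice.
     */
    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete i characters of a
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert j characters of b
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,          // deletion
                        d[i][j - 1] + 1),         // insertion
                        d[i - 1][j - 1] + cost);  // match or substitution
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // transposition
                }
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(distance("ab", "ba"));          // 1
    }
}
```

The restriction matters for inputs like `distance("ca", "abc")`, which returns 3 here; the unrestricted Damerau-Levenshtein distance of the same pair is 2 (transpose to "ac", then insert "b"), because it allows a transposed pair to be edited again.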
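
The VocabularyBuilder row describes a three-step procedure and a scoring formula. The following is a minimal Java sketch of those steps, assuming that occurrences arrive as `(token, classId)` pairs, that "frequency" means a raw occurrence count, and that per-class document counts are supplied separately. `VocabularyScoreSketch`, `Occurrence`, `Entry`, and `build` are hypothetical names, not part of the documented API; the records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the VocabularyBuilder steps; all names are illustrative.
final class VocabularyScoreSketch {
    private VocabularyScoreSketch() {}

    record Occurrence(String token, int classId) {}

    record Entry(String token, int bestClass, double bestScore) {}

    // score(W,C) = (frequency of W in C) / sqrt(documents in C * total frequency of W)
    static double score(long freqInClass, long docsInClass, long totalFreq) {
        return freqInClass / Math.sqrt((double) docsInClass * totalFreq);
    }

    static List<Entry> build(List<Occurrence> occurrences, Map<Integer, Long> docsPerClass) {
        // Step 1: sort the (token, classId) pairs so equal pairs are adjacent.
        List<Occurrence> sorted = new ArrayList<>(occurrences);
        sorted.sort(Comparator.comparing(Occurrence::token)
                .thenComparingInt(Occurrence::classId));

        // Step 2: per-class and global frequencies, then a score for each class.
        Map<String, Map<Integer, Long>> perClass = new TreeMap<>();
        Map<String, Long> totalFreq = new HashMap<>();
        for (Occurrence o : sorted) {
            perClass.computeIfAbsent(o.token(), k -> new TreeMap<>())
                    .merge(o.classId(), 1L, Long::sum);
            totalFreq.merge(o.token(), 1L, Long::sum);
        }
        List<Entry> vocabulary = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Long>> e : perClass.entrySet()) {
            String token = e.getKey();
            int bestClass = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Map.Entry<Integer, Long> c : e.getValue().entrySet()) {
                double s = score(c.getValue(), docsPerClass.get(c.getKey()), totalFreq.get(token));
                if (s > bestScore) { // ties keep the lower class id
                    bestScore = s;
                    bestClass = c.getKey();
                }
            }
            vocabulary.add(new Entry(token, bestClass, bestScore));
        }

        // Step 3: best discriminators (largest class score) first.
        vocabulary.sort(Comparator.comparingDouble(Entry::bestScore).reversed());
        return vocabulary;
    }
}
```

For example, with "cat" seen twice in class 0 and once in class 1, "dog" seen once in class 1, and two documents per class, "cat" scores 2/sqrt(2 * 3) ≈ 0.816 for class 0 and "dog" scores 1/sqrt(2 * 1) ≈ 0.707 for class 1, so "cat" is ranked first.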
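
The SimHash, SimHashNearestNeighbor, and SimHashProcessor rows refer to simhash fingerprints and online near-duplicate detection. A generic 64-bit simhash can be sketched as follows; it is not necessarily the variant those classes implement. Assumptions: all tokens carry equal weight, FNV-1a is used as the per-token hash, and two documents count as near-duplicates when their fingerprints differ in at most `maxBits` bits. `SimHashSketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of a generic 64-bit simhash; not the SimHash class's API.
final class SimHashSketch {
    private SimHashSketch() {}

    // 64-bit FNV-1a over the UTF-8 bytes of a token (assumed per-token hash).
    static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Each token votes +1 for bits set in its hash and -1 for bits not set;
    // a fingerprint bit is 1 only if its vote total is strictly positive.
    static long simhash(List<String> tokens) {
        int[] votes = new int[64];
        for (String t : tokens) {
            long h = fnv1a64(t);
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) != 0 ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) fingerprint |= 1L << bit;
        }
        return fingerprint;
    }

    static int hamming(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    static boolean nearDuplicate(long a, long b, int maxBits) {
        return hamming(a, b) <= maxBits;
    }
}
```

Because the votes are summed, the fingerprint ignores token order, and similar token multisets tend to produce fingerprints that are close in Hamming distance; a nearest-neighbor component can then compare a new fingerprint against stored ones with `nearDuplicate`.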
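
The Normalizer row says tokens are lowercased and stripped of accents. One common way to do this in Java, shown here as an assumption rather than the Normalizer processor's actual behavior, is Unicode NFD decomposition followed by removal of combining marks. Note the name clash: the code uses the JDK class `java.text.Normalizer`, not the processor listed in the table. `AccentFoldingSketch` is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch; uses the JDK's java.text.Normalizer, not the table's Normalizer processor.
final class AccentFoldingSketch {
    private AccentFoldingSketch() {}

    // Unicode combining marks (general category M), e.g. U+0301 COMBINING ACUTE ACCENT.
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    static String normalize(String token) {
        // NFD splits a precomposed letter such as "é" into "e" + U+0301.
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        String stripped = COMBINING_MARKS.matcher(decomposed).replaceAll("");
        // Locale.ROOT avoids locale-specific case rules such as the Turkish dotless i.
        return stripped.toLowerCase(Locale.ROOT);
    }
}
```

Letters without a canonical decomposition, such as ø, keep their shape and are only lowercased; a production normalizer would need extra mappings if those should be folded too.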
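
The CJKProcessor row says each Chinese/Japanese/Korean token is split into tokens of size 1. A minimal sketch, assuming that a token counts as CJK when every code point belongs to the Han, Hiragana, Katakana, or Hangul scripts, and that "size 1" means one Unicode code point. `CjkSplitSketch` is a hypothetical name; the real processor presumably also handles offsets and annotations, which this omits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting CJK tokens into single-code-point tokens.
final class CjkSplitSketch {
    private CjkSplitSketch() {}

    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    static List<String> split(String token) {
        List<String> out = new ArrayList<>();
        if (token.isEmpty() || !token.codePoints().allMatch(CjkSplitSketch::isCjk)) {
            out.add(token); // empty, non-CJK, or mixed tokens are kept whole
            return out;
        }
        // One output token per code point, so supplementary characters stay intact.
        token.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }
}
```

In this sketch, punctuation or other Common-script characters inside a token make it count as non-CJK, so the token is left unsplit; a different policy (splitting only the CJK runs) would be an equally plausible reading of the description.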

Enum | Description |
---|---|
AlphanumHandler.Mode | |

Exception | Description |
---|---|
InvalidFastRuleException | |

Copyright © 2013 Dassault Systèmes, All Rights Reserved.