TokenizationConfig
com.exalead.linguistic.v10.TokenizationConfig
How to tokenize documents, i.e. split the input strings into tokens. Tokens are usually words.
Parent elements:
com.exalead.linguistic.v10.LinguisticConfig (as LinguisticConfig)
Attributes:
Name | Type | Default value | Description
name | string | | The tokenization configuration's name.
Nested elements:
Name | Type | Description
FormIndexingConfig | com.exalead.linguistic.v10.FormIndexingConfig | How to index different word forms (exact, normalized, lemmatized) for each language.
NormalizerConfig | com.exalead.linguistic.v10.NormalizerConfig | How to normalize words.
Tokenizer | com.exalead.linguistic.v10.Tokenizer* | List of enabled tokenizers. Tokenizer choice depends on the document's language.
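
As a rough illustration of how the parent element, attribute, and nested elements listed above might fit together, here is a minimal sketch. Only the element and attribute names are taken from this page. The value "tok0", the absence of a namespace prefix, and the empty child elements are assumptions made for the example, and the concrete tokenizer elements are left out because this page does not name them.

```xml
<!-- Hypothetical sketch: names come from this page; the value "tok0",
     the missing namespace prefix, and the empty children are assumptions. -->
<LinguisticConfig>
  <TokenizationConfig name="tok0">
    <!-- How to index exact, normalized, and lemmatized forms per language -->
    <FormIndexingConfig/>
    <!-- How to normalize words -->
    <NormalizerConfig/>
    <!-- Enabled tokenizers (type Tokenizer*) would be listed here;
         their concrete element names are not given on this page. -->
  </TokenizationConfig>
</LinguisticConfig>
```

Check the actual element names, namespaces, and required attributes against the product's schema before reusing this fragment.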