StandardTokenizer
com.exalead.linguistic.v10.StandardTokenizer
Sets a specific configuration for the standard tokenizer. If this object is not in the Semantic processors list, standard tokenization is used.
Parent elements:
com.exalead.linguistic.v10.TokenizationConfig (as TokenizationConfig)
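Attributes:

| Name | Type | Default value | Description |
| --- | --- | --- | --- |
| language | ISO code | | The language handled by this tokenizer. Can be null, in which case the tokenizer applies to all languages not otherwise handled. |
| concatAlphaNum | boolean | True | Concatenate alphabetic and numeric characters. |
| concatNumAlpha | boolean | True | Concatenate numeric and alphabetic characters. |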
Nested elements:

| Name | Type | Description |
| --- | --- | --- |
| charOverrides | com.exalead.linguistic.v10.StandardTokenizerOverride* | Set of rules that force a specific type for a character. |
| patternOverrides | com.exalead.linguistic.v10.StandardTokenizerOverride* | Set of rules that force a specific type for a regexp pattern. |
| TokenizerPlugin | com.exalead.linguistic.v10.TokenizerPlugin* | |
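
The attributes and nested elements above might be combined as in the following sketch. It is illustrative only and rests on several assumptions that this page does not confirm: the `ling` prefix and its namespace URI, the lowercase spelling of boolean values, the use of `fr` as the ISO code, and the idea that omitting `language` corresponds to a null language. The attributes of `StandardTokenizerOverride` are not documented in this section, so the override lists are left empty rather than guessed.

```xml
<!-- Assumption: prefix "ling" and this namespace URI are placeholders;
     reuse whatever your existing configuration files declare. -->
<ling:StandardTokenizer xmlns:ling="exa:com.exalead.linguistic.v10"
                        language="fr"
                        concatAlphaNum="false"
                        concatNumAlpha="true">
  <!-- Zero or more StandardTokenizerOverride rules forcing a character type -->
  <ling:charOverrides>
  </ling:charOverrides>
  <!-- Zero or more StandardTokenizerOverride rules forcing a regexp pattern type -->
  <ling:patternOverrides>
  </ling:patternOverrides>
  <!-- Zero or more TokenizerPlugin elements could follow here -->
</ling:StandardTokenizer>
```

Here only `concatAlphaNum` departs from its documented default of True; `concatNumAlpha` is written out for clarity and could presumably be omitted. A second `StandardTokenizer` without a `language` attribute would, under the assumption stated above, cover every language that has no dedicated entry.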