 
TextToNum
com.exalead.indexing.analysis.v10.TextToNum
Processor used to approximate a sort on a text field. It implements a surjection from the set of strings to the set of integers [0..N], with N close to but less than or equal to 18,446,744,073,709,551,615 (2^64 - 1). The user defines an ordered alphabet.

A first surjection maps the set of all strings onto the set of finite sequences of symbols taken from this alphabet: every symbol that is not in the alphabet is stripped from the string. The order of the alphabet induces an order relation on these sequences (lexicographical order). Because one set is infinite and the other is finite, the second surjection, from sequences to integers, cannot fully preserve this order. The idea is to preserve the order among shorter strings, AND between shorter strings and longer strings, such that (writing STRING2ULONG for the mapping):
if STRING2ULONG('shortstring1') <= STRING2ULONG('shortstring2') then 'shortstring1' <= 'shortstring2'
STRING2ULONG('longstring1') <= STRING2ULONG('longstring2') does NOT ensure 'longstring1' <= 'longstring2'
if STRING2ULONG('shortstring1') <= STRING2ULONG('longstring2') then 'shortstring1' <= 'longstring2'
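The following is a minimal Python sketch of one mapping that has the three properties above. It is not necessarily what TextToNum does internally. It assumes a hypothetical scheme with three steps:

1. Each alphabet symbol gets a digit 1..k, in alphabet order, and digit 0 marks the end of the string.
2. The string is stripped of symbols outside the alphabet.
3. The result is truncated to the longest prefix whose base-(k+1) encoding fits in nb_bits unsigned bits.

The function name string_to_ulong and its parameters are illustrative assumptions. The defaults mirror the documented alphabet and nbBits defaults.

```python
def string_to_ulong(text: str,
                    alphabet: str = "0123456789abcdefghijklmnopqrstuvwxyz",
                    nb_bits: int = 63) -> int:
    """Hypothetical STRING2ULONG sketch (not TextToNum's actual code)."""
    if len(set(alphabet)) != len(alphabet):
        raise ValueError("alphabet symbols must be distinct")
    # Digit 0 is reserved for "end of string"; symbols get digits 1..k in alphabet order.
    rank = {symbol: i + 1 for i, symbol in enumerate(alphabet)}
    base = len(alphabet) + 1
    # Longest prefix whose base-(k+1) encoding fits in nb_bits unsigned bits.
    prefix_len = 0
    while base ** (prefix_len + 1) <= 2 ** nb_bits:
        prefix_len += 1
    # First surjection: drop every symbol not in the alphabet
    # (no case folding here, so "A" is dropped with the default alphabet).
    kept = [rank[c] for c in text if c in rank]
    # Second surjection: keep only the prefix and pad it with end markers (0).
    digits = kept[:prefix_len]
    digits += [0] * (prefix_len - len(digits))
    # Fixed-length base-(k+1) number: numeric order equals lexicographical order.
    value = 0
    for d in digits:
        value = value * base + d
    return value
```

Consider the default 36-symbol alphabet and 63 bits. Under this scheme, a 12-symbol prefix is encoded exactly, so string_to_ulong("abc") < string_to_ulong("abd"). Any two strings that share their first 12 kept symbols map to the same value. That tie is why the order is not guaranteed between long strings.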
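Only a prefix of each string is encoded exactly. Its length, which is the maximum length of a "short" string above, depends on the size of the alphabet: for a given number of bits, the larger the alphabet, the shorter the prefix.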
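This worked formula illustrates the trade-off. It assumes a hypothetical digit-per-symbol encoding, not necessarily TextToNum's actual one:

- each of the k alphabet symbols gets a digit 1..k;
- the digit 0 marks the end of the string;
- b is the number of bits (nbBits).

Under these assumptions, the exactly encoded prefix length L is the largest ℓ for which (k+1)^ℓ fits in b bits:

```latex
L = \max\{\ell \in \mathbb{N} : (k+1)^{\ell} \le 2^{b}\}
  = \left\lfloor \frac{b}{\log_2 (k+1)} \right\rfloor
```

For the default alphabet (k = 36) and b = 63: log2(37) ≈ 5.209 and 63 / 5.209 ≈ 12.09, so L = 12 under this scheme.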
Parent elements:
com.exalead.indexing.analysis.v10.AnalysisPipeline (as AnalysisPipeline)
com.exalead.indexing.analysis.v10.DocumentProcessorGroup (as DocumentProcessorGroup)
Attributes:
inputContext (string): The processor is applied only to DocumentChunks with this ContextName.
name (string): Name of this processor, used only for tracing and debugging purposes.
dataModelState (string): Whether this document processor is managed by a data model. One of null, "auto", "customized", "error".
  If null, this document processor is not related to a data model.
  If "auto", this document processor is auto-generated by a data model.
  If "customized", this document processor was auto-generated by a data model and then customized.
  If "error", there is a conflict between this document processor and the data model.
dataModelClass (string): If dataModelState is "auto" or "customized", the name of the DataModelClass that generated this DocumentProcessor.
dataModelProperty (string): If dataModelState is "auto" or "customized", the name of the DataModelProperty that generated this DocumentProcessor.
disabled (boolean): Disables the DocumentProcessor.
alphabet (string, default value: 0123456789abcdefghijklmnopqrstuvwxyz): The ordered alphabet. Symbols that are not in the alphabet are stripped from the string.
outputContext (string): The ContextName used for the newly created chunk.
nbBits (int, default value: 63): Number of bits of the unsigned field used for sorting.
Nested elements:
fromDataModel (com.exalead.indexing.analysis.v10.DocumentProcessor): If dataModelState is "customized", the original document processor generated by the data model. Use it to revert easily from the "customized" state to the "auto" state.
AcceptCondition (com.exalead.indexing.analysis.v10.AcceptCondition): Expresses the enablement condition of this DocumentProcessor.