ReplaceRegexp
com.exalead.indexing.analysis.v10.ReplaceRegexp
Replaces substrings in the content of every DocumentChunk whose ContextName is 'inputContext', using:
'pattern' as the regular expression that matches the substrings to replace,
and 'value' as the replacement value.
The value may use the sed output format, with references \0 through \9 to captures. A new DocumentChunk is created with the substitutions applied.
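As an illustration of the description above, a ReplaceRegexp element might be written as in the following minimal sketch. The processor name, the context names date_raw and date_iso, and the date layout are hypothetical; only the element and attribute names come from this page. Any namespace declaration or enclosing parent element required by the configuration file is omitted, and the lowercase spelling of the boolean value is an assumption.

```xml
<!-- Hypothetical example: rewrite dates such as 31/12/2020 as 2020-12-31. -->
<!-- The name and both context names are illustrative assumptions. -->
<ReplaceRegexp
    name="normalize_date"
    inputContext="date_raw"
    outputContext="date_iso"
    pattern="([0-9]{2})/([0-9]{2})/([0-9]{4})"
    value="\3-\2-\1"
    replaceAll="true" />
```

With these settings, a DocumentChunk in the date_raw context containing 31/12/2020 would be expected to produce a new DocumentChunk in the date_iso context containing 2020-12-31, since \1, \2 and \3 refer to the day, month and year captures.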
Parent elements:
com.exalead.indexing.analysis.v10.AnalysisPipeline (as AnalysisPipeline)
com.exalead.indexing.analysis.v10.DocumentProcessorGroup (as DocumentProcessorGroup)
Attributes:
| Name | Type | Default value | Description |
| --- | --- | --- | --- |
| inputContext | string | | The processor is applied only to DocumentChunks with this ContextName. |
| name | string | | Name of this processor. It is used only for tracing and debugging purposes. |
| dataModelState | string | | Indicates whether this document processor is managed by a data model. Possible values: null, auto, customized, error. If null, this document processor is not related to a data model. If "auto", it is auto-generated by a data model. If "customized", it was auto-generated by a data model and then customized. If "error", there is a conflict between this document processor and the data model. |
| dataModelClass | string | | If dataModelState is "auto" or "customized", the name of the DataModelClass that generated this DocumentProcessor. |
| dataModelProperty | string | | If dataModelState is "auto" or "customized", the name of the DataModelProperty that generated this DocumentProcessor. |
| disabled | boolean | | Disables this DocumentProcessor. |
| outputContext | string | | ContextName assigned to each new DocumentChunk created with the substitutions. |
| pattern | string | | Pattern used to match the substrings to replace. Matching is performed by the ASTL library, which supports the Perl 5 regular expression language with the exceptions listed after this table. |
| value | string | | The replacement value (sed-like output format). |
| replaceAll | boolean | True | Replaces all occurrences of the pattern rather than only the first. |

The 'pattern' attribute does NOT support the following Perl 5 features:
lazy (non-greedy) quantifiers like *?, +?, ??, {n}?, {n,}?, {n,m}?
possessive quantifiers like *+, ++, ?+, {n}+, {n,}+, {n,m}+
assertions like \b, \B, \A, \z, \Z, \G
look-around assertions (?=pattern), (?!pattern), (?<=pattern), (?<!pattern)
named captures (?'name'pattern), (?<name>pattern)
numeric and named backreferences like \1, \g1, \g{-1}, \g{name}, \k<name>, \k'name'
named Unicode characters \N{name}
all operators related to Perl code inlining like (?{ code })
all operators related to backtracking algorithm control like independent subexpression (?>pattern)
\C matching a single C char (octet)
pattern-match modifiers (?pimsx-imsx), except (?i:pattern) and (?i), which are supported (without the negative form)
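The restrictions on 'pattern' listed above mean that some common Perl 5 idioms have to be rewritten. The fragments below are a sketch of two such cases, not a definitive recipe: the processor and context names are hypothetical, and the reading of replaceAll="false" as "replace only the first match" is an assumption based on the attribute name.

```xml
<!-- Perl 5 would allow the lazy form \((.*?)\). Lazy quantifiers are not
     supported, so a negated character class is used to stop at the first
     closing parenthesis instead. -->
<ReplaceRegexp
    name="parens_to_brackets"
    inputContext="comment_raw"
    outputContext="comment_bracketed"
    pattern="\(([^)]*)\)"
    value="[\1]" />

<!-- (?i) is listed as supported, so case-insensitive matching can stay
     inline. replaceAll="false" is assumed to limit the replacement to the
     first match in each DocumentChunk. -->
<ReplaceRegexp
    name="normalize_colour"
    inputContext="title_raw"
    outputContext="title_normalized"
    pattern="(?i)colour"
    value="color"
    replaceAll="false" />
```

The first fragment would be expected to turn "see (note 1)" into "see [note 1]". Characters such as < and & must still be escaped as &lt; and &amp; when they appear in an XML attribute value, which is independent of the regular expression syntax.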
Nested elements:
| Name | Type | Description |
| --- | --- | --- |
| fromDataModel | com.exalead.indexing.analysis.v10.DocumentProcessor | If dataModelState is "customized", contains the original document processor generated by the data model. Use it to revert easily from the "customized" state to the "auto" state. |
| AcceptCondition | com.exalead.indexing.analysis.v10.AcceptCondition | Expresses the condition under which this DocumentProcessor is enabled. |