URLTransformer
com.exalead.indexing.analysis.v10.URLTransformer
Parses a context string as a regular URL (RFC 2396, "Uniform Resource Identifier") and transforms it according to the given URL pattern. A new DocumentChunk is created with the substitution.
Pattern used to transform the URL (in the form <scheme>://<authority><path>?<query>#<fragment>):
Characters other than '$' or '\' are kept as-is
The '$' character and the '\' character must be escaped with a leading \
The ${expression} form is replaced by a string computed from the URL components (see "Expression" below)
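The three pattern rules above can be sketched as a small scanner. This is only an illustration, not the product's implementation: the function name expand_pattern, the evaluate callback and the error handling are hypothetical, and it assumes that an escaped '$' or '\' is emitted as the literal character.

```python
from typing import Callable


def expand_pattern(pattern: str, evaluate: Callable[[str], str]) -> str:
    """Expand a URL pattern; `evaluate` resolves the text inside ${...}.

    Hypothetical helper: names and error handling are assumptions.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # A backslash escapes the next character (assumed to be
            # emitted literally, without the backslash).
            if i + 1 >= len(pattern):
                raise ValueError("dangling backslash at end of pattern")
            out.append(pattern[i + 1])
            i += 2
        elif ch == "$":
            # An unescaped '$' must open a ${expression} form.
            if pattern[i + 1:i + 2] != "{":
                raise ValueError("unescaped '$' at offset %d" % i)
            end = pattern.find("}", i + 2)
            if end == -1:
                raise ValueError("unterminated ${ at offset %d" % i)
            out.append(evaluate(pattern[i + 2:end]))
            i = end + 1
        else:
            # Any other character is kept as-is.
            out.append(ch)
            i += 1
    return "".join(out)


# Usage with a fixed lookup table standing in for real URL components.
components = {"scheme": "http"}
print(expand_pattern("the scheme is ${scheme}", components.__getitem__))
```

Passing the expression evaluator as a callback keeps the scanning of the pattern separate from the evaluation of URL components.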
Expression used inside the enclosing ${}:
url: Original URL
scheme: Scheme name ("http", "https", "file", ...)
authority: Authority (host:port or host) (may be empty)
host: Hostname part of the authority (may be empty)
port: Port number part of the authority (may be empty)
userInfo: username:password field of the authority (may be empty)
file: File starting with / and query string, if any
pathurl: Normalized absolute path starting with /
path: Normalized absolute path (may start with C:\ on Windows)
query: Normalized query part starting with ? (may be empty)
args: Query part without the leading ? (may be empty)
fragment: Fragment part starting with # (may be empty)
reference: Reference part, i.e., fragment without the leading # (may be empty)
arg:name: Query part argument identified by its name, unescaped (you must re-escape it using "urlencode:" when necessary)
str:string: The final argument is not a variable name, but a string (only useful for clarity purpose)
tolower:expression: Transform into lowercase (ONLY A-Z)
toupper:expression: Transform into uppercase (ONLY a-z)
urlencode:expression: URL encoding (%NN or +)
urlpathencode:expression: URL encoding of everything except the / separators
urldecode:expression: URL decoding
pathslash:expression: Convert \ into /
pathantislash:expression: Convert / into \
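As a rough illustration of the component names listed above, the following sketch maps them onto Python's standard urllib.parse.urlsplit. It is an approximation under stated assumptions: the helper name url_components is hypothetical, no path or query normalization is performed, and the Windows-specific "path" variable is not modelled.

```python
from urllib.parse import urlsplit


def url_components(url: str) -> dict:
    """Approximate the documented URL components (hypothetical helper)."""
    parts = urlsplit(url)
    query = "?" + parts.query if parts.query else ""
    fragment = "#" + parts.fragment if parts.fragment else ""
    user_info = ""
    if parts.username is not None:
        user_info = parts.username
        if parts.password is not None:
            user_info += ":" + parts.password
    return {
        "url": url,
        "scheme": parts.scheme,
        # Assumption: netloc also carries the user info when present,
        # which may differ from the product's "authority".
        "authority": parts.netloc,
        "host": parts.hostname or "",
        "port": "" if parts.port is None else str(parts.port),
        "userInfo": user_info,
        "file": parts.path + query,
        "pathurl": parts.path,
        "query": query,
        "args": parts.query,
        "fragment": fragment,
        "reference": parts.fragment,
    }


components = url_components("http://www.example.com/bar/foo?bar=42")
print(components["scheme"], components["host"], components["file"])
```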
Notes:
Unreserved characters are unescaped during URL processing (i.e., never '%' or '\')
The tolower prefix and the other similar prefixes can be chained recursively (i.e., the expression "${urlpathencode:pathantislash:toupper:path}" is valid)
Both "file://C:\path" and "file:///C:\path" will produce path="/C:\path"
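The prefix chaining mentioned in the notes can be sketched as a recursive evaluator in which each prefix transforms the result of the expression to its right. This is a minimal sketch, not the product's code: the names evaluate and TRANSFORMS are hypothetical, the encoders are approximated with Python's urllib.parse functions, only a few variables are supported, and "path" is treated as the plain URL path.

```python
import string
from urllib.parse import parse_qs, quote, quote_plus, unquote_plus, urlsplit

_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

# Prefix name -> function applied to the already-evaluated inner expression.
TRANSFORMS = {
    "tolower": lambda s: s.translate(_TO_LOWER),     # ASCII A-Z only
    "toupper": lambda s: s.translate(_TO_UPPER),     # ASCII a-z only
    "urlencode": lambda s: quote_plus(s, safe=""),   # %NN, space as +
    "urlpathencode": lambda s: quote(s, safe="/"),   # leaves / untouched
    "urldecode": unquote_plus,                       # assumption: + decodes to space
    "pathslash": lambda s: s.replace("\\", "/"),
    "pathantislash": lambda s: s.replace("/", "\\"),
}


def evaluate(expression: str, url: str) -> str:
    """Evaluate the text found inside ${...} against `url`."""
    head, sep, rest = expression.partition(":")
    if sep and head in TRANSFORMS:
        # Recursion: evaluate the remainder first, then transform it.
        return TRANSFORMS[head](evaluate(rest, url))
    if sep and head == "str":
        return rest  # literal string, not a variable name
    parts = urlsplit(url)
    if sep and head == "arg":
        # parse_qs unescapes values; the first value of the argument is used.
        return parse_qs(parts.query).get(rest, [""])[0]
    variables = {
        "url": url,
        "scheme": parts.scheme,
        "path": parts.path,      # assumption: no Windows drive handling
        "pathurl": parts.path,
        "query": "?" + parts.query if parts.query else "",
        "args": parts.query,
    }
    if expression not in variables:
        raise ValueError("unsupported expression: " + expression)
    return variables[expression]


URL = "http://www.example.com/bar/foo?bar=42"
print(evaluate("urlpathencode:pathantislash:toupper:path", URL))
```

The prefixes are applied from right to left: toupper runs first, then pathantislash, then urlpathencode.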
Examples:
With the input context value "http://www.example.com/bar/foo?bar=42"
"hello, world" => "hello, world"
"the scheme is ${scheme}" => "the scheme is http"
"the scheme is \${scheme}" => "the scheme is ${scheme}"
"http://myserver${path}${query}" => "http://myserver/bar/foo?bar=42"
"http://myserver/applet?f=${urlpathencode:path}&t=${arg:bar}" => "http://myserver/applet?f=/bar/foo&t=42"
"http://myserver/applet?f=${urlencode:path}&t=${arg:bar}" => "http://myserver/applet?f=%2Fbar%2Ffoo&t=42"
"http://myserver/applet?f=${urlpathencode:pathantislash:toupper:path}" => "http://myserver/applet?f=%5CBAR%5CFOO"
With the input context value "file:///C:/My%20Documents/Document.doc"
"${pathantislash:urldecode:path}" => "C:\My Documents\Document.doc"
Parent elements:
com.exalead.indexing.analysis.v10.AnalysisPipeline (as AnalysisPipeline)
com.exalead.indexing.analysis.v10.DocumentProcessorGroup (as DocumentProcessorGroup)
Attributes:
Name | Type | Default value | Description
inputContext | string | | The processor will only be applied to DocumentChunks with this ContextName.
name | string | | Name of this processor. The name of a processor is used only for tracing and debugging purposes.
dataModelState | string | | Whether this document processor is managed by a data model. One of: null, auto, customized, error. If null, this document processor is not related to a data model. If "auto", this document processor is auto-generated by a data model. If "customized", this document processor was auto-generated by a data model and then customized. If "error", there is a conflict between this document processor and the data model.
dataModelClass | string | | If dataModelState is either "auto" or "customized", the name of the DataModelClass that generated this DocumentProcessor.
dataModelProperty | string | | If dataModelState is either "auto" or "customized", the name of the DataModelProperty that generated this DocumentProcessor.
disabled | boolean | | Disables the DocumentProcessor.
outputContext | string | | ContextName to be associated with the DocumentChunk created for each new context.
urlPattern | string | | Pattern used to transform the URL.
Nested elements:
Name | Type | Description
fromDataModel | com.exalead.indexing.analysis.v10.DocumentProcessor | If dataModelState is "customized", the original document processor generated by the data model. Use this to easily revert from the "customized" state to the "auto" state.
AcceptCondition | com.exalead.indexing.analysis.v10.AcceptCondition | Expresses the enablement condition of this DocumentProcessor.