Interface | Description |
---|---|
AcceptCondition.Transformer<T> | |
Classifier.Transformer<T> | |
Classifier.Visitor | |
DocumentProcessor.Transformer<T> | |
MultiContextDocumentProcessor.Transformer<T> | |
SemanticProcessor.Transformer<T> | |
SemanticProcessor.Visitor | |
SingleContextDocumentProcessor.Transformer<T> | |
UniformRandomContextGenerator.Transformer<T> | |
Class | Description |
---|---|
AcceptCondition |
An AcceptCondition expresses a condition for a Document.
|
AcronymDetector |
Detects acronyms like 'o.n.u' and extracts 'onu'.
'.', '-' and ' ' are the standard acronym separators. Custom alphanumeric separators can be added with the "separators" attribute. |
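The detection described above can be approximated with a single regular expression. The following standalone sketch (plain Java, not the CloudView API) matches single letters separated by the standard separators and emits the concatenated form:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch of the behavior described above: find acronyms whose
// letters are separated by '.', '-' or ' ' (e.g. "o.n.u") and emit "onu".
public class AcronymSketch {
    private static final Pattern ACRONYM =
            Pattern.compile("\\b\\p{L}(?:[.\\- ]\\p{L})+\\.?\\b");

    public static void main(String[] args) {
        Matcher m = ACRONYM.matcher("The o.n.u was founded in 1945.");
        while (m.find()) {
            String normalized = m.group().replaceAll("[.\\- ]", "");
            System.out.println(m.group() + " -> " + normalized); // o.n.u -> onu
        }
    }
}
```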
AlertGroup |
No documentation for this element.
|
AnalysisConfig |
AnalysisConfig represents a self-contained module for Document Analysis.
AnalysisConfig is referenced by a BuildGroup. An analysis module defines a set of pipelines that are applied in this module. |
AnalysisConfigList |
The list of all analysis modules.
|
AnalysisPipeline |
A document analysis pipeline.
Each pipeline has an associated accept condition. |
Anchorer |
Adds an annotation on the first and last tokens of either a processed sequence (first/last) or a range defined by an annotation 'a' (first_a/last_a).
|
AndCondition |
AndCondition matches if all children AcceptCondition match.
|
AnnotationManager |
An annotation manager implements basic operations on annotations (copy, removal, selection) according
to a number of conditions, such as:
removal of overlapping annotations
selection of the most frequent annotations
copy of an annotation unless it is blocklisted
|
AnnotationMapping |
Defines how SemanticAnnotations are used to populate index fields.
|
AnnotationMapping.FromDataModel | |
AnnotationTarget |
Target for the mapping of a SemanticAnnotation.
|
AnnotationTarget.FromDataModel | |
BinaryContentCondition |
A condition that matches if the binary content type of the FIRST document part matches the binary string.
Note: Conditions apply to documents, but content is set per document part. BinaryContentCondition only tests the binary content of the first part, if present. |
BuildGroupCondition |
BuildGroupCondition matches if the current buildgroup matches 'name'.
|
CategoryAnnotationTarget |
CategoryAnnotationTarget is used to create a new category path inside an index category field, out of a SemanticAnnotation.
The category path is built by the concatenation of the 'categoryRoot' and the selected 'form' of the annotation. |
CategoryContentTarget |
CategoryContentTarget is used to map a DocumentChunk to a category.
A Category Path is created for each DocumentChunk processed. The textual content of the DocumentChunk is used to build a Category Path. 'indexField' should be a category field (usually called 'categories' or 'security'). |
CGRDocumentProcessor |
Calls convert to generate octrees.
|
Chunker |
A chunker detects noun groups.
|
Classifier |
A Classifier classifies a whole document according to the existing annotations on selected Document Chunks.
The annotations are matched against a learning resource. |
CompoundWordSplitter |
Annotates compound words that use CamelCase (like SearchServer) or underscores (like my_variable) to separate the root words.
|
ConcatValues |
Concatenates all textual content of DocumentChunks where ContextName matches 'inputContexts', and joins them with the 'join' string.
A single DocumentChunk with ContextName 'outputContext' is created as an output. |
ContentCleanup |
Analyzes each DocumentChunk and performs whitespace removal, 'whitespace' being defined by the Unicode specification.
This includes ' ', '\r' and '\n'. Input: all DocumentChunks associated with the specified 'inputContext' ContextNames. Output: same as input (see the sketch after this entry). |
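A minimal sketch of Unicode-defined whitespace handling in plain Java; collapsing runs of whitespace to a single space is an assumption, since the exact removal policy is not specified above:

```java
// Minimal sketch (not the CloudView implementation): collapse any run of
// Unicode whitespace (including ' ', '\r' and '\n') to a single space.
public class WhitespaceCleanupSketch {
    static String cleanup(String text) {
        // (?U) makes \s match whitespace as defined by the Unicode specification.
        return text.replaceAll("(?U)\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanup("  hello\r\n  world\u00A0! ")); // "hello world !"
    }
}
```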
ContentTarget |
A ContentTarget specifies how a DocumentChunk or semantic annotation is processed to populate the index.
|
ContextMapping |
ContextMapping specifies how DocumentChunks with a given ContextName are remapped to index fields and whether they are used to populate the dictionary.
|
ContextMapping.FromDataModel | |
ConvertTextExtractor |
This processor performs text content extraction for all MIME-types (300+ file formats are currently handled).
|
CoordinatesFormatter |
The Coordinates Formatter processor creates a normalized chunk for the latitude and longitude.
|
CopyContext |
Copies all DocumentChunks with 'inputContext' as ContextName, and creates new DocumentChunks with the same score, language and part but with
'outputContext' as ContextName. |
CustomContentTarget |
CustomContentTarget defines indexing by a custom 'Index Kind'.
|
CustomDirectiveCondition |
A condition that matches if the document has the specified directive name, with an optional specific value.
|
CustomDocumentProcessor |
A Custom document processor allows you to plug custom code, packaged as a CVPlugin, into the document processing pipeline.
|
CustomPublisher |
Custom publisher configuration
|
CustomPublisher.Config | |
CustomSemanticProcessor |
A custom semantic processor allows you to plug in custom code in the semantic pipeline.
|
DataModelClassCondition |
A condition that matches if the document has the corresponding DataModel.
|
DataModelClassResolver |
This processor takes the value of the "datamodel_class" PAPI directive to determine the DataModelClass of the document.
If this directive is not found, the default class is assumed. If the class is not the default class, all metas corresponding to an existing DataModelProperty are prefixed with the type of the class declaring the property (which may be a superclass of the class). Processors placed after this one in the pipeline must refer to the Data Model property by prefixing it with its class name. |
DateCategoryContentTarget |
A CategoryContentTarget specific to dates.
|
DateContentTarget |
DateContentTarget defines indexing a date.
|
DateFormatter |
If a document chunk matches either:
a custom input format defined with UNIX date syntax (for example, %Y/%m/%d-%H:%M:%S)
one of the automatically recognized date formats
the Date Formatter generates three additional document chunks, each with its own context name, using the following naming convention (see the sketch after this entry):
$inputContext$dateTimeOutputContext (default format: %Y/%m/%d-%H:%M:%S)
$inputContext$dateOutputContext (default format: %Y/%m/%d)
$inputContext$timeOutputContext (default format: %H:%M:%S)
|
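A plain-Java sketch of the naming convention above: the input context name is concatenated with each output-context value. The suffix values ("_datetime", "_date", "_time") and the input context are hypothetical, and only a small subset of UNIX date syntax is translated to java.time patterns:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Sketch: parse a chunk with a custom UNIX-syntax input format, then emit
// three chunks named $inputContext + suffix, using the default output formats.
public class DateFormatterSketch {
    // Translate a subset of UNIX date syntax (%Y %m %d %H %M %S) to java.time.
    static DateTimeFormatter fromUnixSyntax(String unix) {
        String p = unix.replace("%Y", "yyyy").replace("%m", "MM").replace("%d", "dd")
                       .replace("%H", "HH").replace("%M", "mm").replace("%S", "ss");
        return DateTimeFormatter.ofPattern(p);
    }

    public static void main(String[] args) {
        String inputContext = "lastmodified"; // hypothetical input context
        LocalDateTime date = LocalDateTime.parse("2021/06/01-14:30:00",
                fromUnixSyntax("%Y/%m/%d-%H:%M:%S"));
        Map<String, String> chunks = Map.of(
                inputContext + "_datetime", date.format(fromUnixSyntax("%Y/%m/%d-%H:%M:%S")),
                inputContext + "_date",     date.format(fromUnixSyntax("%Y/%m/%d")),
                inputContext + "_time",     date.format(fromUnixSyntax("%H:%M:%S")));
        chunks.forEach((ctx, value) -> System.out.println(ctx + " = " + value));
    }
}
```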
DebugCrashProcessor |
Causes crashes for debugging purposes.
|
DebugProcessor |
Dumps all the DocumentChunks named after 'inputContexts' on Standard Output.
This provides a log of the 'Analysis' process. |
DebugSemanticProcessor |
Dumps all annotated tokens in the specified format on Standard Output, or in 'outputFile'.
This provides a log of the 'Analysis' process. |
DecreaseRankOnAnnotation |
Allows you to decrease the ranking when some words are flagged by an annotation (part of speech, ontology, ...).
|
DictionaryTarget |
A DictionaryTarget specifies how a DocumentChunk or semantic annotation is processed to populate the dictionary.
|
DiscardDocument |
DEPRECATED.
|
DocumentProcessor |
Abstract class for a document processor.
|
DocumentProcessor.FromDataModel | |
DocumentProcessorGroup |
Contains a list of document processors, which are executed only if the group's condition matches.
This avoids duplicating conditions or creating distinct pipelines when several processors share the same condition. |
DoubleToLong |
Using this processor you can store floating point values into signed fields that can then be queried with the DoublePrefixHandler.
|
EnumFacetAnnotationTarget |
EnumFacetAnnotationTarget maps the annotations according to the specified EnumFacet.
|
EnumFacetContentTarget |
EnumFacetContentTarget maps the content according to the specified EnumFacet.
|
FarTextAnnotator |
A FarTextAnnotator annotates alphanumeric tokens with 'annotation' if they are farther than 'startOffset'.
|
FastRulesMatcher |
Annotates a document using a set of XML rules, compiled for efficiency.
The rules are described with the query language using the AND, OR and NOT operators, as well as 'context' matching operators. The rules can also match whole chunks (and not just words) using regular expressions. |
FieldIndexingLimit |
Limits the number of words that can be retrieved from a given field.
|
FieldRetrievalLimit |
Limits the size of text that can be retrieved from a given field.
In some standard configurations, a FieldRetrievalLimit on the 'text' field is set to "maxLength=4096". This limits the size of the index on disk. |
FilenameMatchCondition |
A condition that matches if the filename of the FIRST document part matches the regexp.
Note: Conditions apply to documents, but filenames are set per document part. |
FilteringConfiguration |
Filters to apply to the words extracted by the semantic processors.
Words that do not satisfy these conditions are not indexed. The filtered values are expressed in number of Unicode characters (see the sketch after this entry). |
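Counting Unicode characters means counting code points rather than UTF-16 units. A minimal sketch of such a length filter ('minLen' and 'maxLen' are hypothetical parameter names):

```java
// Sketch: accept a word only if its length, in Unicode code points, is in range.
public class WordLengthFilterSketch {
    static boolean accept(String word, int minLen, int maxLen) {
        int len = word.codePointCount(0, word.length()); // Unicode characters
        return len >= minLen && len <= maxLen;
    }

    public static void main(String[] args) {
        // "\uD835\uDD4FY" is three UTF-16 units but only two Unicode characters.
        System.out.println(accept("\uD835\uDD4FY", 2, 30)); // true
    }
}
```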
FixedRangeNumericalPartitioning |
Matches numerical values in a range.
|
ForcedRangeNumericalPartitioning |
Transforms a numerical value into the text value associated with its matching range, from a set of predetermined ranges specified in 'NumericalRange' (see the sketch after this entry).
|
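A sketch of the range-to-text mapping, using the inclusive bounds (beg <= x <= end) defined for NumericalRange later in this table; the Range record and the labels are illustrative:

```java
import java.util.List;

// Sketch: map a numerical value to the text of its first matching range.
public class RangePartitionSketch {
    record Range(double beg, double end, String text) {}

    static String partition(double x, List<Range> ranges) {
        for (Range r : ranges)
            if (x >= r.beg() && x <= r.end()) return r.text();
        return null; // no matching range
    }

    public static void main(String[] args) {
        List<Range> prices = List.of(
                new Range(0, 49.99, "cheap"),
                new Range(50, 199.99, "mid-range"),
                new Range(200, Double.MAX_VALUE, "expensive"));
        System.out.println(partition(120.0, prices)); // mid-range
    }
}
```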
FormatCheckerDate |
The FormatCheckerDate processor checks that the chunk matches either:
a custom input format defined with UNIX date syntax (for example, %Y/%m/%d-%H:%M:%S)
one of the automatically recognized date formats
|
GenerateAnnotationsForContext |
Forces a context to be processed by the SemanticProcessor pipeline and to generate semantic annotations.
|
GeoBBoxProcessor |
The Geo BBox processor converts the input geometry from WKT to WKB
and computes its bounding box. |
GeoCategorizer |
A processor that categorizes geographic points given their inclusion in a GeoDomain.
|
GetAnalysisConfigList |
Gets the list of all analysis modules.
|
GetAnalysisConfigListByVersion |
Gets the list of all analysis modules, for a specific version of the configuration store.
|
GetMetaFinderMetaList |
Gets the list of metas discovered by the "Trace all metas" feature.
|
GetMetaFinderSourceList |
No documentation for this element.
|
GetMetaFinderValueList |
Returns the list of meta values found when analyzing documents.
The entry points are the name of the datamodel, the name of the meta for which we want to retrieve values, and a refreshCollection flag, which forces the list of values to be flushed. |
HierarchicalClassifier |
A HierarchicalClassifier classifies a whole document according to the existing annotations on selected Document Chunks.
The annotations are matched against a learning resource. |
HTMLCSSExtractor |
Extracts all text chunks annotated with a class or an id specified in
'classes' or 'ids', and duplicates them in the 'outputContext' context. |
HTMLCSSExtractor.Classes | |
HTMLCSSExtractor.Ids | |
HTMLCSSSelector |
Deletes all text chunks that are not annotated with a class or an id specified in
'classes' or 'ids'. |
HTMLCSSSelector.Classes | |
HTMLCSSSelector.Ids | |
HTMLRelevantContentExtractor |
The HTMLRelevantContentExtractor extracts the most relevant parts of an HTML document.
Generally, the relevant part of an HTML document is the article in the middle of the page. |
HTMLRelevantContentExtractor.AnnotationsToCopy | |
HTMLRelevantContentExtractor.IdsAndClassesToIgnore | |
HTMLRelevantContentExtractor.IdsAndClassesToKeep | |
HTMLTableExtractor |
Extracts all HTML tables having minColumnsRequired < number of columns < maxColumnsRequired, and duplicates them in the
'newContextName' context. |
IncreaseRankOnAnnotation |
Allows you to increase the ranking when some words are flagged by an annotation (part of speech, ontology, ...).
|
InferFileExtension |
When the file_extension meta is not present, finds the file extension based on the file name or the mime meta (if one of these two is present).
|
InsertCurrentDate |
Adds the current date to an output context.
|
JavaDocumentProcessor |
Takes Java code either inline or from a file, and executes it on-the-fly.
For production mode, we recommend packaging your custom code as a Java Plugin (CVPlugin) and using the Custom Document Processor to call it. Plugins allow better packaging and source code maintenance. |
JavaProcessor | Deprecated |
JavaScriptProcessor | Deprecated |
LanguageConfiguration |
Configuration of the linguistic extraction for a given language.
|
LanguageDetector |
Language detection is performed using the text of all the DocumentChunks associated with the specified input ContextNames for which the language was not already detected or specified.
The whole text of all these DocumentChunks is taken into account by a statistical algorithm that detects the language. This language is then set as the language for all specified chunks. |
LanguageSetter |
The language is set as the language for all the DocumentChunks associated with the specified input ContextNames.
For example, the language attribute of a DocumentChunk is used by semantic processing. The language is represented by its ISO 639-1 code: fr, en, etc. |
Lemmatizer |
Creates a lemmatized form for each word (nouns and adjectives only).
This processor is mostly used as a helper for other processors (like Ontology Matcher, or Semantic Extractor), which need to perform lemmatized matches.
Annotations generated:
"lemma": normalized lemmatized form of the word (singular/masculine)
"lemma_lowercase": lemmatized form of the word (singular/masculine)
"fsingular": normalized singular form of the word
"fsingular_lowercase": singular form of the word
"masculine": if the token is a masculine word
"feminine": if the token is a feminine word
"neuter": if the token is neuter
"singular": if the word is singular
"plural": if the word is plural
"unnumbered": if the word is unnumbered
"pos": the static Part of Speech |
MappingConfiguration |
Specifies how DocumentChunks and their SemanticAnnotations populate the index and the dictionary.
|
MathDocumentProcessor |
Performs mathematical operations on a numerical field.
|
Meta |
Meta-data.
|
MetaCondition |
MetaCondition matches if the Document contains a DocumentChunk whose meta name and value match the specified condition.
|
MetaFinder |
Keeps track of all document metas.
|
MetaFinderMeta |
A meta discovered by the "Trace all metas" feature.
|
MetaFinderMetaList |
A list of metas discovered by the "Trace all metas" feature.
|
MetaFinderSourceFreq |
No documentation for this element.
|
MetaFinderSourceList |
No documentation for this element.
|
MetaFinderSourceList.Sources | |
MetaFinderValue |
No documentation for this element.
|
MetaFinderValueList |
No documentation for this element.
|
MimeCondition |
A condition that matches if the MIME type of the FIRST document part is in the list.
Note: Conditions apply to documents, but MIME types are set per document part. The MimeCondition only tests the MIME type of the first part, if present. |
MimeCondition.Mimes | |
MIMEDetector |
The MIME detector operates on each DocumentPart for which a MIME type is not available.
The MIME type can be specified for each DocumentPart in the PAPI. Otherwise, the 'bytes' and the 'filename' of the DocumentPart are used to guess the real MIME type and charset. The guessed MIME type and charset are then set as attributes of the DocumentPart. Input: the DocumentParts of the document. Output: the 'mime' and 'encodingToUse' attributes of DocumentParts. This document processor does not create any document chunks (see the sketch after this entry). |
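Guessing a MIME type from the filename and from the content can be illustrated with the JDK alone; the actual detector also guesses the charset, which this sketch omits:

```java
import java.io.IOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: filename-based and content-based MIME guessing with JDK utilities.
public class MimeGuessSketch {
    public static void main(String[] args) throws IOException {
        // Filename-based guess (extension mapping).
        System.out.println(URLConnection.guessContentTypeFromName("report.html")); // text/html

        // Content-based guess, delegated to the platform's file-type detector.
        Path p = Files.createTempFile("sample", ".pdf");
        Files.write(p, "%PDF-1.4".getBytes());
        System.out.println(Files.probeContentType(p)); // application/pdf (platform-dependent)
    }
}
```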
MimeTypeSetter |
Manually sets the MIME type.
|
MultiContextCSVEncoder |
Creates a DocumentChunk containing the ContextName and the textual value of the DocumentChunks matching 'inputContexts'.
This processor can be used, for instance, to store arbitrary (key,value) pairs into one single index field. Note that this storing method is inefficient and should be used with caution. |
MultiContextDocumentProcessor |
A MultiContextDocumentProcessor processes all the DocumentChunks for each of the ContextNames specified as input.
|
MultiContextDocumentProcessor.InputContexts | |
NamedEntitiesMatcher |
The Named Entities Matcher detects named entities such as people, organizations, or places, in the textual content of the document.
It generates annotations like NE.person or NE.organization, using ontology-based matching and/or rule-based matching. |
NativeTextExtractor |
Extraction is performed for the following data types:
text/plain for text files.
text/html for HTML files.
application/x-exalead-document for the CloudView 4.6 document format (com.exalead.document).
application/x-exalead-ndoc for the CloudView 5 internal document format (binary).
application/x-exalead-ndoc-v10+xml for the CloudView internal document format (XML).
|
NewChunk |
Creates a new DocumentChunk with 'outputContext' as ContextName, and textual content specified in 'value'.
|
NGramsExtractor |
Extracts normalized word-grams.
|
Normalizer |
Normalizes all tags given in input tags field.
|
NotCondition |
Matches if the child condition does not match.
If there is no child condition (null), this condition never matches. |
NumericalFormatter |
The Numerical Formatter processor creates valid numerical chunks from various number formats.
|
NumericalRange |
Associates text with a numerical range.
The range includes all values >= beg and <= end (beg <= x <= end). |
ObjectFactory | |
OntologyMatcher |
An OntologyMatcher detects concepts defined in an ontology in the textual content of the Document Chunks.
Typically, an ontology contains a list of business terms to be detected. |
OrCondition |
OrCondition matches if at least one child AcceptCondition matches.
|
Part |
A document part.
|
Part.CustomDirectives | |
PartMapping |
PartMapping specifies how parts are remapped to index fields.
|
PartOfSpeechTagger |
A PartOfSpeechTagger detects the part of speech for each word in the text of Document Chunks.
|
PartTarget |
A PartTarget specifies how a Part is processed to populate the index.
|
Phonetizer |
Creates a phonetic form for each word.
This processor is used:
as a helper for other processors (like Ontology Matcher, or Semantic Extractor), which need to perform phonetic matches
to perform search-time phonetic analysis using the Phonetic expansion module (this creates the dictionary of phonetic forms that will be used by the expansion module at search-time)
to greatly improve the quality of spell checking
Annotations generated: "phonetic" |
PLMExpandDocumentProcessor |
Processes PLM metas to generate octrees and matrices for PLMExpand.
|
PrecomputedThumbnailsDocumentProcessor |
The Precomputed Thumbnails Document Processor precomputes thumbnails of the first DocumentPart.
|
PrintfValues |
Prints textual content of DocumentChunks according to a formatting string.
This string contains variables in one of three possible formats. |
ProximityProcessor |
A proximity processor detects and annotates pieces of text where several annotations occur within given distance constraints.
|
PublicUrlProcessor |
For each input DocumentChunk associated with the 'inputContext' ContextName, 4 DocumentChunks are created, each associated with a different ContextName:
'treeOutputContext'
'leafOutputContext'
'urlOutputContext'
'urlCategoryOutputContext'
|
QueryList |
A list of queries processed by the QueryMatcher.
|
RankOnAnnotation |
Modifies ranking when some words are flagged by a given annotation.
|
RealTimeAlerting |
The Real-time alerting document processor matches queries defined by end users and alerts them as soon as a new matching document is indexed.
|
RealTimeAlerting.AlertGroups | |
RealTimeAlerting.CustomPublishers | |
RelatedTerms |
Extracts all possible related terms.
Only one instance of this processor may exist per input context. |
RemoteHTTPTransformer |
The processor posts the part bytes to the remote HTTP service, and gets a typed resource as a result.
The remote service may return a Document.MIME_V10 document, or any other document that can be processed later in the pipeline. If the remote service returns a non-OK HTTP status (code != 200), the corresponding error is passed on as a regular error. The service may also advertise a filename, using the standard Content-Disposition 'filename' attribute (see the sketch after this entry). |
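A sketch of the call pattern with java.net.http: post the part bytes, treat any non-200 status as an error, and read an advertised filename from Content-Disposition. The service URL and the error handling are illustrative, not the processor's implementation:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RemoteTransformSketch {
    public static void main(String[] args) throws Exception {
        byte[] partBytes = "raw document part".getBytes();
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:8080/transform")) // hypothetical service
                .POST(HttpRequest.BodyPublishers.ofByteArray(partBytes))
                .build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());

        if (response.statusCode() != 200) // non-OK status is passed on as a regular error
            throw new RuntimeException("remote transformer error: " + response.statusCode());

        // Optional advertised filename, e.g. Content-Disposition: attachment; filename="out.xml"
        response.headers().firstValue("Content-Disposition")
                .map(v -> v.replaceFirst(".*filename=\"?([^\";]+)\"?.*", "$1"))
                .ifPresent(name -> System.out.println("filename = " + name));

        System.out.println("got " + response.body().length + " bytes");
    }
}
```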
RemoteHTTPTransformer.ArgMapping | |
RemoteHTTPTransformerRemoteArgMapping |
RemoteHTTPTransformer argument mapping. |
RemoteMOTAPIDocumentProcessor |
The processing of each input context will be handled by the targeted remote API.
|
RemoteMOTAPIDocumentProcessor.TargetInstances | |
RemoveContexts | |
RenameContext |
Each DocumentChunk with ContextName matching 'inputContext' is renamed with a ContextName 'outputContext'.
|
RenameUnmappedContexts |
This Document Processor changes the ContextName for all DocumentChunks associated with a ContextName that does not have a Mapping Configuration.
|
ReplaceContextNames |
Replaces the first matching substring of context names with the given replacement.
For example, inputSubstring="abc" and outputReplacement="bar" will rename context abcdef to bardef and somethingabcstuff to somethingbarstuff. |
ReplaceRegexp |
Substitutes the content substring of all DocumentChunks having the ContextName 'inputContext', using:
'pattern' as the matching substring regular expression
'value' as the replacement value.
This value may use the sed output format, with references to captures \0 through \9 (see the sketch after this entry).
|
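Java's replacement syntax uses $N rather than sed's \N, so a sketch of this behavior can translate the capture references first. This illustrates the substitution, not the processor's actual implementation:

```java
import java.util.regex.Pattern;

public class ReplaceRegexpSketch {
    // Translate a sed-style replacement value to Java's syntax:
    // escape '$' (literal in sed, special in Java), then turn \N into $N.
    static String sedToJavaReplacement(String sed) {
        return sed.replace("$", "\\$").replaceAll("\\\\([0-9])", "\\$$1");
    }

    public static void main(String[] args) {
        String input = "price: 42 EUR, tax: 7 EUR";
        String pattern = "([0-9]+) EUR";   // 'pattern'
        String value = "EUR \\1";          // 'value' in sed syntax: EUR \1
        String out = Pattern.compile(pattern).matcher(input)
                .replaceAll(sedToJavaReplacement(value));
        System.out.println(out);           // price: EUR 42, tax: EUR 7
    }
}
```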
ReplaceValues |
The ReplaceValues processor compares all DocumentChunks for a given inputContext with the specified KeyValue map.
|
ResetMetaFinderData |
No documentation for this element.
|
RulesMatcher |
A RulesMatcher applies a rule engine on the textual content of the DocumentChunks.
The rules are defined in a separate XML 'resourceFile' and are a combination of regular expressions, word matching and boolean operators over content. |
SemanticExtractor |
The resource describes the features to extract according to a set of rules, with their term, type, and range for numerical values.
|
SemanticPipeDocumentProcessor |
Instantiates a semantic pipe and creates chunks out of resulting annotations.
It can be used to instantiate classification processors, and perform document level operations from their output. |
SemanticProcessor |
A SemanticProcessor applies semantic processing on the textual content of the DocumentChunks.
A Semantic Processor creates SemanticAnnotations on tokens. These SemanticAnnotations can then be used in the Mapping. |
SemanticProcessor.FromDataModel | |
SentimentAnalyzer |
Analyzes the nouns and adjectives present in the text.
|
SetAnalysisConfigList |
Sets the list of all analysis modules.
|
SetDefaultValue |
This processor looks for the specified contexts and sets a default value when none is found.
|
SimilarStringToPart |
Converts signatures in string format from a meta to a binary part.
|
SimilarStringToPart.Values | |
SingleContextDocumentProcessor |
A SingleContextDocumentProcessor processes all the DocumentChunks with the ContextName specified as input.
|
SnowballStemmer |
Creates the stemmed form of each word.
|
SourceCondition |
SourceCondition matches if the source of the document matches 'source'.
|
SplitValues |
Splits the content of all DocumentChunks associated with the ContextName 'inputContext' using 'separator' as a separator regular expression.
A new DocumentChunk is created for each segment, with 'outputContext' as the ContextName. |
SQI | Deprecated |
StandardAnnotationTarget |
StandardAnnotationTarget is used to index the textual content of a SemanticAnnotation.
The selected 'form' of the SemanticAnnotation is used to populate an index field. |
StandardContentTarget |
A StandardContentTarget is used to populate a textual, numerical or date index field, with the content of a DocumentChunk.
|
StandardPartsMerger |
This processor does nothing if there are no DocumentParts (only root DocumentChunks).
|
StandardPartsMerger.PartSpecificContexts | |
StorageServiceDocumentProcessor |
Queries the storage for any meta to attach to the document.
Multi-valued pairs are pushed as multi-valued metas. For example:
The storage key "nb_comment" is attached as the "nb_comment" meta on the document.
The storage key "tags[]" is attached as the "tags" multi-valued meta on the document. |
StringHash |
The StringHash processor computes a signed hash of the textual input value.
For example, this value can be stored in a field used for grouping. |
StringHash32 |
The StringHash32 processor computes a signed 32-bit hash of the textual input value.
For example, this value can be stored in a field used for grouping. |
StringHash64 |
The StringHash64 processor computes a signed 64-bit hash of the textual input value.
For example, this value can be stored in a field used for grouping (see the sketch after this entry). |
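The hash algorithm itself is not documented here; as an illustration, a stable signed 64-bit hash with the stated use (grouping) can be computed with FNV-1a:

```java
// Sketch: FNV-1a stands in for the undocumented StringHash64 algorithm.
public class StringHash64Sketch {
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;      // FNV-1a 64-bit offset basis
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;           // FNV-1a 64-bit prime
        }
        return h;                          // signed 64-bit value
    }

    public static void main(String[] args) {
        System.out.println(hash64("hello world")); // stable value, usable for grouping
    }
}
```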
StringTransform |
Applies textual transformations on chunks from several contexts:
trims blanks at the beginning and end of chunks
reduces sequences of blanks to just one
changes text to uppercase/lowercase/normalized/capitalized
Outputs replace inputs.
|
Target |
A Target specifies how a DocumentChunk or semantic annotation is processed to populate the index or the dictionary.
|
TestAnalysisPipeline |
Tests an analysis pipeline.
This allows you to test a pipeline using a test document and capture the output. |
TestAnalysisPipelineOutput |
No documentation for this element.
|
TestAnalysisPipelineOutput.DocumentProcessorsOutput | |
TestAnalysisPipelineOutput.UnmappedContexts | |
TextToNum |
A processor to hack an approximate sort on a text field.
It implements a surjection from the set of strings to the set of integers [0..N], with N close to, but less than or equal to, 18,446,744,073,709,551,615 (2^64 - 1). The user defines an ordered alphabet (see the sketch after this entry). |
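A sketch of the surjection: encode the first few characters of a string as digits in base (alphabet size + 1), by rank in the ordered alphabet, so that numeric order approximates lexicographic order. The 26-letter alphabet and the digit budget are illustrative:

```java
// Sketch: approximate sort key for strings over a user-defined ordered alphabet.
public class TextToNumSketch {
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz"; // user-defined order

    static long toNum(String s) {
        int base = ALPHABET.length() + 1; // digit 0 is reserved for "end of string"
        // Number of digits that fit without overflowing the non-negative long range.
        int digits = (int) (63 / (Math.log(base) / Math.log(2)));
        long n = 0;
        for (int i = 0; i < digits; i++) {
            // Characters outside the alphabet also map to 0 in this sketch.
            int digit = i < s.length() ? ALPHABET.indexOf(s.charAt(i)) + 1 : 0;
            n = n * base + Math.max(digit, 0);
        }
        return n; // strings sharing a long prefix may collide: the sort is approximate
    }

    public static void main(String[] args) {
        System.out.println(Long.compareUnsigned(toNum("apple"), toNum("banana")) < 0); // true
        System.out.println(Long.compareUnsigned(toNum("zebra"), toNum("zoo")) < 0);    // true
    }
}
```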
UniformRandomContextGenerator |
Adds a new DocumentChunk for one document out of 'modulo' documents processed.
The textual content of the DocumentChunk is picked out of the list specified in 'values', with a uniform distribution. |
UniformRandomContextGenerator.Values | |
UnitsOfMeasurementNormalizer |
Unit of measurement detector and converter.
|
URLCodec |
URL encoding/decoding with the UTF-8 charset only.
|
URLMatchCondition |
A condition that matches if the URI matches the regexp.
|
URLTransformer |
Parses a context string as a regular URL (RFC 2396, "Uniform Resource Identifier") and transforms it according to the given URL pattern (see the sketch after this entry).
A new DocumentChunk is created with the substitution.
Pattern used to transform the URL (in the form <scheme>://<authority><path>?<query>#<fragment>):
Characters other than '$' or '\' are kept as-is.
The '$' character and the '\' character must be escaped with a leading '\'.
The ${expression} form allows you to compute a string expression based on URL components (see "Expressions" below).
Expressions used inside the enclosing ${}:
url: Original URL
scheme: Scheme name ("http", "https", "file", ...)
authority: Authority (host:port or host) (may be empty)
host: Hostname part of the authority (may be empty)
port: Port number part of the authority (may be empty)
userInfo: username:password field of the authority (may be empty)
file: File starting with / and query string, if any
pathurl: Normalized absolute path starting with /
path: Normalized absolute path (may start with C:\ on Windows)
query: Normalized query part starting with ? (may be empty)
args: Query part without the leading ? (may be empty)
fragment: Fragment part starting with # (may be empty)
reference: Reference part, i.e., fragment without the leading # (may be empty)
arg:name: Query part argument identified by its name, unescaped (you must re-escape it using "urlencode:" when necessary)
str:string: The final argument is not a variable name, but a string (only useful for clarity)
tolower:expression: Transforms into lowercase (ONLY A-Z)
toupper:expression: Transforms into uppercase (ONLY a-z)
urlencode:expression: URL encoding (%NN or +)
urlpathencode:expression: URL encoding outside / fragments
urldecode:expression: URL decoding
pathslash:expression: Converts \ into /
pathantislash:expression: Converts / into \
Notes:
Unreserved characters are unescaped during URL processing (i.e., never '%' or '\').
tolower and the other similar prefixes accept recursion (i.e., the expression "${urlpathencode:pathantislash:toupper:path}" is valid).
Both "file://C:\path" and "file:///C:\path" produce path="/C:\path".
Examples, with the input context value "http://www.example.com/bar/foo?bar=42":
"hello, world" => "hello, world"
"the scheme is ${scheme}" => "the scheme is http"
"the scheme is \${scheme}" => "the scheme is ${scheme}"
"http://myserver${path}${query}" => "http://myserver/bar/foo?bar=42"
"http://myserver/applet?f=${urlpathencode:path}&t=${arg:bar}" => "http://myserver/applet?f=/bar/foo&t=42"
"http://myserver/applet?f=${urlencode:path}&t=${arg:bar}" => "http://myserver/applet?f=%2Fbar%2Ffoo&t=42"
"http://myserver/applet?f=${urlpathencode:pathantislash:toupper:path}" => "http://myserver/applet?f=%5CBAR%5CFOO"
With the input context value "file:///C:/My%20Documents/Document.doc":
"${pathantislash:urldecode:path}" => "C:\My Documents\Document.doc" |
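A few of the ${...} expressions can be reproduced with java.net.URI, as sketched below; the real processor supports the full expression list and the escaping rules above:

```java
import java.net.URI;

// Sketch: expand a handful of the pattern expressions using URI accessors.
public class UrlPatternSketch {
    static String expand(String pattern, URI url) {
        String args = url.getRawQuery() == null ? "" : url.getRawQuery();
        return pattern
                .replace("${url}", url.toString())
                .replace("${scheme}", url.getScheme())
                .replace("${host}", url.getHost())
                .replace("${path}", url.getRawPath())
                .replace("${args}", args);
    }

    public static void main(String[] args) {
        URI url = URI.create("http://www.example.com/bar/foo?bar=42");
        System.out.println(expand("the scheme is ${scheme}", url));
        // -> the scheme is http
        System.out.println(expand("http://myserver${path}?${args}", url));
        // -> http://myserver/bar/foo?bar=42
    }
}
```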
UTF8Checker |
Checks that the text passing through is valid UTF-8. Emits a warning with the document URI and the context name if input is malformed. Optionally deletes invalid chunks. |
ValueSelector |
Takes the input contexts in the specified order and, as soon as one is found, copies its content to the output context and stops.
|
WildcardIndexing |
Computes substrings of the input chunk to enable efficient prefix/substring/suffix search.
|
WordCountMapping |
Specifies where to map the word count.
|
XpathExtractor |
Extraction is performed for the following data types:
text/html.
|
XpathFragmentExtractor |
Input: All DocumentChunks associated with the specified 'inputContext' ContextNames.
|
XpathFragmentRule |
No documentation for this element.
|
XpathRule |
No documentation for this element.
|
ZipfRandomContextGenerator |
Adds a new document chunk for one document out of 'modulo' documents processed (see the sketch after this entry).
The textual content of the document chunk is picked out of the list specified in 'values', with a non-uniform discrete Zipf distribution. |
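A sketch of the sampling described above: emit a chunk for one document out of 'modulo', picking the value with Zipf weights (the weight of rank k is proportional to 1/k). The value list and the modulo are illustrative:

```java
import java.util.List;
import java.util.Random;

public class ZipfContextSketch {
    static final List<String> VALUES = List.of("red", "green", "blue", "yellow");
    static final int MODULO = 10;
    static final Random RNG = new Random();

    static String maybeGenerate(long docIndex) {
        if (docIndex % MODULO != 0) return null; // only 1 document out of MODULO gets a chunk
        double total = 0;
        for (int k = 1; k <= VALUES.size(); k++) total += 1.0 / k;
        double r = RNG.nextDouble() * total;
        for (int k = 1; k <= VALUES.size(); k++) {
            r -= 1.0 / k;                        // rank 1 is twice as likely as rank 2, etc.
            if (r <= 0) return VALUES.get(k - 1);
        }
        return VALUES.get(VALUES.size() - 1);
    }

    public static void main(String[] args) {
        for (long doc = 0; doc < 30; doc++) {
            String v = maybeGenerate(doc);
            if (v != null) System.out.println("doc " + doc + " -> chunk value " + v);
        }
    }
}
```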
Copyright © 2021 Dassault Systèmes, All Rights Reserved.