Contains a list of document processors that are executed only if the condition of this group document processor matches. This avoids duplicating the condition or creating separate pipelines when several processors share the same condition.
Format Checker Date
The Format Checker Date processor checks that the chunk matches either:
• A custom input format defined with UNIX date syntax (for example, %Y/%m/%d-%H:%M:%S).
• One of the automatically recognized date formats.
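The check described above can be approximated with a minimal Python sketch. The function name `matches_date_format` and the `FALLBACK_FORMATS` list are hypothetical: the formats the processor actually recognizes automatically are not listed here. Trying the custom format first and then the fallbacks is also an assumption.

```python
from datetime import datetime

# Hypothetical stand-in for the "automatically recognized" formats;
# the real list used by the processor is not documented in this section.
FALLBACK_FORMATS = ["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%d/%m/%Y"]


def matches_date_format(chunk, custom_format=None):
    """Return True if chunk parses with the custom format or a fallback format."""
    candidates = ([custom_format] if custom_format else []) + FALLBACK_FORMATS
    for fmt in candidates:
        try:
            # strptime raises ValueError when the text does not match the format
            # or when a field is out of range (for example, month 13).
            datetime.strptime(chunk, fmt)
            return True
        except ValueError:
            continue
    return False
```

For instance, `matches_date_format("2024/03/15-10:30:00", "%Y/%m/%d-%H:%M:%S")` accepts the chunk through the custom format, while a chunk such as `"not a date"` is rejected by every candidate.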
Infer File Extension
When the file_extension meta is not present, infers the file extension from the file name or the mime meta (if one of the two is present).
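As a rough illustration of this fallback logic, the Python sketch below treats the document metas as a plain dictionary. The `file_name` key and the function name are hypothetical, and checking the file name before the mime meta is an assumption, since the order is not specified here.

```python
import mimetypes
import os


def infer_file_extension(metas):
    """Return an extension without the leading dot, or None if it cannot be inferred."""
    # Nothing to infer when the meta is already present.
    if metas.get("file_extension"):
        return metas["file_extension"]

    # Assumed first choice: the suffix of the file name ("file_name" is a hypothetical key).
    file_name = metas.get("file_name")
    if file_name:
        ext = os.path.splitext(file_name)[1]
        if ext:
            return ext.lstrip(".").lower()

    # Assumed second choice: map the MIME type to an extension.
    mime = metas.get("mime")
    if mime:
        ext = mimetypes.guess_extension(mime)
        if ext:
            return ext.lstrip(".")

    return None
```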
Insert Current Date
Adds the current date to an output context.
Precomputed Thumbnails Document Processor
Precomputes thumbnails of the first DocumentPart.
Random DocumentChunks Generator (Uniform Distribution)
Adds a new DocumentChunk for one document out of 'modulo' documents processed.
The textual content of the DocumentChunk is picked out of the list specified in Values, with a uniform distribution.
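The behavior can be pictured with the following Python sketch. The class name, the dictionary-based document, and the `chunks` key are hypothetical, and selecting exactly every modulo-th document (rather than sampling one at random) is an assumption.

```python
import random


class UniformChunkGenerator:
    """Adds one chunk to every modulo-th document, picked uniformly from values."""

    def __init__(self, values, modulo, seed=None):
        if modulo < 1:
            raise ValueError("modulo must be at least 1")
        self.values = list(values)
        self.modulo = modulo
        self.count = 0
        self.rng = random.Random(seed)

    def process(self, document):
        self.count += 1
        # Assumption: the selected document is every modulo-th one processed.
        if self.count % self.modulo == 0:
            # random.Random.choice gives each value the same probability.
            document.setdefault("chunks", []).append(self.rng.choice(self.values))
        return document
```

With `modulo` set to 5, ten processed documents yield two generated chunks.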
Random DocumentChunks Generator (Zipf Distribution)
Adds a new document chunk for one document out of 'modulo' documents processed.
The textual content of the document chunk is picked out of the list specified in Values, with a nonuniform discrete Zipf distribution.
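To show how a Zipf pick differs from a uniform one, here is a small Python sketch. It assumes that the position of a value in the list is its rank and that the exponent is 1; neither detail is stated here, and the function names are hypothetical.

```python
import random


def zipf_weights(n, exponent=1.0):
    """Normalized Zipf weights for ranks 1..n: weight(rank) is proportional to 1 / rank**exponent."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def pick_zipf(values, rng, exponent=1.0):
    """Pick one value from a non-empty list; earlier values are more likely."""
    weights = zipf_weights(len(values), exponent)
    return rng.choices(values, weights=weights, k=1)[0]
```

With three values and an exponent of 1, the weights are 6/11, 3/11, and 2/11, so the first value is picked about three times as often as the last.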
Real-Time Alerting
Matches queries defined by end-users and alerts them as soon as possible when a new matching document is indexed.
Semantic Pipe
Instantiates a semantic pipe and creates chunks out of resulting annotations.
It helps instantiate classification processors and perform document-level operations on their output.
Similar String to Part Converter
Converts signatures stored in string format in a meta into a binary part.
Storage Service Document Processor
Queries the storage for any meta to attach to the document.
Multivalued pairs are pushed as multivalued metas.
For example:
• The storage key "nb_comment" is attached as "nb_comment" meta on the document.
• The storage key "tags[]" is attached as "tags" multivalued meta on the document.
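The key-to-meta mapping in these two examples can be sketched in Python as follows. Representing the storage result as a dictionary and the function name `storage_to_metas` are assumptions made for illustration only.

```python
def storage_to_metas(storage):
    """Map storage key/value pairs to document metas.

    Keys ending in "[]" are treated as multivalued: the suffix is dropped
    and the values are kept as a list.
    """
    metas = {}
    for key, value in storage.items():
        if key.endswith("[]"):
            metas[key[:-2]] = list(value)
        else:
            metas[key] = value
    return metas
```

Calling it on `{"nb_comment": "12", "tags[]": ["news", "sport"]}` gives a single-valued `nb_comment` meta and a `tags` meta holding both values.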
UTF8 Checker
Checks that the text passing through is valid UTF-8.
Emits a warning with the document URI and the context name if input is malformed.
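A comparable check can be written in a few lines of Python. The function name, the logger name, and the message wording are hypothetical; only the idea of warning with the document URI and context name comes from the description above.

```python
import logging

logger = logging.getLogger("utf8_checker")  # hypothetical logger name


def check_utf8(data, uri, context):
    """Return True if data (bytes) is valid UTF-8; otherwise log a warning and return False."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        logger.warning(
            "Malformed UTF-8 in document %s (context %s) at byte %d",
            uri, context, err.start,
        )
        return False
```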