Name | Type | Default value | Description |
---|---|---|---|
name | string | | Name of this processor. The name is used only for tracing and debugging purposes. |
dataModelState | string | | Indicates whether this document processor is managed by a data model. Possible values: null, "auto", "customized", "error". • If null, this document processor is not related to a data model. • If "auto", this document processor is auto-generated by a data model. • If "customized", this document processor was auto-generated by a data model and then customized. • If "error", there is a conflict between this document processor and the data model. |
dataModelClass | string | | If dataModelState is either "auto" or "customized", the name of the DataModelClass that generated this DocumentProcessor. |
dataModelProperty | string | | If dataModelState is either "auto" or "customized", the name of the DataModelProperty that generated this DocumentProcessor. |
disabled | boolean | | Disables the DocumentProcessor. |
looseTextDetection | boolean | True | Loosens text detection to detect more text files, including suspicious ones (files other than *.txt or *.html). ("true", "false") |
forceContent | boolean | | Forces acceptance of the content, even if the MIME type does not appear to be a known or supported MIME type. |
minInputSizeKB | long | -1 | Minimum document size accepted, in kilobytes. |
maxInputSizeKB | long | -1 | Maximum document size accepted, in kilobytes. |
maxRecursionDepth | int | -1 | Maximum recursion depth. |
maxRecursionDocuments | int | -1 | Maximum number of documents that can be converted in one directory level. |
maxRecursionDocumentsTotal | int | -1 | Maximum number of documents that can be converted over all levels. |
strictSizeCheck | boolean | | Strict size validation mode (even for partial reads). |
retryIO | string | | Uses regular I/O when mmap fails. ("true", "false") |
filter | string | | List of native filter identifiers to use. The list is comma-separated (,); each filter identifier can be followed by optional arguments separated by semicolons (;). A filter identifier prefixed with '!' is explicitly excluded. The special filter identifier '*' stands for "all other filters". The first match wins: "*,!doc" is identical to "*". For example, filter="!jpeg,*" accepts all filters except the jpeg filter. |
timeoutMs | long | -1 | Conversion timeout value, in milliseconds. If the conversion process takes longer, the remote side attempts to abort the conversion process. |
priority | string | | Worker thread(s) priority to be used for the processing ("normal", "lowest", "very low", "low", "normal", "high", "very high") |
embedded | string | | Includes embedded images ("true", "false", "optional") |
attachments | string | | Includes embedded attachments ("true", "false", "optional") |
styles | string | | Attempts to extract more text styles for HTML conversion ("true", "false", "optional") |
forceConversion | boolean | | Attempts to generate an empty document upon conversion error (may be ignored) |
startPage | long | -1 | Starts conversion from this page number (page number starts at 1). This parameter is only taken into account for image processing and may be ignored. |
maxPages | long | -1 | Maximum number of pages to process for XML conversion (may be ignored). |
maxOutputSizeKB | long | -1 | Maximum output size on the remote side, in kilobytes. If the generated output exceeds this value, the document may be truncated or invalid. |
allowUnicode32 | boolean | | Allows the use of 32-bit Unicode code points. |
allowDocumentChars | boolean | | Allows the use of Unicode private range characters (E0XX) for separators (keyword, sentence, paragraph separators, ...) |
outsideIn | string | | This feature is no longer supported. ("true", "false", "optional") |
outsideInFallback | string | | This feature is no longer supported. ("true", "false", "optional") |
outsideInOnly | string | | This feature is no longer supported. ("true", "false", "optional") |
outsideInForPreview | string | | This feature is no longer supported. ("true", "false", "optional") |
outsideInSimpleXHTMLFallback | string | | This feature is no longer supported. ("true", "false", "optional") |
ocr | string | | Converts using OCR ("true", "false", "optional") |
ocrFallback | string | | Falls back to OCR if heuristics deem it necessary ("true", "false", "optional") |
ocrDetect | string | | Detects documents requiring OCR (and rejects them) ("true", "false") |
ocrQuality | string | | OCR quality ("fast", "normal", "best") |
ocrLang | string | | OCR language(s) ("en" for English, "en;fr" for English and French, etc.) |
ocrTimeoutMs | long | -1 | OCR conversion timeout value, in milliseconds. If the OCR process takes longer, the remote side attempts to abort the conversion process. This value overrides the timeoutMs value if the processing involves an OCR operation. |
ocrMaxPages | int | -1 | Maximum number of pages to process for OCR. |
ocrPriority | string | | Worker thread(s) priority to be used for the OCR processing ("normal", "lowest", "very low", "low", "normal", "high", "very high") |
httpProxyUrl | string | | Optional HTTP proxy URL. The URL can embed credentials if required. |
disablePlugins | boolean | | Disables external plugins. |
overrideAddresses | string |
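
The first-match-wins rule described for the filter attribute can be illustrated with a short sketch. This is not the product's implementation: the function names `parse_filter_list` and `is_filter_enabled` are hypothetical, identifiers are assumed to match exactly (case-sensitively), arguments are assumed to follow the identifier as `id;arg1;arg2`, and a filter matched by no rule is assumed to be unused.

```python
def parse_filter_list(spec):
    """Split a filter attribute value into (identifier, excluded, args) rules.

    Assumed syntax (one reading of the description): entries are separated by
    commas, a leading '!' marks an exclusion, and optional arguments follow
    the identifier, separated by semicolons.
    """
    rules = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        excluded = entry.startswith("!")
        if excluded:
            entry = entry[1:]
        identifier, *args = entry.split(";")
        rules.append((identifier, excluded, args))
    return rules


def is_filter_enabled(spec, filter_id):
    """Return True if filter_id is accepted by the filter list.

    Rules are evaluated left to right and the first matching rule decides;
    '*' matches any filter that no earlier rule has matched.
    """
    for identifier, excluded, _args in parse_filter_list(spec):
        if identifier == "*" or identifier == filter_id:
            return not excluded
    # Assumption: a filter that matches no rule is not used.
    return False


if __name__ == "__main__":
    # "!jpeg,*" accepts every filter except jpeg.
    print(is_filter_enabled("!jpeg,*", "jpeg"))
    print(is_filter_enabled("!jpeg,*", "doc"))
    # "*,!doc" behaves like "*": the wildcard matches doc first.
    print(is_filter_enabled("*,!doc", "doc"))
```

Because evaluation stops at the first matching rule, exclusions only take effect when they appear before the `*` entry.
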
Name | Type | Description |
---|---|---|
fromDataModel | com.exalead.indexing.analysis.v10.DocumentProcessor | If dataModelState is "customized", you will find here the original document processor generated by the data model. Use this to easily revert to "auto" state from "customized". @IgnoreForValueConstructor |
AcceptCondition | com.exalead.indexing.analysis.v10.AcceptCondition | Expresses the enablement condition of this DocumentProcessor. |
KeyValue | exa.bee.KeyValue* |