Advanced Configuration Parameters
This section describes the parameters of the Files connector Advanced tab.
Parameter
Description
File extensions
Text version of the Filename extensions section of the Configuration tab.
Recursive
Indexes subfolders recursively. If unchecked, only the files located directly in the defined root paths are indexed. Enabled by default.
Enable ACL handling
Fetches the security tokens associated with files.
On Unix, it fetches the user/group security mode and, if available, POSIX ACLs.
On Windows, it fetches security identifiers (SIDs).
Keep local ACL
Only applies to Windows, and only if Enable ACL handling is enabled.
Fetches all security SIDs, including well-known local SIDs such as "Local System".
Skip directory symlinks
Only applies to Unix/Linux.
Skips (does not follow) symbolic links to directories, to avoid possible infinite loops.
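The following Python sketch illustrates how the Recursive and Skip directory symlinks options could interact during a folder scan. It is not the connector's implementation; the function `list_files` and its parameters are hypothetical, and the demo assumes a Unix-like system where symbolic links can be created.

```python
import os
import tempfile
from typing import List


def list_files(root: str, recursive: bool = True,
               skip_dir_symlinks: bool = True) -> List[str]:
    """Hypothetical scan: return the sorted paths of all files under root."""
    files: List[str] = []
    pending = [root]  # folders still to be scanned
    while pending:
        folder = pending.pop()
        with os.scandir(folder) as entries:
            for entry in entries:
                if entry.is_dir():  # follows symlinks, so linked folders count too
                    if not recursive:
                        continue  # Recursive unchecked: stay in the root path
                    if skip_dir_symlinks and entry.is_symlink():
                        continue  # do not follow: avoids loops such as sub/loop -> root
                    pending.append(entry.path)
                elif entry.is_file():
                    files.append(entry.path)
    return sorted(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for name in ("a.txt", os.path.join("sub", "b.txt")):
            open(os.path.join(root, name), "w").close()
        # A directory symlink pointing back to the root creates a loop.
        os.symlink(root, os.path.join(root, "sub", "loop"), target_is_directory=True)
        print(list_files(root))                   # a.txt and sub/b.txt
        print(list_files(root, recursive=False))  # a.txt only
```

Without the symlink check, the `loop` entry would be scanned again and again until the operating system refuses to resolve the path.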
Default text encoding
If specified, defines a global default encoding for text files on this connector. This encoding may be used to index raw text files whose encoding is unknown.
Enable containers support
If enabled, container files (for example, ZIP, TAR, PST, or EML files) are processed as if they were regular folders.
Max. container depth
When containers support is enabled, sets the maximum recursive depth inside containers.
Example:
A level of 1 only allows scanning files within containers found in the filesystem source.
A level of 2 also allows scanning containers inside containers (for example, a ZIP file inside a ZIP file) in the filesystem source.
A level of 3 allows one further level of depth (for example, an attachment inside a mail inside a PST file).
Max. documents per container
When containers support is enabled, sets the maximum number of files processed inside a single container (for example, inside a ZIP file).
For example, considering the following structure:
foo.zip: a ZIP file containing 80 files and 10 ZIP files:
file1.doc
file2.doc
...
file80.doc
archive1.zip: a ZIP containing 50 files
archive2.zip: a ZIP containing 50 files
...
archive10.zip: a ZIP containing 50 files
Setting this value to "100" allows indexing all 80 files within foo.zip, all 50 files within archive1.zip, all 50 files within archive2.zip, and so on. The total number of files indexed is 580 (80 files at the top level, plus 50 files in each of the 10 archives).
Max. documents per container total
When containers support is enabled, sets the maximum number of files processed overall, across all container depths.
In the previous example, setting this value to "100" allows indexing all 80 files within foo.zip, but indexing stops after 20 files within archive1.zip. The other archives are not indexed at all.
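To make the arithmetic of the two container limits concrete, the following Python sketch walks the foo.zip example. It assumes that entries are processed in listing order, that every entry (file or nested ZIP) counts toward Max. documents per container, and that only indexed files count toward Max. documents per container total. The `Container` class, the `index_container` and `build_foo` functions, and the archive member names (`item1.txt`, ...) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Container:
    """A hypothetical container (e.g. a ZIP file) holding files and sub-containers."""
    name: str
    children: List[Union[str, "Container"]] = field(default_factory=list)


def index_container(container: Container, max_per_container: Optional[int],
                    max_total: Optional[int],
                    indexed: Optional[List[str]] = None) -> List[str]:
    """Return the files indexed under both limits (None means unlimited)."""
    if indexed is None:
        indexed = []
    # Max. documents per container: only the first N entries of this container.
    for entry in container.children[:max_per_container]:
        # Max. documents per container total: one budget shared by all depths.
        if max_total is not None and len(indexed) >= max_total:
            break
        if isinstance(entry, Container):
            index_container(entry, max_per_container, max_total, indexed)
        else:
            indexed.append(f"{container.name}/{entry}")
    return indexed


def build_foo() -> Container:
    """foo.zip from the example: 80 files, then 10 archives of 50 files each."""
    files: List[Union[str, Container]] = [f"file{i}.doc" for i in range(1, 81)]
    archives = [Container(f"archive{a}.zip", [f"item{i}.txt" for i in range(1, 51)])
                for a in range(1, 11)]
    return Container("foo.zip", files + archives)


foo = build_foo()
print(len(index_container(foo, 100, None)))  # 580
limited = index_container(foo, 100, 100)
print(len(limited), limited[-1])             # 100 archive1.zip/item20.txt
```

Both results match the figures given above: 580 files with only the per-container limit, and 100 files (ending at the 20th file of archive1.zip) once the total limit is also set.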
CPath stop MIME filter
Defines the MIME types of containers that are to be treated as whole documents rather than expanded. For example, MSG or EML mail files are containers, because they may contain attachments or attached files.
Note: If this parameter is empty, no restriction or exclusion is applied.
Container MIME filter
Select the MIME types of files which are to be considered as containers.
Note: If this parameter is empty, no restriction or exclusion is applied.
Item MIME Filter
Select the MIME types of files to be scanned in a container.
Note: If this parameter is empty, no restriction or exclusion is applied.
Item extensions
Define the extensions of files to be scanned in a container.
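The Note attached to the MIME filters above (an empty filter applies no restriction or exclusion) can be expressed as in the following Python sketch. It assumes that filter entries accept shell-style wildcards such as `text/*`, and that an empty Item extensions list also accepts every extension; neither assumption is confirmed by this page. The functions `mime_matches`, `should_expand`, and `should_scan_item` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def mime_matches(mime: str, patterns: Sequence[str]) -> bool:
    """An empty filter applies no restriction, so every MIME type matches."""
    if not patterns:
        return True
    return any(fnmatchcase(mime, p) for p in patterns)


def should_expand(mime: str, container_filter: Sequence[str],
                  stop_filter: Sequence[str]) -> bool:
    """Decide whether a container-capable file is opened like a folder."""
    # CPath stop MIME filter: an empty list excludes nothing.
    if stop_filter and any(fnmatchcase(mime, p) for p in stop_filter):
        return False  # indexed as one document as a whole
    return mime_matches(mime, container_filter)


def should_scan_item(mime: str, filename: str, item_mime_filter: Sequence[str],
                     item_extensions: Sequence[str]) -> bool:
    """Decide whether an item found inside a container is scanned."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    allowed = {e.lower().lstrip(".") for e in item_extensions}
    return mime_matches(mime, item_mime_filter) and (not allowed or ext in allowed)
```

Note the asymmetry: for the inclusion filters an empty list means "accept everything", while for the stop filter an empty list means "stop nothing".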
Index names
Pushes empty documents for all files that were not accepted because of filters. This allows indexing the filenames of files whose content should not be indexed.
Max. input size
Maximum file input size allowed.
Specify any SI byte unit (1000KB, 100MB, 1GB, and so on). If no unit is specified, the value is in bytes.
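A value such as 100MB has to be converted to a number of bytes before it can be compared with a file size. This Python sketch assumes decimal SI multiples (1KB = 1000 bytes, not 1024), integer values, and case-insensitive units; `parse_size` is a hypothetical helper, not part of the product.

```python
import re

# Decimal SI multiples (assumption: 1KB = 1000 bytes, not 1024).
SI_UNITS = {"": 1, "B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert '100MB', '1GB' or '2048' (bytes) into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    factor = SI_UNITS.get(unit.upper())
    if factor is None:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(number) * factor
```

Under this assumption, `parse_size("1000KB")` and `parse_size("1MB")` both return 1000000, and `parse_size("2048")` returns 2048.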
Max. container fetch size
Maximum container size allowed for fetch (preview, data fetch).
Specify any SI byte unit (1000KB, 100MB, 1GB, and so on). If no unit is specified, the value is in bytes.
Convert address
Address of an external Convert service. Leave empty to dispatch to the default Converter.
Container timeout
When opening a container using a remote Convert service, defines the timeout for opening the file. For example, a large PST file may take several minutes to open.
Container fetch timeout
When opening a container using a remote Convert service, defines the timeout for fetching a sub-item.
Truncate files pattern
When a file is larger than the size set in Max. input size, truncates the file rather than discarding it. This option only works with raw text or HTML files (not, for example, Office or PDF files).
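The following Python sketch shows the keep, truncate, or discard decision described for this option. It assumes the option is simply on or off and that only `text/plain` and `text/html` content can be truncated; how the actual pattern selects files is not modeled, and `prepare_content` is hypothetical.

```python
from typing import Optional

# Assumption: the formats this page describes as compatible (raw text, HTML).
TRUNCATABLE_MIME_TYPES = {"text/plain", "text/html"}


def prepare_content(data: bytes, mime: str, max_input_size: int,
                    truncate: bool) -> Optional[bytes]:
    """Return the bytes to index, or None if the file is discarded."""
    if len(data) <= max_input_size:
        return data  # within Max. input size: indexed unchanged
    if truncate and mime in TRUNCATABLE_MIME_TYPES:
        # Keep the first max_input_size bytes; a plain byte cut may split a
        # multi-byte character at the very end.
        return data[:max_input_size]
    return None  # too large, and truncation is off or not supported
```

Binary formats such as PDF are excluded because cutting them at an arbitrary byte would leave a file that can no longer be parsed.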
Push folders as documents
Pushes an empty document for each folder found. Disabled by default.
Never send delete
Never sends a delete to the remote index, even if the file is no longer present locally. Disabled by default.
Delete document on error
Defines the strategy to adopt when a previously indexed document can no longer be updated (for example, if the file becomes unreadable or busy, or if its access rights no longer allow access to it):
Keep: keep the entry in the remote index as it was before
Delete: remove the entry in the remote index
Empty: create an empty file in the remote index
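The three strategies can be summarized with the following Python sketch, in which the remote index is modeled as a plain dictionary from document ID to content. The `OnError` enum and the `handle_update_error` function are hypothetical illustrations, not the connector API.

```python
from enum import Enum
from typing import Dict


class OnError(Enum):
    KEEP = "keep"
    DELETE = "delete"
    EMPTY = "empty"


def handle_update_error(index: Dict[str, bytes], doc_id: str,
                        strategy: OnError) -> None:
    """Apply the strategy to a previously indexed document that failed to update."""
    if strategy is OnError.KEEP:
        return  # the entry stays as it was before
    if strategy is OnError.DELETE:
        index.pop(doc_id, None)  # the entry is removed
    elif strategy is OnError.EMPTY:
        index[doc_id] = b""  # the entry is replaced by an empty document
```

Keep favors availability of possibly stale content, Delete favors accuracy, and Empty keeps the entry (and its name) visible without its content.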
Max. document queued
Maximum number of documents held in the document processing queue (in memory).
Max. folder queued
Maximum number of folders held in the folder processing queue.
No. pipeline document thread
Number of background threads processing the document queue, that is, reading documents to be indexed and sending them to the remote server.
No. pipeline folder thread
Number of background threads processing the folder queue, that is, scanning local folders to find all files and subfolders to be indexed.
Max. processing size
Limits the total amount of memory that can be used when processing the document queue. When the limit is reached, other document threads are blocked until enough memory is freed.
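The interplay of Max. document queued, No. pipeline document thread, and Max. processing size can be sketched with Python threads: a bounded queue blocks the producer when it is full, and a shared memory budget blocks document threads while the in-flight total would exceed the limit. This is a simplified model under stated assumptions (document sizes known in advance, folder scanning replaced by a plain list of documents, sending simulated); `MemoryBudget` and `run_pipeline` are hypothetical.

```python
import queue
import threading
from typing import List, Optional, Sequence, Tuple


class MemoryBudget:
    """Blocks callers while the memory in use would exceed the limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0
        self.peak = 0
        self._cond = threading.Condition()

    def acquire(self, size: int) -> int:
        # Clamp so that a document larger than the whole budget cannot wait forever.
        size = min(size, self.limit)
        with self._cond:
            self._cond.wait_for(lambda: self.used + size <= self.limit)
            self.used += size
            self.peak = max(self.peak, self.used)
        return size

    def release(self, size: int) -> None:
        with self._cond:
            self.used -= size
            self._cond.notify_all()


def run_pipeline(docs: Sequence[Tuple[str, int]], max_queued: int,
                 n_threads: int, max_processing_size: int) -> Tuple[List[str], int]:
    """Process (name, size) documents; return the sent names and peak memory."""
    docs_queue: "queue.Queue[Optional[Tuple[str, int]]]" = queue.Queue(maxsize=max_queued)
    budget = MemoryBudget(max_processing_size)
    sent: List[str] = []
    sent_lock = threading.Lock()

    def worker() -> None:
        while True:
            item = docs_queue.get()
            if item is None:
                return  # stop marker
            name, size = item
            reserved = budget.acquire(size)  # blocks while the budget is exhausted
            try:
                with sent_lock:
                    sent.append(name)  # stands in for reading and sending the document
            finally:
                budget.release(reserved)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for doc in docs:
        docs_queue.put(doc)  # blocks while max_queued documents are waiting
    for _ in threads:
        docs_queue.put(None)  # one stop marker per document thread
    for thread in threads:
        thread.join()
    return sent, budget.peak
```

For example, with 300-byte documents and a 1000-byte budget, at most three documents can be in process at once, however many threads are configured.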
Root Paths (N)
Text version of the Filesystem paths on the Configuration tab.
Filename include rules (N)
Text version of the Include rules on the Configuration tab.
Filename exclude rules (N)
Text version of the Exclude rules on the Configuration tab.
Main part MIME filters (N)
Used to aggregate and deduplicate items within a mail container. For example, this allows indexing the HTML part of a mail and ignoring the text part.
Parent MIME filter: list of MIME filters of mail containers
Main part MIME filter: list of MIME types of body part(s) inside a mail
Main part dedup MIME filter: list of equivalent MIME types to be deduplicated
Main part dedup max. count: maximum number of documents to be deduplicated
Add child links: adds metadata linking to child items (such as attachments)
Merge in parent: merges the bodies into the document
Merge container metas: merges the container's metadata into the main document
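As an illustration of the HTML-versus-text example, the following Python sketch keeps one preferred body part among equivalent alternatives in a mail. It models only Parent MIME filter, Main part dedup MIME filter, and Main part dedup max. count, and it assumes that the dedup list is ordered by preference (first entry wins). The function `dedup_body_parts` and that ordering rule are hypothetical.

```python
from typing import List, Sequence, Tuple

Part = Tuple[str, str]  # (mime_type, part_name), in mail order


def dedup_body_parts(parent_mime: str, parts: Sequence[Part],
                     parent_filter: Sequence[str], dedup_filter: Sequence[str],
                     max_dedup: int) -> List[Part]:
    """Drop equivalent body parts, keeping the most preferred one."""
    if parent_mime not in parent_filter:
        return list(parts)  # not a mail container: leave parts untouched
    candidates = [i for i, (mime, _) in enumerate(parts) if mime in dedup_filter]
    if len(candidates) < 2:
        return list(parts)  # nothing to deduplicate
    # Assumption: earlier entries in dedup_filter are preferred (ties keep the first).
    best = min(candidates, key=lambda i: dedup_filter.index(parts[i][0]))
    dropped = sorted(i for i in candidates if i != best)[:max_dedup]
    return [p for i, p in enumerate(parts) if i not in set(dropped)]
```

With `dedup_filter = ["text/html", "text/plain"]`, a mail made of a text body, an HTML body, and a PDF attachment keeps only the HTML body and the attachment.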
Filename MIME rules (N)
A set of rules allowing to set the MIME type, and optionally the encoding, of files matching the given extension/filename filter.
Filter: the space-separated list of filename extensions to match (or a regular expression, if the Regular expression checkbox is checked)
Regular expression: if checked, the filter is a regular expression matching the filename
MIME type: the MIME type to set
Encoding: optionally, the encoding to set
Hint only: if checked, the MIME type is not forced
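The rule fields above can be combined as in this Python sketch. It assumes that rules are evaluated in order and the first match wins, that extension matching is case-insensitive, that a regular expression must match the whole filename, and that a Hint only rule yields to a MIME type already detected from the content. `MimeRule` and `resolve_mime` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class MimeRule:
    pattern: str          # Filter: extensions, or a regex if is_regex is True
    is_regex: bool
    mime_type: str
    encoding: Optional[str] = None
    hint_only: bool = False

    def matches(self, filename: str) -> bool:
        if self.is_regex:
            return re.fullmatch(self.pattern, filename) is not None
        ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        return ext in {e.lower().lstrip(".") for e in self.pattern.split()}


def resolve_mime(filename: str, detected: Optional[str],
                 rules: Sequence[MimeRule]) -> Tuple[Optional[str], Optional[str]]:
    """Return (mime_type, encoding) for a file, given an optional detected type."""
    for rule in rules:
        if rule.matches(filename):
            if rule.hint_only and detected:
                return detected, rule.encoding  # not forced: detection wins
            return rule.mime_type, rule.encoding
    return detected, None  # no rule matched
```

For instance, a rule with filter `log out`, MIME type `text/plain`, and encoding `utf-8` would resolve `server.LOG` to `("text/plain", "utf-8")`.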
PushAPI filters (N)
The PushAPI pipeline configuration. Documents added to the PushAPI pipeline go through the defined filters in order, from the first to the last, before being injected into the PushAPI.
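The ordered traversal of the PushAPI filters can be pictured as a simple chain, as in this Python sketch. Documents are modeled as dictionaries, and each filter is a function returning the (possibly modified) document; letting a filter return `None` to drop a document is an assumption made for illustration. `run_filters` and the sample filters `add_source` and `drop_empty` are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence

Document = Dict[str, str]
PushFilter = Callable[[Document], Optional[Document]]


def run_filters(doc: Document, filters: Sequence[PushFilter]) -> Optional[Document]:
    """Pass the document through each filter, first to last."""
    for push_filter in filters:
        result = push_filter(doc)
        if result is None:
            return None  # hypothetical: the filter rejected the document
        doc = result
    return doc  # ready to be injected into the PushAPI


def add_source(doc: Document) -> Document:
    return {**doc, "source": "files"}  # example filter: add a field


def drop_empty(doc: Document) -> Optional[Document]:
    return doc if doc.get("content") else None  # example filter: reject empty docs
```

Because each filter receives the output of the previous one, the order in which filters are defined matters.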