Configure the Files Connector
 
The procedures below show two ways to configure the Files connector: through the basic parameters in the Configuration tab, or through the more specific parameters in the Advanced tab.
Basic Configuration
The following procedure describes a basic configuration of the Files connector, using the most common parameters available in the Configuration tab.
1. In Filesystem paths, enter the file system path to crawl.
Note: This can be the mount point on the local host of a remote file system.
2. Click Add path to add more paths.
3. To exclude a subfolder from being crawled, expand More options, and enter the file system path of this subfolder in Exclude rules.
Select Regexp if you want to use regular expressions in the path.
4. In Filename extensions, select what you want to crawl and index.
5. To make file names searchable for documents whose extensions are not in the Filename extensions list, select Index file names for the ignored extensions.
6. Click Apply.
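The effect of the exclude rules and extension options above can be pictured with a short Python sketch. It is only a mental model, not the connector's implementation: the function name classify, its parameters, and the returned labels are hypothetical, and the matching semantics (prefix match for plain rules, regular-expression search when Regexp is selected) are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath


def classify(path, exclude_rules, extensions, index_ignored_names=False, regexp=False):
    """Return 'excluded', 'indexed', 'name-only', or 'ignored' for one file path.

    Hypothetical model of the Configuration tab options:
    - exclude_rules       ~ Exclude rules
    - regexp              ~ the Regexp check box
    - extensions          ~ Filename extensions (lowercase, without the dot)
    - index_ignored_names ~ Index file names for the ignored extensions
    """
    for rule in exclude_rules:
        if not rule:
            continue  # skip empty rules rather than excluding everything
        if regexp:
            # Assumption: a regular-expression rule may match anywhere in the path.
            if re.search(rule, path):
                return "excluded"
        elif path == rule or path.startswith(rule.rstrip("/") + "/"):
            # Assumption: a plain rule excludes the folder and everything below it.
            return "excluded"

    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    if ext in extensions:
        return "indexed"
    return "name-only" if index_ignored_names else "ignored"


if __name__ == "__main__":
    print(classify("/data/docs/tmp/a.pdf", ["/data/docs/tmp"], {"pdf"}))
    print(classify("/data/docs/a.bin", [], {"pdf"}, index_ignored_names=True))
```

In this model, a plain rule such as /data/docs/tmp excludes that subfolder only, while a sibling file such as /data/docs/tmpfile.pdf is still crawled.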
Advanced Configuration - Crawl HDFS Example
You can also configure the Files connector partially or entirely in the Advanced tab.
In Advanced > Root Paths > Item n > Connectivity configuration, you can find several types of advanced Files connector settings, for HDFS, FTP, HTTP, and Remote Windows filesystems. This section uses the example of a Files connector configured to crawl an HDFS server.
1. Optionally:
a. In File extension, add or remove file extensions.
b. In Max input size, change the maximum size allowed for documents.
2. In Root Paths > Item 0 > Root path, enter the first file system path you want to crawl recursively. Use the format hdfs://<HOST>:<PORT>/<PATH TO THE DIRECTORY>.
Note: This parameter is the same as the Configuration tab > Filesystem paths parameter.
3. Configure the HDFS server authentication if required.
a. Expand Connectivity configuration.
b. Click Add item.
c. For Component class name, select HDFS advanced settings.
d. Expand Authentication, then in Username and Password, specify the credentials of a user account allowed to crawl the HDFS filesystem.
4. To add more paths to crawl, go back to the Root Paths level, click Add item, and repeat the two previous steps.
5. Click Apply.
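Before entering a value in Root path, it can help to check that it follows the hdfs://<HOST>:<PORT>/<PATH TO THE DIRECTORY> format. The Python sketch below does this with the standard urllib.parse module. The function name parse_hdfs_root, the example host name, and the port number are hypothetical and used for illustration only; the connector may accept or reject values differently.

```python
from urllib.parse import urlsplit


def parse_hdfs_root(root_path):
    """Split a root path of the form hdfs://<HOST>:<PORT>/<PATH> into its parts.

    Raises ValueError when a part required by that format is missing.
    """
    parts = urlsplit(root_path)
    if parts.scheme != "hdfs":
        raise ValueError("root path must start with hdfs://")
    if not parts.hostname:
        raise ValueError("root path is missing <HOST>")
    if parts.port is None:
        raise ValueError("root path is missing <PORT>")
    if not parts.path.startswith("/"):
        raise ValueError("root path is missing the directory path")
    return parts.hostname, parts.port, parts.path


if __name__ == "__main__":
    # Hypothetical host and port, for illustration only.
    print(parse_hdfs_root("hdfs://namenode.example.com:8020/user/data"))
```

A value without a port, such as hdfs://namenode.example.com/user/data, is rejected by this check because the format given in step 2 includes <PORT>.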