About the Files Connector Configuration
 
Filesystem Paths
MS Windows Specificity for HDFS File System
Exclude/Include Rules
Allowed Extensions
Search Nonindexed File Names
This section assumes that the connector has already been added. See Creating a Standard Connector.
Filesystem Paths
You can configure the absolute paths that Exalead CloudView must crawl in Filesystem paths.
Directory names end with a backslash (\) on MS Windows filesystems and with a forward slash (/) on UNIX platforms.
Note: Do not specify the path to a single file. Only paths to directories are supported.
MS Windows Specificity for HDFS File System
In the <DATADIR>/config/CloudviewDeploymentInternalConfig.xml file, edit the ProcessInternalConfig node to add:
<StringValue xmlns="exa:exa.bee" value="-Dhadoop.home.dir=<path>"/>
where <path> is the path to a directory containing a \bin\winutils.exe file. The winutils.exe file does not have to be a valid executable file. Hadoop only checks its existence.
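As a minimal sketch of the preparation described above, the following Python helper creates a directory that contains an empty bin\winutils.exe placeholder and builds the matching StringValue line. The function names and the example directory are hypothetical. Only the element format and the fact that the file merely has to exist come from this page.

```python
from pathlib import Path
from xml.sax.saxutils import quoteattr


def make_hadoop_home(base: Path) -> Path:
    """Create <base>/bin/winutils.exe as an empty placeholder file.

    The placeholder is not a real executable; this relies on the statement
    above that only the existence of the file is checked.
    """
    bin_dir = base / "bin"
    bin_dir.mkdir(parents=True, exist_ok=True)
    (bin_dir / "winutils.exe").touch(exist_ok=True)
    return base


def stringvalue_element(hadoop_home: Path) -> str:
    """Build the StringValue line to add under ProcessInternalConfig."""
    # quoteattr escapes XML special characters and adds the surrounding quotes.
    value = quoteattr("-Dhadoop.home.dir=" + str(hadoop_home))
    return '<StringValue xmlns="exa:exa.bee" value=' + value + "/>"
```

For example, calling make_hadoop_home(Path(r"D:\hadoop")) and then stringvalue_element(Path(r"D:\hadoop")) would yield the line to paste into CloudviewDeploymentInternalConfig.xml, where D:\hadoop is only an illustrative location.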
Exclude/Include Rules
You can specify the path that you do not want to crawl by adding an Exclude rule. Similarly, you can specify the path that you want to crawl by adding an Include rule.
If there is no Include rule and no Exclude rule, then all documents are accepted for the specified filename extensions.
If there are one or more Include rules, then documents are accepted if at least one Include rule matches and no Exclude rule matches.
If there are one or more Exclude rules, then documents are accepted if no Exclude rule matches.
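The acceptance logic above can be sketched in Python as follows. This is an illustrative model, not the connector's implementation: the Rule class and is_accepted function are hypothetical names, Python's re module stands in for the connector's regular expression engine, and filename extension filtering is not modeled.

```python
import re
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical model of one Include or Exclude rule."""
    expression: str
    regexp: bool = False  # mirrors the regexp option of a rule

    def matches(self, path: str) -> bool:
        if self.regexp:
            # A regexp rule may match anywhere within the absolute path.
            return re.search(self.expression, path) is not None
        # Otherwise the expression is a plain substring.
        return self.expression in path


def is_accepted(path: str,
                includes: Sequence[Rule] = (),
                excludes: Sequence[Rule] = ()) -> bool:
    """Apply the three cases: no rules, Include rules, Exclude rules."""
    if any(rule.matches(path) for rule in excludes):
        return False  # a matching Exclude rule always rejects
    if includes:
        return any(rule.matches(path) for rule in includes)
    return True  # no Include rules and no matching Exclude rule
```

Checking Exclude rules first keeps the function short: in every case a matching Exclude rule rejects the document, and Include rules only matter when at least one is defined.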
Note: If the connector is running on Windows:
Use \ as the separator in local paths, for example, D:\temp.
Use / as the separator in remote or network paths, for example, //remote_host/temp.
Regular Expressions
To specify the path with a regular expression, select regexp.
The rule matches any substring within the absolute path of the file or folder. The folder path ends with a file separator (\ or /). If regexp is selected, the expression must be a valid regular expression. Otherwise, the expression is treated as a literal substring.
For example, C:\\Users\\.*\\Documents matches the paths: C:\Users\smith\Documents and C:\Users\dupont\Documents.
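The difference between the two modes can be illustrated with a short Python sketch. It assumes that Python's re module behaves like the connector's regular expression engine for this simple pattern. The matches helper and the Desktop and report.docx sample paths are hypothetical.

```python
import re

# The expression from the example above. Each \\ stands for one literal backslash.
EXPRESSION = r"C:\\Users\\.*\\Documents"


def matches(expression: str, path: str, regexp: bool) -> bool:
    """Return True if the rule expression matches somewhere in the path."""
    if regexp:
        return re.search(expression, path) is not None
    return expression in path


# With regexp selected, the expression matches both user folders,
# and also any longer path that contains such a folder.
print(matches(EXPRESSION, r"C:\Users\smith\Documents", regexp=True))
print(matches(EXPRESSION, r"C:\Users\smith\Documents\report.docx", regexp=True))

# Without regexp, the same text is searched for literally,
# doubled backslashes included, so it is not found in a real path.
print(matches(EXPRESSION, r"C:\Users\smith\Documents", regexp=False))
```

The doubled backslashes are only needed when regexp is selected. A substring rule is written with single backslashes, exactly as the path appears.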
Examples
The root path is C:\Documents, and you do not want to index C:\Documents\Contracts\, except for C:\Documents\Contracts\Public.
Possible solutions:
Add two Include rules:
1st rule is: C:\\Documents\\Contracts\\$ with regexp selected.
2nd rule is: C:\Documents\Contracts\Public with regexp NOT selected.
Add one Exclude rule:
C:\\Documents\\Contracts\\[^\\]+$ with regexp selected.
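To see which paths each expression in this example matches, the following Python sketch tests them against a few sample paths. It is illustrative only: Python's re module stands in for the connector's engine, the file names nda.pdf and terms.pdf are hypothetical, and the exclude pattern is written with every literal backslash doubled. The sketch reports pattern matches only and does not model the full crawl decision.

```python
import re

# Regexp rule: the Contracts folder path itself (folder paths end with a separator).
FOLDER_ONLY = r"C:\\Documents\\Contracts\\$"
# Regexp rule: entries directly under Contracts whose path has no further separator.
DIRECT_FILES = r"C:\\Documents\\Contracts\\[^\\]+$"
# Substring rule (regexp NOT selected): anything under the Public folder.
PUBLIC_SUBSTRING = r"C:\Documents\Contracts\Public"


def describe(path: str) -> dict:
    """Report which of the three expressions match the given path."""
    return {
        "folder_only": re.search(FOLDER_ONLY, path) is not None,
        "direct_files": re.search(DIRECT_FILES, path) is not None,
        "public_substring": PUBLIC_SUBSTRING in path,
    }


SAMPLES = [
    "C:\\Documents\\Contracts\\",            # the folder itself
    r"C:\Documents\Contracts\nda.pdf",       # file directly in Contracts
    "C:\\Documents\\Contracts\\Public\\",    # the Public subfolder
    r"C:\Documents\Contracts\Public\terms.pdf",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, describe(sample))
```

The trailing $ is what separates the cases: it ties FOLDER_ONLY to the folder path alone and DIRECT_FILES to paths with nothing but a file name after Contracts\, so neither matches entries under Public.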
Allowed Extensions
Specify which extensions to allow or ignore in the Filename Extensions section.
Search Nonindexed File Names
By default, Exalead CloudView only crawls and indexes files specified in the Filename extensions list. This makes both their content and the filenames searchable. You can configure Exalead CloudView to search the filenames of nonindexed files.
For example, with Index file names set, users can search for executable files (.exe files) by their filename. Use this feature to find file types whose contents have little search value and that are not listed in Filename extensions.