File Server Access and Security
 
You add a Files connector when you want to crawl:
A local filesystem.
A remote filesystem that is shared on the network.
An HDFS file system.
Behavior
The behavior of the Files connector depends on the platform on which you install Exalead CloudView.
UNIX and Windows platforms differ, as explained in the file server access sections of this page. In all cases, you can:
Configure several paths to crawl.
Exclude paths.
Crawl based on file name extensions.
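The three options above can be pictured as a simple path filter. The following is a minimal Python sketch of that idea, not the connector's actual implementation: the function name should_crawl and its parameters are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def should_crawl(path, roots, excluded, extensions):
    """Return True if `path` lies under a root, outside every excluded path,
    and has one of the allowed extensions (an empty set allows all)."""
    p = PurePosixPath(path)

    def under(base):
        b = PurePosixPath(base)
        return p == b or b in p.parents

    if not any(under(r) for r in roots):
        return False
    if any(under(e) for e in excluded):
        return False
    # Extensions are compared case-insensitively, e.g. ".PDF" matches ".pdf".
    return not extensions or p.suffix.lower() in extensions


# Hypothetical configuration: two roots, one excluded subtree, two extensions.
ROOTS = ["/data/docs", "/srv/share"]
EXCLUDED = ["/data/docs/tmp"]
EXTENSIONS = {".pdf", ".docx"}
```

With this configuration, /data/docs/report.pdf would be crawled, while /data/docs/tmp/report.pdf and /data/docs/setup.exe would be skipped.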
Limitations
Consider the following limitations when implementing a Files connector:
Windows servers have limitations when handling multiple connections to the same file share from different Windows sessions. On Windows Server 2003 machines, this is configurable and depends on the Microsoft Windows license.
Exalead CloudView cannot access remote files on a UNIX platform if they are mounted with the directio option: the Files connector cannot index files within such a filesystem. This is the case for CIFS (Common Internet File System) mount points, for which directio is the default setting. In that case, define the nodirectio option explicitly. For more information, see the Linux man page mount.cifs(8).
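To spot affected mount points before configuring a crawl, you could scan the system mount table. The sketch below assumes the usual Linux /proc/mounts layout (device, mount point, filesystem type, comma-separated options) and assumes that the option appears literally as directio in the options column. The spelling reported by your kernel may differ, so the option names are a parameter. The function name is hypothetical.

```python
def cifs_mounts_with_option(mount_table, flagged=("directio",)):
    """Return the mount points of CIFS filesystems whose option list
    contains any of the `flagged` option names."""
    hits = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type != "cifs":
            continue
        if set(options.split(",")) & set(flagged):
            hits.append(mount_point)
    return hits


# Hypothetical mount table content, in /proc/mounts format.
SAMPLE = """\
//server/share /mnt/share cifs rw,relatime,directio 0 0
//server/other /mnt/other cifs rw,relatime,nodirectio 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""
```

On a real machine, the input would typically come from open("/proc/mounts").read().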
Maximum Path Length
On Windows versions before Windows 10, the maximum length for a path is 260 characters. You can override this limit by specifying an extended-length path:
Use the \\?\ prefix for paths starting with a drive letter. Example: \\?\D:\my_very_long_path
Use the \\?\UNC\ prefix for paths starting with a server name. Example: \\?\UNC\server\share
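The two prefix rules above can be applied mechanically. The following Python sketch shows one way to do it with plain string handling. The helper name to_extended_length is hypothetical and is not part of Exalead CloudView.

```python
EXTENDED_PREFIX = "\\\\?\\"        # the four characters \\?\
UNC_PREFIX = "\\\\?\\UNC\\"        # \\?\UNC\


def to_extended_length(path):
    """Convert a Windows path to its extended-length form."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended-length form
    if path.startswith("\\\\"):
        # \\server\share -> \\?\UNC\server\share
        return UNC_PREFIX + path[2:]
    if len(path) >= 2 and path[0].isalpha() and path[1] == ":":
        # D:\dir -> \\?\D:\dir
        return EXTENDED_PREFIX + path
    raise ValueError("expected a drive-letter or UNC path: " + path)
```

For example, to_extended_length(r"D:\my_very_long_path") yields \\?\D:\my_very_long_path, and to_extended_length(r"\\server\share") yields \\?\UNC\server\share.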
Local File Server Access
When you create a Files connector, you must ensure that Exalead CloudView has the required privileges to crawl the filesystem.
If the Files connector is installed on a UNIX server, the account privileges of Exalead CloudView are used to crawl the filesystem.
If the Files connector is installed on a Windows server, the account privileges of Exalead CloudView are also used to crawl the filesystem by default. However, Exalead CloudView often runs using the Local System user account, which has the same privileges as the administrator on local filesystems but no access to network filesystems. Therefore, if you want the Exalead CloudView user to have privileges to access network files, configure it to run with a domain user account during installation.
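One way to verify privileges before the first crawl is to walk the target directory under the same account that runs Exalead CloudView and list what cannot be read. This is a generic sketch based on the Python standard library, not a CloudView tool, and the function name is hypothetical.

```python
import os


def unreadable_entries(root):
    """Walk `root` and return the paths the current account cannot read."""
    problems = []

    def on_error(err):
        # os.walk reports directories it cannot list through this callback.
        problems.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.access(full, os.R_OK):
                problems.append(full)
    return problems
```

An empty result suggests that the account can read everything under the root. The check only reflects the account that executes it, so run it as the service account rather than as your own user.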
Remote File Server Access
If the Files connector is not installed on the file server, it must access the files to crawl remotely. Your network administrator must ensure that the filesystem to index is shared on the network and that you have sufficient privileges to access it.
For MS Windows
If the Files connector is installed on a Windows platform, you can crawl a UNIX remote filesystem if it is shared on the network.
To do so, share the filesystem on the network using Samba on the UNIX machine, export the corresponding share to the Windows machine, and then mount it as a regular Windows share.
Warning:
CIFS support is provided by a third-party library. Because this library supports only the SMB1 version of the protocol, we do not recommend using it. Moreover, future Exalead CloudView versions may no longer support it. The third-party CIFS handler is used if the URL starts with one of the following:
smb://
\\ and authentication is declared.
\\ and the OS is not MS Windows.
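The conditions listed above amount to a small decision rule. The sketch below restates it in Python so you can check which root paths would fall back to the third-party handler. It only mirrors the conditions as written on this page. The function and parameter names are hypothetical, and the case-insensitive match on the smb:// scheme is an assumption.

```python
def uses_third_party_cifs(url, has_authentication, os_is_windows):
    """Return True if, per the documented conditions, the third-party
    (SMB1-only) CIFS handler would be used for this root path."""
    if url.lower().startswith("smb://"):
        return True
    if url.startswith("\\\\"):
        # A \\server\share path: the handler is used when authentication
        # is declared, or when the connector does not run on MS Windows.
        return has_authentication or not os_is_windows
    return False
```

Under this reading, a \\server\share path crawled from a Windows machine without declared authentication is the only UNC case that avoids the third-party handler.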
The Exalead CloudView user account is used to crawl the filesystem by default. Verify that this account has the required permissions to crawl the remote filesystem. The account must be valid for the domain.
If you need to define other authentication parameters, go to Advanced > Root Paths > Item n > Connectivity configuration > Remote Windows filesystem advanced settings > Authentication. Keep in mind that this only works reliably when a DFS (Distributed File System) mount point is involved.
For UNIX
If the Files connector is installed on a UNIX server, you can access shared filesystems on the network using the system's own CIFS mount feature.
On Linux, use the mount.cifs command or the corresponding /etc/fstab entry.
The Exalead CloudView user account is used to crawl the filesystem by default. Verify that this account has the required permissions to crawl the remote filesystem. The account must be valid for the domain.
FTP Server Access
To crawl FTP or FTPS, define authentication parameters (username and password) in Advanced > Root Paths > Item n > Connectivity configuration > FTP advanced settings > Authentication. The URL format is the following: ftp://server:<port>/<path>
Note: SFTP is not supported.
HDFS Server Access
To crawl HDFS, define authentication parameters (username and password) in Advanced > Root Paths > Item n > Connectivity configuration > HDFS advanced settings > Authentication. The URL format is the following: hdfs://server:<port>/<path>
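The FTP and HDFS root paths share the same scheme://server:port/path shape. The following Python sketch assembles and splits such URLs and rejects other schemes such as sftp, which the note above says is not supported. The function names, host names, and port numbers are hypothetical examples, and the credentials are deliberately left out because they are set in the Authentication settings rather than in the URL.

```python
from urllib.parse import urlsplit

# Only the two URL schemes shown on this page.
SUPPORTED_SCHEMES = ("ftp", "hdfs")


def root_url(scheme, server, port, path):
    """Assemble a root path URL of the form scheme://server:port/path."""
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported scheme: " + scheme)
    return f"{scheme}://{server}:{port}/{path.lstrip('/')}"


def split_root_url(url):
    """Split a root path URL into (scheme, server, port, path)."""
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported scheme: " + parts.scheme)
    return parts.scheme, parts.hostname, parts.port, parts.path
```

For example, root_url("ftp", "files.example.com", 21, "/pub/docs") returns ftp://files.example.com:21/pub/docs, while split_root_url("sftp://files.example.com:22/pub") raises ValueError.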
Security
The default security profile for the Files connector lets users see only the files that they would normally see, according to the filesystem's Access Control List (ACL).
Exalead CloudView indexes the ACLs for local users and groups with the files. The security tokens that are generated depend on the filesystem platform, for example:
windows:S-1-5-21-34585....-5176
UNIX:user:42
UNIX:group:501
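The examples above suggest a colon-separated layout: a platform prefix, then either a Windows SID or a UNIX kind (user or group) and numeric ID. The sketch below parses tokens under that assumption only. The exact token grammar is not specified here, the function name is hypothetical, and the SID used in the usage note is made up for illustration.

```python
def parse_security_token(token):
    """Split a token such as 'UNIX:user:42' into its parts,
    assuming the layout inferred from the examples."""
    platform, _, rest = token.partition(":")
    if platform == "windows" and rest:
        # Windows tokens carry a SID after the prefix.
        return {"platform": "windows", "kind": "sid", "id": rest}
    if platform == "UNIX":
        kind, _, ident = rest.partition(":")
        if kind in ("user", "group") and ident.isdigit():
            return {"platform": "UNIX", "kind": kind, "id": int(ident)}
    raise ValueError("unrecognized security token: " + token)
```

For example, parse_security_token("UNIX:group:501") returns the platform UNIX, the kind group, and the numeric ID 501, and a made-up token such as windows:S-1-5-21-1-2-3-1001 returns the SID as a string.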