Push API Client Methods
 
void ping()
void startPushSession()
void stopPushSession()
void addDocument(Document document) and void addDocumentList(Document[ ] documentList)
void updateDocument(Document document, string[] fields) and void updateDocumentList(Document[ ] documentList, string[][] fieldsList)
void deleteDocument(String uri) and void deleteDocumentList(String[] uris)
void deleteDocumentsRootPath(String rootPath [, Boolean recursive=true] )
DocumentStatus getDocumentStatus(String uri) and DocumentStatus[] getDocumentStatusList(String[] uriList)
ulong setCheckpoint(String checkpoint [, String name] [, sync=false])
String getCheckpoint([String name])
String getCheckpoint([String name, Boolean showSynchronizedOnly])
void clearAllCheckpoints()
CheckpointsInfoIterator enumerateCheckpointsInfo()
CheckpointsInfoIterator enumerateCheckpointsInfo (boolean showSynchronizedOnly)
CheckpointsInfoIterator:: next()
SyncedEntriesIterator::
SyncedEntriesIterator enumerateSyncedEntries(String rootPath, EnumerationMode enumerationMode)
ulong countSyncedEntries(String rootPath, EnumerationMode enumerationMode)
void sync()
void triggerIndexingJob()
boolean areDocumentsSearchable(long serial)
Metadata Examples
This section describes the Push API client methods (using Java conventions) to implement, along with the corresponding HTTP Push API commands.
void ping()
This method tests the connection with the server for the specified connectorName. Call it right after constructing the Push API client.
The purpose of this method is to:
test the server availability
check for the existence of the connectorName and its security
compare the PAPI versions (X-Papi-Version)
HTTP parameter
The parameter is described in the table below.
Parameter | Location | Description
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP method
The method used is:
GET http://<host>:<port>/papi/4/connectors/<connectorName>/ping
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
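As an illustration, the ping URL can be assembled as follows. This is a minimal sketch, not the Exalead client: the class name PingUrl, the host, the port and the connector name are hypothetical, the connector name is assumed to be URL-safe, and the [URL] location is assumed to mean a query-string parameter.

```java
final class PingUrl {
    /**
     * Builds the ping URL for a connector.
     * sessionId may be null because PAPI_session is optional.
     */
    static String build(String host, int port, String connectorName, Long sessionId) {
        String url = "http://" + host + ":" + port
                + "/papi/4/connectors/" + connectorName + "/ping";
        // Assumption: a [URL] parameter is sent in the query string.
        return sessionId == null ? url : url + "?PAPI_session=" + sessionId;
    }

    public static void main(String[] args) {
        // Hypothetical host, port and connector name.
        System.out.println(build("localhost", 10002, "myConnector", null));
        System.out.println(build("localhost", 10002, "myConnector", 42L));
    }
}
```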
void startPushSession()
This method starts a new Push API session. Working within a session lets the connector handle the following use case:
the connector starts an indexing phase and starts sending documents to the Push API server,
the Indexing Server crashes or is killed (or the server suddenly reboots); documents previously received are lost,
the Indexing Server restarts,
the connector sends the remaining documents to the Push API server, unaware that the remote Push API server died; the synchronization is therefore in a “lost” state.
This is done by introducing a session identifier (an integer) that identifies the remote component pushing the documents. This identifier changes each time the Exalead CloudView session manager (re)starts.
HTTP method
The get_current_session_id command allows you to get the remote Push API server session ID, which is generated when the Push API server starts. The method used is:
GET http://<host>:<port>/papi/4/connectors/<connectorName>/get_current_session_id
HTTP response
The get_current_session_id command returns the current Push API server session ID (a long integer of at least 63 bits). This identifier is used internally by Push API client helpers.
API response
The API function does not return any value. It throws a PushAPISessionExistsException if a session is already opened.
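The client-side session handling described above could be sketched as follows. This is only an illustration under stated assumptions: the class PushSessionState is hypothetical, the HTTP call to get_current_session_id is replaced by a LongSupplier so the example runs on its own, and the two exceptions (named in this document) are modeled as unchecked exceptions, which the source does not specify.

```java
import java.util.function.LongSupplier;

final class PushSessionState {
    /** Thrown when startPushSession is called while a session is open. */
    static final class PushAPISessionExistsException extends RuntimeException {}

    /** Thrown when stopPushSession is called while no session is open. */
    static final class PushAPISessionNotFoundException extends RuntimeException {}

    // Stands in for GET .../get_current_session_id (assumption for this sketch).
    private final LongSupplier sessionIdFetcher;
    // null means "no session opened".
    private Long sessionId;

    PushSessionState(LongSupplier sessionIdFetcher) {
        this.sessionIdFetcher = sessionIdFetcher;
    }

    void startPushSession() {
        if (sessionId != null) {
            throw new PushAPISessionExistsException();
        }
        sessionId = sessionIdFetcher.getAsLong();
    }

    void stopPushSession() {
        if (sessionId == null) {
            throw new PushAPISessionNotFoundException();
        }
        sessionId = null; // only clears the internal id; no HTTP request is sent
    }

    /** Value to send as PAPI_session, or null when no session is open. */
    Long currentSessionId() {
        return sessionId;
    }

    public static void main(String[] args) {
        PushSessionState state = new PushSessionState(() -> 1234L);
        state.startPushSession();
        System.out.println(state.currentSessionId());
        state.stopPushSession();
        System.out.println(state.currentSessionId());
    }
}
```

Every later command would then send currentSessionId() as PAPI_session, so that a server restart shows up as a session mismatch.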
void stopPushSession()
This command is used to stop a PushAPI session previously opened by startPushSession and clears the internal session id. This command has no parameters.
HTTP response
No corresponding HTTP request exists for this client function.
API response
It throws a PushAPISessionNotFoundException if no session was opened.
void addDocument(Document document) and void addDocumentList(Document[ ] documentList)
This method adds a document to the index. If a document with the same URI has already been added, the document is updated.
Note: If the conversion of a Part fails, that Part is not indexed, but the other Parts and the Metas are still included.
Document data types
When you implement the addDocument method you must send one or more documents to be added to the index. The Document object should contain:
Types | Description
uri | A URI, which is an opaque string that uniquely identifies the document from the connector point of view. See also URI.
stamp | An optional Stamp, which is an opaque string that the connector may use to track document changes. Document stamps may be retrieved through the getDocumentStatus method. See also Stamps.
MetaContainer | The MetaContainer of the document. Metadata are open name-value pairs. For a complete list of metadata understood by the API, see Metadata Examples.
PartContainer | The PartContainer of the document. The Connector sends raw bytes containing the document content. Exalead CloudView conversion services will translate and extract the textual content of the document before indexing. The Part contains a DirectiveContainer.
DirectiveContainer | The DirectiveContainer of the document (different from the directive associated to a Part).
Implement the part object
The Part object must provide accessors for the following predefined directives:
encoding
filename
mimeHint
certifiedMime
To set a custom directive, the Part object must also provide methods such as:
public void setCustomDirective(String name, String value)
public void setCustomDirective(Directive directive)
public void setCustomDirective(String name, String[] values)
public void addCustomDirective(String name, String value)
Implement the document object
The Document object must provide accessors for these predefined directives:
forcedSlice
sameSlice
And methods to set a custom directive:
public void setCustomDirective(String name, String value)
public void setCustomDirective(Directive directive)
public void setCustomDirective(String name, String[] values)
public void addCustomDirective(String name, String value)
HTTP parameters
The add_documents parameters are described in the table below.
Important: This method sends HTTP POST requests.
Note: These parameters must be repeated (with a different id) for every document you want to send.
For better performance, we recommend using multipart/form-data instead of application/x-www-form-urlencoded.
Parameter | Location | Description
PAPI_<id>:uri | [URL/FORM] | The string of the document URI.
PAPI_<id>:stamp | [URL/FORM] | Optional. The string representing the document's Stamp.
PAPI_<id>:meta:<meta_name> | [URL/FORM] | A string containing the value of the metadata referenced by meta_name. Multiple values may exist for the same parameter. You must generate as many parameters as there are values.
PAPI_<id>:directive:<directive_name> | [URL/FORM] | The list of optional supported directives (at the document level): forcedSlice (advanced feature).
PAPI_<id>:part_bytes:<part_name> | [URL/FORM] | The content of the document's part that is identified by part_name.
PAPI_<id>:part_directive:<part_name>:<directive_name> | [URL/FORM] | The list of optional supported directives (at the part level): filename (the document filename), mimeHint (the MIME hint), mime (the forced MIME; use very carefully), encoding (the document encoding).
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
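The naming scheme of the add_documents parameters can be illustrated with a small sketch that produces the name/value pairs for a batch of documents. The class AddDocumentsParams, the use of integers starting at 0 for <id>, and the sample URIs and meta names are assumptions made for this example; part_bytes and directives are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AddDocumentsParams {
    /**
     * Appends the uri, stamp and meta parameters of one document.
     * id distinguishes the documents sent in the same request.
     */
    static void append(List<Map.Entry<String, String>> out, int id, String uri,
                       String stamp, Map<String, List<String>> metas) {
        String prefix = "PAPI_" + id + ":";
        out.add(Map.entry(prefix + "uri", uri));
        if (stamp != null) { // the stamp is optional
            out.add(Map.entry(prefix + "stamp", stamp));
        }
        for (Map.Entry<String, List<String>> meta : metas.entrySet()) {
            for (String value : meta.getValue()) { // one parameter per value
                out.add(Map.entry(prefix + "meta:" + meta.getKey(), value));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> metas = new LinkedHashMap<>();
        metas.put("author", List.of("alice", "bob")); // hypothetical meta
        List<Map.Entry<String, String>> params = new ArrayList<>();
        append(params, 0, "doc://1", "v1", metas);
        append(params, 1, "doc://2", null, Map.of());
        params.forEach(p -> System.out.println(p.getKey() + " = " + p.getValue()));
    }
}
```

The resulting pairs would then be encoded as the fields of the POST body, preferably as multipart/form-data.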
void updateDocument(Document document, string[] fields) and void updateDocumentList(Document[ ] documentList, string[][] fieldsList)
The Push API provides two update methods: updateDocument(Document doc, String[] fields) updates one document, and updateDocumentList(Document[] docList, String[][] fieldsList) updates several documents at once. The fields and fieldsList parameters are not handled yet and currently have no use.
To update a document, call one of these methods with a new document which:
has the same URI as the one you want to update,
and contains the updated parts/metas.
The parts/metas that do not have to be updated are fetched from the document cache, so there is no need to put them in the document used for the update.
Constraints
For the update feature to work, you must either enable the Build Group document cache or target another Consolidation Server. For more information, see "Using Document Cache" in the Exalead CloudView Connectors Guide.
Only documents that have been added after the document cache has been enabled will be updatable.
Notes
The old values of multivalued metas are dropped. If you want to update a multivalued meta by adding values, you must also put the old values you want to keep in the document used for the update.
Remember that parts and metas are not index fields. The index fields that will be updated depend on the part/meta field mappings, not on the part/meta names. For example, if you want to update the “text” field, you probably want to put an updated “master” part in the document used for update, and not a “text” meta.
The document in the document cache is updated too, so subsequent updates of a document do not need to be cumulative.
It is a good idea to perform batches of updates instead of single updates.
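Because the old values of a multivalued meta are dropped on update, the connector has to rebuild the full value list itself. A minimal sketch of that step is shown below; the class and method names are hypothetical, and removing duplicates is a choice made for this example rather than something the Push API requires.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class MultivaluedMetaUpdate {
    /**
     * Returns the values to put in the update document:
     * the old values to keep first, then the added ones, without duplicates.
     */
    static List<String> valuesForUpdate(List<String> oldValuesToKeep, List<String> addedValues) {
        Set<String> merged = new LinkedHashSet<>(oldValuesToKeep);
        merged.addAll(addedValues);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Hypothetical values of a multivalued meta.
        System.out.println(valuesForUpdate(List.of("red", "green"), List.of("green", "blue")));
    }
}
```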
Document data types
When you implement the updateDocument method you must send one or more documents to be updated to the index. The Document object should contain:
Types | Description
uri | A URI, which is an opaque string that uniquely identifies the document from the connector point of view. See also URI.
stamp | An optional Stamp, which is an opaque string that the connector may use to track document changes. Document stamps may be retrieved through the getDocumentStatus method. See also Stamps.
MetaContainer | The MetaContainer of the document. Metadata are open name-value pairs. For a complete list of metadata understood by the API, see Metadata Examples.
PartContainer | The PartContainer of the document. The Connector sends raw bytes containing the document content. Exalead CloudView conversion services will translate and extract the textual content of the document before indexing. The Part contains a DirectiveContainer.
DirectiveContainer | The DirectiveContainer of the document (different from the directive associated to a Part).
Implement the part object
The Part object must provide accessors for the following predefined directives:
encoding
filename
mimeHint
certifiedMime
To set a custom directive, the Part object must also provide methods such as:
public void setCustomDirective(String name, String value)
public void setCustomDirective(Directive directive)
public void setCustomDirective(String name, String[] values)
public void addCustomDirective(String name, String value)
Implement the document object
The Document object must provide accessors for these predefined directives:
forcedSlice
sameSlice
And methods to set a custom directive:
public void setCustomDirective(String name, String value)
public void setCustomDirective(Directive directive)
public void setCustomDirective(String name, String[] values)
public void addCustomDirective(String name, String value)
HTTP parameters
The update_documents parameters are described in the table below.
Note: These parameters must be repeated (with a different id) for every document you want to send.
For better performance, we recommend using multipart/form-data instead of application/x-www-form-urlencoded.
Parameter | Location | Description
PAPI_<id>:uri | [URL/FORM] | The string of the document URI.
PAPI_<id>:stamp | [URL/FORM] | Optional. The string representing the document's Stamp.
PAPI_<id>:meta:<meta_name> | [URL/FORM] | A string containing the value of the metadata referenced by meta_name. Multiple values may exist for the same parameter. You must generate as many parameters as there are values.
PAPI_<id>:directive:<directive_name> | [URL/FORM] | The list of optional supported directives (at the document level): forcedSlice (advanced feature).
PAPI_<id>:directive:fields | [URL/FORM] | Not supported for the moment.
PAPI_<id>:part_bytes:<part_name> | [URL/FORM] | The content of the document's part that is identified by part_name.
PAPI_<id>:part_directive:<part_name>:<directive_name> | [URL/FORM] | The list of optional supported directives (at the part level): filename (the document filename), mimeHint (the MIME hint), mime (the forced MIME; use very carefully), encoding (the document encoding).
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
void deleteDocument(String uri) and void deleteDocumentList(String[] uris)
These methods delete the document or documents identified by the specified URI or URI list.
HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/delete_documents
HTTP parameters
The parameters are described in the table below.
Parameter | Location | Description
PAPI_uri | [URL] | The string of the document URI. To delete many documents, send multiple PAPI_uri parameters.
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
However, no exception or error message is reported if the URI is unknown or refers to a document that was already deleted.
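Sending several PAPI_uri parameters amounts to repeating the parameter once per document. The sketch below builds such a parameter string; the class name DeleteDocumentsQuery and the sample URIs are hypothetical, and java.net.URLEncoder is used here as one possible way to URL-encode the values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

final class DeleteDocumentsQuery {
    /** Builds the parameter string: one PAPI_uri parameter per document URI. */
    static String build(List<String> uris) {
        return uris.stream()
                .map(uri -> "PAPI_uri=" + URLEncoder.encode(uri, StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Hypothetical document URIs.
        System.out.println(build(List.of("docs/report 1.pdf", "docs/notes.txt")));
    }
}
```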
void deleteDocumentsRootPath(String rootPath [, Boolean recursive=true] )
Deletes a set of documents (collection) specified by a rootPath. It is possible to only delete documents at the first level of the rootPath (not recursively) by using the recursive flag.
Data types
The object contains:
Types/flag | Description
rootPath | A part of the URI used to select a subset of the corpus. If the rootPath value is an empty string (""), the whole collection is deleted. The rootPath is matched against the beginning of the URI. See also URI.
recursive | The recursive flag indicates that the deletion also impacts subfolders.
HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/delete_documents_root_path
HTTP parameters
The parameters are described in the table below.
Parameter | Location | Description
PAPI_rootPath | [URL] | The string representation of the rootPath. It can take the form: /root/subdir1/subdir2/subdir3/subdir3/...
PAPI_recursive | [URL] | A boolean representation of the flag: 'true' for true, 'false' for false.
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
However, no exception or error message is returned if the rootPath refers to an empty (nonexistent) subset of the corpus.
DocumentStatus getDocumentStatus(String uri) and DocumentStatus[] getDocumentStatusList(String[] uriList)
This method retrieves the status of a document within the indexed corpus specified by the URI parameters.
This status may be used by the connector to compare with the document status in the source, and then determine whether the document needs to be updated. The structure is serialized and returned in the response body.
The getDocumentStatusList method retrieves the status of a list of documents within the pushed corpus.
Data types
The DocumentStatus object contains:
Types | Description
uri | A URI is an opaque string that uniquely identifies the document from the connector point of view. See also URI.
stamp | An optional Stamp. See also Stamps.
exist | A boolean that indicates the indexing status of the document. true indicates that a document with the given uri has already been sent to the Indexing System. However, this does not guarantee that the document has been indexed nor that the document can be seen by the user. false indicates that the given uri is unknown to the Indexing System.
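interface DocumentStatus
{
    String getUri();    // the document URI
    String getStamp();  // the document Stamp, if any
    boolean isExist();  // true if the URI is known to the Indexing System
}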
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/get_documents_status
HTTP parameters
The HTTP parameters are described in the table below.
Parameter | Location | Description
PAPI_uri | [URL] | The string of the document URI. To get the status of many documents, send multiple PAPI_uri parameters.
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
If successful (status = OK), then the body contains the serialized form of the DocumentStatus in XML format.
Here is the response format for each entry:
[M/D] [space] [url_encode(URI)] [space] [escape(STAMP)] [\n]
The [space] [escape(STAMP)] part is present only if the document exists.
Where:
url_encode() – is a function which performs a URL encoding of the given value.
escape() – is a function which replaces \r and \n with \\r and \\n.
M/D – M indicates a missing entry, D indicates an existing document.
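A client helper could decode one entry of this response as sketched below. This is an illustration only: the class DocumentStatusLine is hypothetical, it assumes the line-based entry format shown above, and because the source does not say how a literal backslash is escaped, the unescape step simply reverses the two documented replacements.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class DocumentStatusLine {
    final String uri;
    final String stamp;  // null when the entry carries no stamp
    final boolean exist;

    private DocumentStatusLine(String uri, String stamp, boolean exist) {
        this.uri = uri;
        this.stamp = stamp;
        this.exist = exist;
    }

    /** Parses one entry, without its trailing line feed. */
    static DocumentStatusLine parse(String line) {
        // The URI is URL-encoded, so it contains no space; the stamp may.
        String[] fields = line.split(" ", 3);
        if (fields.length < 2 || !(fields[0].equals("M") || fields[0].equals("D"))) {
            throw new IllegalArgumentException("Malformed status entry: " + line);
        }
        boolean exist = fields[0].equals("D");
        String uri = URLDecoder.decode(fields[1], StandardCharsets.UTF_8);
        String stamp = fields.length == 3 ? unescape(fields[2]) : null;
        return new DocumentStatusLine(uri, stamp, exist);
    }

    /** Reverses escape(): backslash-r and backslash-n become CR and LF again. */
    static String unescape(String s) {
        return s.replace("\\r", "\r").replace("\\n", "\n");
    }

    public static void main(String[] args) {
        DocumentStatusLine status = parse("D doc%3A%2F%2F1 2024-01-01"); // hypothetical entry
        System.out.println(status.uri + " exists=" + status.exist + " stamp=" + status.stamp);
    }
}
```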
ulong setCheckpoint(String checkpoint [, String name] [, sync=false])
The setCheckpoint method sets checkpoints in the indexing system. If the optional name is specified, then the related checkpoint is changed.
Checkpoints are used when:
The connector must process a journalized or logged data source, which can be abstractly represented as a flow of "add" and "delete" events in the corpus, and where an id can be used to refer to events on a timeline. The connector will then call the setCheckpoint command from time to time, with the id referring to the last add or delete events which have been sent to the Indexing System.
Crash-proof synchronization is required. Upon crash, or system restart, the connector will call the getCheckpoint method to retrieve the last checkpoint saved by the Indexing System. The Indexing System guarantees that any add or delete commands called before that checkpoint were saved and will never be lost.
You need to keep track of the synchronization.
The optional parameter name can be used if many checkpoints are needed for a given source. The default value is "".
The sync flag can be used to force the sync of the pending operations before returning control. Once synced, the document is pushed and securely handled by Exalead CloudView.
The setCheckpoint method returns the serial of the last pending operation before the checkpoint. This serial can be used to check when the corresponding documents are indexed and searchable.
Note: A getCheckpoint() called immediately after a setCheckpoint() with the sync parameter set to false may not return the last value. getCheckpoint() always returns the last synced checkpoint.
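The interplay between setCheckpoint, sync and getCheckpoint can be pictured with a small in-memory model. It is not the server implementation: the class CheckpointModel is hypothetical, the returned serial is left out, and returning null when no synced checkpoint exists is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class CheckpointModel {
    private final Map<String, String> pending = new HashMap<>(); // set but not yet synced
    private final Map<String, String> synced = new HashMap<>();  // visible to getCheckpoint

    void setCheckpoint(String checkpoint, String name, boolean sync) {
        pending.put(name, checkpoint);
        if (sync) {
            sync();
        }
    }

    /** Makes every pending checkpoint durable. */
    void sync() {
        synced.putAll(pending);
        pending.clear();
    }

    /** Returns the last synced value only (null if there is none in this model). */
    String getCheckpoint(String name) {
        return synced.get(name);
    }

    public static void main(String[] args) {
        CheckpointModel model = new CheckpointModel();
        model.setCheckpoint("event-41", "", true);
        model.setCheckpoint("event-42", "", false);
        System.out.println(model.getCheckpoint("")); // still the synced value, event-41
        model.sync();
        System.out.println(model.getCheckpoint("")); // now event-42
    }
}
```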
HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/set_checkpoint
HTTP parameters
The parameters are described in the table below.
Parameter | Location | Description
PAPI_checkpoint | [URL] | The string of the checkpoint value.
PAPI_name | [URL] | Optional. Can be used when you need to manage many checkpoints for a connector.
PAPI_sync | [URL] | The string representation of the sync value. If true, it triggers a sync operation.
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
If successful (status = OK), then the body contains the serialized form of the serial, which is the string value of the serial.
String getCheckpoint([String name])
The getCheckpoint method retrieves checkpoints in the indexing process.
The optional parameter name can be used if many checkpoints are needed for a given source. The default value is "".
A getCheckpoint() called immediately after a setCheckpoint() with the sync parameter set to false may not return the last value. getCheckpoint() always returns the last synced checkpoint.
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/get_checkpoint
HTTP parameters
The parameters are described in the table below.
Parameter | Location | Description
PAPI_name | [URL] | Optional. Can be used when you need to manage many checkpoints for a connector.
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
If successful (status = OK), then the body contains the serialized form of the checkpoint, which is the string value of the checkpoint.
String getCheckpoint([String name, Boolean showSynchronizedOnly])
This getCheckpoint method retrieves checkpoints in the indexing process.
The name parameter corresponds to the checkpoint name. The default value is "".
If the showSynchronizedOnly parameter is set to false, you will see all checkpoints, even those that are not yet synchronized to disk. If set to true, you will see only synchronized checkpoints.
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/get_checkpoint_info
HTTP parameters
The parameters are described in the table below.
Parameter | Location | Description
PAPI_name | [URL] | Used when you need to manage many checkpoints for a connector.
PAPI_showSynchronizedOnly | [URL] | Specifies whether to retrieve synchronized checkpoints only.
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
If successful (status = OK), then the body contains the serialized form of the checkpoint, which is the string value of the checkpoint.
void clearAllCheckpoints()
The clearAllCheckpoints method resets all checkpoint values, including those of the checkpoints with optional names.
HTTP parameter
The parameter is described in the table below.
Parameter | Location | Description
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/clear_all_checkpoints
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
CheckpointsInfoIterator enumerateCheckpointsInfo()
Opens an Iterator over the list of defined checkpoints. Iterated results are streamed and used when needed.
The default checkpoint has the name "" (empty string).
Data types
A CheckpointsInfoIterator is an abstract object used to retrieve CheckpointsInfo.
HTTP parameter
The parameter is described in the table below.
Parameter | Location | Description
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/enumerate_checkpoints_info
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
Here is the response format for each entry:
[url_encode(NAME)] [space] [escape(VALUE)] [\n]
Where:
url_encode() – is a function which performs an url encoding of the given value.
escape() – is a function which replaces \r and \n with \\r and \\n.
NAME – can be empty.
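The entry format above can be decoded with a few lines of code. The following sketch is one possible reading of it: the class CheckpointsInfoParser is hypothetical, the whole body is parsed at once for brevity (a real helper would stream it), and the unescape step only reverses the two replacements documented for escape().

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class CheckpointsInfoParser {
    /** Parses a response body into checkpoint name -> value, in response order. */
    static Map<String, String> parse(String body) {
        Map<String, String> checkpoints = new LinkedHashMap<>();
        for (String line : body.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            // The encoded name contains no space; it is empty for the default checkpoint.
            int space = line.indexOf(' ');
            if (space < 0) {
                throw new IllegalArgumentException("Malformed checkpoint entry: " + line);
            }
            String name = URLDecoder.decode(line.substring(0, space), StandardCharsets.UTF_8);
            String value = line.substring(space + 1).replace("\\r", "\r").replace("\\n", "\n");
            checkpoints.put(name, value);
        }
        return checkpoints;
    }

    public static void main(String[] args) {
        // Hypothetical body: the default checkpoint and a named one.
        System.out.println(parse(" 1024\nmail%20box 77\n"));
    }
}
```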
CheckpointsInfoIterator enumerateCheckpointsInfo (boolean showSynchronizedOnly)
Opens an Iterator over the list of defined checkpoints. The boolean parameter selects either synchronized checkpoints only (true) or all checkpoints (false). Iterated results are streamed and used when needed.
The default checkpoint has the name "" (empty string).
Data types
A CheckpointsInfoIterator is an abstract object used to retrieve CheckpointsInfo.
HTTP parameter
The parameter is described in the table below.
Parameter | Location | Description
PAPI_showSynchronizedOnly | [URL] | Specifies whether to retrieve synchronized checkpoints only.
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/enumerate_stated_checkpoints_info
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
Here is the response format for each entry:
[url_encode(NAME)] [space] [escape(VALUE)] [\n]
Where:
url_encode() – is a function which performs an url encoding of the given value.
escape() – is a function which replaces \r and \n with \\r and \\n.
NAME – can be empty.
CheckpointsInfoIterator:: next()
The CheckpointsInfoIterator provides the following methods:
CheckpointsInfoIterator::
CheckpointInfo next()
CheckpointInfo[] nextBatch(int count)
void close()
Where:
The next method returns the next CheckpointInfo of the iteration, or null if the end of the iteration has been reached.
The nextBatch method returns up to count CheckpointInfo entries, or fewer if the end of the iteration has been reached.
The close method closes the iteration. It must be called to release the resources dedicated to the iteration within the Indexing System and inside the Helper.
The command uses the standard HTTP responses. See HTTP command response.
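The next, nextBatch and close contract can be sketched with a list-backed iterator. This is a simplified stand-in, not the helper's implementation: the class ListBackedIterator is hypothetical, it is generic instead of tied to CheckpointInfo, and nextBatch returns a List rather than an array to keep the generic code simple.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ListBackedIterator<T> {
    private Iterator<T> source; // null once closed

    ListBackedIterator(List<T> entries) {
        this.source = entries.iterator();
    }

    /** Returns the next entry, or null at the end of the iteration. */
    T next() {
        return (source != null && source.hasNext()) ? source.next() : null;
    }

    /** Returns up to count entries, fewer if the end is reached. */
    List<T> nextBatch(int count) {
        List<T> batch = new ArrayList<>();
        T entry;
        while (batch.size() < count && (entry = next()) != null) {
            batch.add(entry);
        }
        return batch;
    }

    /** Releases the resources held by the iteration. */
    void close() {
        source = null;
    }

    public static void main(String[] args) {
        ListBackedIterator<String> it = new ListBackedIterator<>(List.of("a", "b", "c"));
        try {
            System.out.println(it.next());
            System.out.println(it.nextBatch(5));
        } finally {
            it.close(); // always close, even if the loop fails
        }
    }
}
```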
SyncedEntriesIterator::
The methods for the SyncedEntriesIterator are the following:
SyncedEntriesIterator::
SyncedEntry next()
SyncedEntry[] nextBatch(int count)
void close()
The next method returns the next document of the iteration, or null if the end of the iteration has been reached.
The nextBatch method returns up to count documents, or fewer if the end of the iteration has been reached.
The close method closes the iteration. It must be called to release the resources dedicated to the iteration within the Indexing System and inside the Helper.
The SyncedEntry object contains:
Data types
Member | Description
uri | A URI is an opaque string that uniquely identifies the document from the connector point of view. See also URI.
stamp | See Stamps.
isFolder | A boolean that is true if the entry refers to a directory, false otherwise.
interface SyncedEntry
{
    String getUri();
    String getStamp();
    boolean isFolder();
}
SyncedEntriesIterator enumerateSyncedEntries(String rootPath, EnumerationMode enumerationMode)
Opens an iterator on a document and/or folder collection matching the rootPath given as parameter. It enumerates entries that have been pushed and are in synced status. It returns a stream of entries. An entry is made of a URI and a stamp.
The underlying idea of this method is to:
Enumerate entries in the index.
Decode the URI to find items in the data source.
Test whether the items still exist. If all items have been removed from the data source, delete the document; otherwise, decode the stamp and check whether the items have been modified in the data source.
Iterated results are streamed and used when needed.
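The reconciliation idea above can be sketched as a comparison between the enumerated entries and the current state of the data source. Everything named here is hypothetical: the class SyncPlanner, the representation of both sides as uri -> stamp maps, one source item per document, and the use of plain stamp equality to detect a modification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class SyncPlanner {
    /** URIs to delete from the index and URIs to push again. */
    static final class Plan {
        final List<String> toDelete = new ArrayList<>();
        final List<String> toUpdate = new ArrayList<>();
    }

    /**
     * indexed: uri -> stamp, as enumerated from the index.
     * source:  uri -> current stamp in the data source.
     */
    static Plan plan(Map<String, String> indexed, Map<String, String> source) {
        Plan plan = new Plan();
        for (Map.Entry<String, String> entry : indexed.entrySet()) {
            String uri = entry.getKey();
            if (!source.containsKey(uri)) {
                plan.toDelete.add(uri);   // item removed from the data source
            } else if (!Objects.equals(source.get(uri), entry.getValue())) {
                plan.toUpdate.add(uri);   // stamp differs: item was modified
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan plan = plan(Map.of("a", "1", "b", "2", "c", "3"), Map.of("a", "1", "b", "9"));
        System.out.println("delete " + plan.toDelete + ", update " + plan.toUpdate);
    }
}
```

A real connector would feed this from enumerateSyncedEntries in batches instead of loading the whole index into a map.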
Data types
A SyncedEntriesIterator is an abstract object which can be used to retrieve document statuses.
The object contains:
Types/flag | Description
rootPath | A part of the URI used to select a subset of the corpus. See also URI.
enumerationMode | The EnumerationMode lists the available types. For example, NOT_RECURSIVE_ALL returns the subfolders and the documents in the rootPath. Similarly, RECURSIVE_DOCUMENTS returns all the documents in the rootPath (but not the subfolders).
enum EnumerationMode
{
NOT_RECURSIVE_FOLDERS,
NOT_RECURSIVE_DOCUMENTS,
NOT_RECURSIVE_ALL,
RECURSIVE_DOCUMENTS
}
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/enumerate_synced_entries
HTTP parameters
The parameters are described in the table below.
Parameter | Location | Description
PAPI_rootPath | [URL] | The string representation of the rootPath. It can take the form: /root/subdir1/subdir2/subdir3/subdir3/...
PAPI_mode | [URL] | The string representation of the mode: NOT_RECURSIVE_FOLDERS, NOT_RECURSIVE_DOCUMENTS, NOT_RECURSIVE_ALL or RECURSIVE_DOCUMENTS.
PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. If there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
Here is the response format for each entry:
[D/F] [space] [url_encode(URI)] [space] [escape(STAMP)] [\n]
Where:
url_encode() – is a function which performs a URL encoding of the given value.
escape() – is a function which replaces \r and \n with \\r and \\n.
D/F – D indicates an existing document, F indicates a folder.
Use of iterators with concurrent add and delete operations
Add/Delete operations do not impact iterators that are already opened.
Added/Deleted documents may not appear immediately in the iterated entries because of asynchronous treatment.
ulong countSyncedEntries(String rootPath, EnumerationMode enumerationMode)
Opens an iterator on the collection of documents and/or folders matching the rootPath given as a parameter, but returns only the number of items found.
It therefore counts the entries in all or part of the indexing corpus for that connector.
Data types
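The object contains:

| Types | Description |
|---|---|
| rootPath | A part of the URI used to select a subset of the corpus. |
| enumerationMode | One of the EnumerationMode values: NOT_RECURSIVE_FOLDERS, NOT_RECURSIVE_DOCUMENTS, NOT_RECURSIVE_ALL, or RECURSIVE_DOCUMENTS. |
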
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/count_synced_entries
HTTP parameters
The parameters are described in the table below.

| Parameter | Location | Description |
|---|---|---|
| PAPI_rootPath | [URL] | The string representation of the rootPath. It can take the form: /root/subdir1/subdir2/subdir3/subdir3/... |
| PAPI_mode | [URL] | The string representation of the mode: NOT_RECURSIVE_FOLDERS, NOT_RECURSIVE_DOCUMENTS, NOT_RECURSIVE_ALL, or RECURSIVE_DOCUMENTS. |
| PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. Action: if there is a session mismatch, the Push API server refuses the command and returns an exception. |

HTTP response
The command uses the standard HTTP responses. See HTTP command response.
No exception or error message is returned if the rootPath refers to an empty subset of the corpus. If the status is OK, the body contains the string representation of the integer value.
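Because the method signature returns a ulong and Java has no unsigned 64-bit primitive, a client may want to parse the body with the unsigned helpers of java.lang.Long. The sketch below is only an illustration: the class name SyncedEntryCount is hypothetical, and it assumes the body is a plain decimal integer, possibly surrounded by whitespace.

```java
/** Parses the count_synced_entries response body (hypothetical helper). */
final class SyncedEntryCount {

    /**
     * Returns the count as a long holding an unsigned 64-bit value.
     * Values above Long.MAX_VALUE appear negative when read as signed;
     * use Long.toUnsignedString or Long.compareUnsigned to handle them.
     */
    static long parse(String body) {
        return Long.parseUnsignedLong(body.trim());
    }

    /** Formats a parsed count back to its decimal representation. */
    static String format(long count) {
        return Long.toUnsignedString(count);
    }
}
```

An empty subset of the corpus is not an error, so a body of `0` is a normal result that the caller should handle.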
Use of iterators with concurrent add and delete operations
Add/Delete operations do not impact iterators that are already opened.
Added/Deleted documents may not appear immediately in the iterated entries because these operations are processed asynchronously.
void sync()
The sync method flushes to disk all operations performed since the last sync operation, so that they survive a crash. It is a synchronous call and may take some time to return.
HTTP parameter
The parameter is described in the table below.

| Parameter | Location | Description |
|---|---|---|
| PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. Action: if there is a session mismatch, the Push API server refuses the command and returns an exception. |

HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/sync
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
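Since sync is a blocking call that may take some time, a client will usually want a generous request timeout. The following sketch prepares such a request with the java.net.http API (Java 11 or later). The class name SyncRequests, the ten-minute timeout and the empty request body are assumptions made for illustration; the connector name is assumed to be URL-safe.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Prepares the POST request for the sync command (hypothetical helper). */
final class SyncRequests {

    /** Assumed timeout: sync is synchronous and may take a while to return. */
    static final Duration SYNC_TIMEOUT = Duration.ofMinutes(10);

    static HttpRequest sync(String host, int port, String connectorName) {
        URI uri = URI.create("http://" + host + ":" + port
                + "/papi/4/connectors/" + connectorName + "/sync");
        return HttpRequest.newBuilder(uri)
                .timeout(SYNC_TIMEOUT)
                .POST(HttpRequest.BodyPublishers.noBody()) // assumption: no request body
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the status checked as described in HTTP command response.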
void triggerIndexingJob()
The triggerIndexingJob method can be used to trigger the indexing job.
Important: In V6R2014 and higher versions, the triggerIndexingJob() method may commit an indexing job if a document analysis has been started. Unlike the sync() method, this method does not block the PAPI.
HTTP parameter
The parameter is described in the table below.

| Parameter | Location | Description |
|---|---|---|
| PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. Action: if there is a session mismatch, the Push API server refuses the command and returns an exception. |

HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/trigger_indexing_job
HTTP response
The command uses the standard HTTP responses. See HTTP command response.
boolean areDocumentsSearchable(long serial)
The areDocumentsSearchable method determines whether the documents can be found at search time. Use it with the sync method, which provides the expected serial.
Note: The setCheckpoint method with the sync parameter set to true also provides the expected serial.
HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/are_documents_searchable
HTTP parameter
The parameters are described in the table below.

| Parameter | Location | Description |
|---|---|---|
| PAPI_serial | [URL] | The string representation of the serial. |
| PAPI_session | [URL] | Optional. The session ID returned by a previous call to get_current_session_id. Action: if there is a session mismatch, the Push API server refuses the command and returns an exception. |

HTTP response
The command uses the standard HTTP responses. See HTTP command response.
If status is OK, the body contains the string representation of the boolean value (true or false).
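A typical use of this method is to poll until the documents covered by a given serial become searchable. The sketch below shows one possible polling loop. The class SearchablePoller, the attempt limit and the delay are hypothetical choices. The HTTP call itself is abstracted behind a LongPredicate, so the code makes no assumption about how the serial is obtained or transported.

```java
import java.util.function.LongPredicate;

/** Polls areDocumentsSearchable until it returns true (hypothetical helper). */
final class SearchablePoller {

    /** Parses the response body, which is documented as "true" or "false". */
    static boolean parseBody(String body) {
        String value = body.trim();
        if (value.equals("true")) {
            return true;
        }
        if (value.equals("false")) {
            return false;
        }
        throw new IllegalArgumentException("Unexpected body: " + body);
    }

    /**
     * Calls the predicate up to maxAttempts times, sleeping delayMillis between
     * attempts. Returns true as soon as the documents are searchable, false if
     * the attempts run out or the thread is interrupted.
     */
    static boolean waitUntilSearchable(LongPredicate areDocumentsSearchable,
                                       long serial, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (areDocumentsSearchable.test(serial)) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false;
    }
}
```

In a real client, the predicate would issue the are_documents_searchable request with PAPI_serial set to the serial and return `parseBody` of the response body.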
Metadata Examples
The following table contains the Metadata name-value pairs that should be understood by the addDocument method.

| Name | Format | Description | Example |
|---|---|---|---|
| lastmodifieddate | RFC 822 and RFC 2822 formats ("RFC Date Format"), that is the common format in most Internet protocols (Mail, HTTP, ..); ISO 8601 and RFC 3339 formats; Unix date and time (English format) | The date to be associated with the document. | 1977/07/18-11:50:36 (GMT), 1980/09/14 |
| publicUrl | URL | The public URL of the resource. | http://server/getDoc.php?id=24 |
| author | Displayed string | Author name | John Doe |
| mail:from | See RFC 822 | The sender of the document. | John Doe <doe@doe.net> |
| mail:to | See RFC 822 | | |
| mail:cc | See RFC 822 | | |
| mail:bcc | See RFC 822 | | |
| language | ISO 639 | Primary or secondary level language tag | fr-FR, en, ar-AR |
| security | [~] PROVIDER:TOKEN | Known providers: windows, notes, unix. Note: The prefix ~ can be used for specifying a negative security token. You must add the special security token declaring that the document is public: com.exalead.papi.helper.SecurityMeta.PUBLIC_SECURITY_TOKEN | windows:S-1-5-21-3495842611-1063732614-555398628-5176 or ~windows:S-1-5-21-3495842611-1063732614-555398628-5176; notes:cn=Doe/cn=Exalead/cn=com or ~notes:cn=Doe/cn=Exalead/cn=com |
| file_name | String | Name of the file. | |
| file_size | ulong | The size in bytes of the data associated to the document. | 42 |
| title | String | The title associated to the document. | |

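To make the security value format ([~] PROVIDER:TOKEN) in the table above concrete, here is a minimal parsing sketch. The class SecurityToken is hypothetical and is not the com.exalead.papi.helper.SecurityMeta helper mentioned in the table. It assumes the provider name contains no colon and that everything after the first colon is the token.

```java
/** Parsed form of a security metadata value such as "~windows:S-1-5-..." (hypothetical type). */
final class SecurityToken {
    final boolean negative; // true when the value starts with the ~ prefix
    final String provider;  // for example windows, notes or unix
    final String token;     // provider-specific token

    SecurityToken(boolean negative, String provider, String token) {
        this.negative = negative;
        this.provider = provider;
        this.token = token;
    }

    static SecurityToken parse(String value) {
        String trimmed = value.trim();
        boolean negative = trimmed.startsWith("~");
        String rest = negative ? trimmed.substring(1).trim() : trimmed;
        // Split on the first colon only: tokens may themselves contain colons.
        int colon = rest.indexOf(':');
        if (colon <= 0 || colon == rest.length() - 1) {
            throw new IllegalArgumentException("Expected [~]PROVIDER:TOKEN but got: " + value);
        }
        return new SecurityToken(negative, rest.substring(0, colon), rest.substring(colon + 1));
    }
}
```

For instance, parsing `~notes:cn=Doe/cn=Exalead/cn=com` gives a negative token for the notes provider with the token `cn=Doe/cn=Exalead/cn=com`.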
To create categories in Exalead CloudView, the Indexing System considers both the original metadata and the metadata extracted from the document content. The priority rules for metadata can be configured in the Indexing System administration interface. For example:
The Indexing System uses both the mimeHint and filename of the document master Part, and the content type detected by an analysis of the source to generate a Top/Attributes/Kind category.
The Indexing System uses both the language meta and the detected language from the document text to generate a Top/Attributes/Language category.