This method is used to start a new PushAPI session. This allows you to handle a "session" when working with the Push API server.
It aims to solve the following use case:
• the connector starts an indexing phase, and starts sending documents to the Push API server,
• the Indexing Server crashes or is killed (or the server suddenly reboots); documents previously received are lost,
• the Indexing Server restarts,
• the connector sends remaining documents to the Push API server, unaware that the remote Push API server died, and the synchronization is therefore in a “lost” state.
This is done by introducing a session identifier (an integer) that identifies the remote component pushing the documents - this identifier changes each time the Exalead CloudView session manager (re)starts.
HTTP method
The get_current_session_id command allows you to get the remote Push API server session ID, which is generated when the Push API server starts. The method used is:
GET http://<host>:<port>/papi/4/connectors/<connectorName>/get_current_session_id
HTTP response
The get_current_session_id command returns the current Push API server session ID (a long integer of at least 63 bits). This identifier is used internally by Push API client helpers.
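As a minimal sketch of the request described above, the following Python helpers build the get_current_session_id URL and detect a server restart by comparing two session IDs. The helper names are hypothetical, and the assumption that the response body is plain decimal text is not confirmed by this section.

```python
from urllib.parse import quote


def session_id_url(host: str, port: int, connector_name: str) -> str:
    """Build the get_current_session_id URL for a connector."""
    return (
        f"http://{host}:{port}/papi/4/connectors/"
        f"{quote(connector_name, safe='')}/get_current_session_id"
    )


def parse_session_id(body: str) -> int:
    """Parse the response body (assumed here to be plain decimal text)."""
    return int(body.strip())


def session_changed(previous_id: int, current_id: int) -> bool:
    """A different ID means the Push API server restarted in between."""
    return previous_id != current_id
```

Python integers are unbounded, so a session ID of 63 bits or more is held without loss.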
API response
The API function does not return any value. It throws a PushAPISessionExistsException if a session is already open.
void stopPushSession()
This command is used to stop a PushAPI session previously opened by startPushSession and clears the internal session id. This command has no parameters.
HTTP response
No corresponding HTTP request exists for this client function.
API response
It throws a PushAPISessionNotFoundException if no session was opened.
void addDocument(Document document) and void addDocumentList(Document[ ] documentList)
This method adds a document to the index. If a document with the same URI has already been added, the document is updated.
Note: If the conversion of a Part fails, this Part is not indexed but the other Parts and the Metas are included.
Document data types
When you implement the addDocument method you must send one or more documents to be added to the index. The Document object should contain:
Types
Description
uri
A URI, which is an opaque string that uniquely identifies the document from the connector point of view.
An optional Stamp, which is an opaque string that the connector may use to track document changes. Document stamps may be retrieved through the getDocumentStatus method.
The MetaContainer of the document. Metadata are open name-value pairs.
For a complete list of metadata understood by the API, see Metadata Examples.
PartContainer
The PartContainer of the document. The Connector sends raw bytes containing the document content. Exalead CloudView conversion services will translate and extract the textual content of the document before indexing.
The Part contains a DirectiveContainer.
DirectiveContainer
The DirectiveContainer of the document (different from the directive associated to a Part).
Implement the part object
The Part object must provide accessors for the following predefined directives:
• encoding
• filename
• mimeHint
• certifiedMime
To set a custom directive, the Part object must also provide a method, for example:
public void setCustomDirective(string name, string value)
public void setCustomDirective(Directive directive)
public void setCustomDirective(string name, string[] values)
public void addCustomDirective(string name, string value)
Implement the document object
The Document object must provide accessors for these predefined directives:
• forcedSlice
• sameSlice
And a method to set a custom directive:
public void setCustomDirective(string name, string value)
public void setCustomDirective(Directive directive)
public void setCustomDirective(string name, string[] values)
public void addCustomDirective(string name, string value)
HTTP parameters
The add_documents parameters are described in the table below.
Important: This method sends HTTP POST requests.
Note: These parameters must be repeated (with a different id) for every document you want to send.
For better performance, we recommend using multipart/form-data instead of application/x-www-form-urlencoded.
Parameter
Location
Description
PAPI_<id>:uri
[URL/FORM]
The uri parameter is the string of the document URI.
PAPI_<id>:stamp
[URL/FORM]
The optional stamp parameter is the string representing the document's Stamp.
PAPI_<id>:meta:<meta_name>
[URL/FORM]
The meta_* parameter is a string containing the value of the metadata referenced by meta_name.
Multiple values may exist for the same parameter. You must generate as many parameters as there are values.
PAPI_<id>:directive:<directive_name>
[URL/FORM]
The list of optional supported directives (at the document level):
• forcedSlice: advanced feature
PAPI_<id>:part_bytes:<part_name>
[URL/FORM]
The part_bytes parameter is the content of the document's part that is identified by part_name.
PAPI_<id>:part_directive:<part_name>:<directive_name>
[URL/FORM]
The list of optional supported directives (at the part level):
• filename: the document filename
• mimeHint: the MIME type hint
• mime: the forced MIME type (use very carefully)
• encoding: the document encoding
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
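The parameter naming scheme above can be sketched as follows. This is an illustrative Python helper, not part of the Push API client: the function name and the dictionary layout of a document are hypothetical, and using the position of the document in the list as <id> is an assumption (the table only requires a different id per document).

```python
def add_documents_fields(documents):
    """Flatten documents into (name, value) form fields named PAPI_<id>:..."""
    fields = []
    for doc_id, doc in enumerate(documents):
        prefix = f"PAPI_{doc_id}:"  # assumption: the list position serves as <id>
        fields.append((prefix + "uri", doc["uri"]))
        if doc.get("stamp") is not None:
            fields.append((prefix + "stamp", doc["stamp"]))
        # One parameter per value for multivalued metas.
        for meta_name, values in doc.get("metas", {}).items():
            for value in values:
                fields.append((f"{prefix}meta:{meta_name}", value))
        for part_name, part in doc.get("parts", {}).items():
            fields.append((f"{prefix}part_bytes:{part_name}", part["bytes"]))
            for directive, value in part.get("directives", {}).items():
                fields.append(
                    (f"{prefix}part_directive:{part_name}:{directive}", value)
                )
    return fields
```

The resulting pairs could then be encoded as multipart/form-data by the HTTP library of your choice and sent with a POST request.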
There are two update methods in the PushAPI: updateDocument(Document doc, String[] fields) and updateDocumentList(Document[] docList, String[][] fieldsList).
The first one updates a single document; the second one updates several documents at once. The fields and fieldsList parameters are not handled yet and currently have no effect.
To update a document, call one of these methods with a new document which:
• has the same URI as the one you want to update,
• and contains the updated parts/metas.
The parts/metas that do not need to be updated are fetched from the document cache, so there is no need to include them in the document used for the update.
Constraints
• For the update feature to work, you must either enable the Build Group document cache or target another Consolidation Server. For more information, see "Using Document Cache" in the Exalead CloudView Connectors Guide.
• Only documents that have been added after the document cache has been enabled will be updatable.
Notes
• The old values of multivalued metas are dropped. If you want to update a multivalued meta by adding values, you must also include the old values you want to keep in the document used for the update.
• Remember that both parts and metas map to index fields. The index fields that will be updated depend on the part/meta field mappings, not on the part/meta names. For example, if you want to update the “text” field, you probably want to put an updated “master” part in the document used for the update, and not a “text” meta.
• The document in the document cache is updated too, so subsequent updates of a document do not need to be cumulative.
• It is a good idea to perform batches of updates instead of single updates.
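The note on multivalued metas can be illustrated with a small Python sketch. The helper name is hypothetical; it only computes the list of values to put in the update document, assuming the connector already knows the values currently stored.

```python
def metas_for_update(old_values, values_to_add, values_to_drop=()):
    """Return the full value list to send for a multivalued meta.

    Old values are dropped by an update, so every value that must
    survive has to be sent again alongside the new ones.
    """
    kept = [v for v in old_values if v not in values_to_drop]
    return kept + [v for v in values_to_add if v not in kept]
```

For example, adding "c" to a meta holding "a" and "b" requires sending all three values, not only "c".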
Document data types
When you implement the updateDocument method you must send one or more documents to be updated to the index. The Document object should contain:
Types
Description
uri
A URI, which is an opaque string that uniquely identifies the document from the connector point of view.
An optional Stamp, which is an opaque string that the connector may use to track document changes. Document stamps may be retrieved through the getDocumentStatus method.
The MetaContainer of the document. Metadata are open name-value pairs.
For a complete list of metadata understood by the API, see Metadata Examples.
PartContainer
The PartContainer of the document. The Connector sends raw bytes containing the document content. Exalead CloudView conversion services will translate and extract the textual content of the document before indexing.
The Part contains a DirectiveContainer.
DirectiveContainer
The DirectiveContainer of the document (different from the directive associated to a Part).
Implement the part object
The Part object must provide accessors for the following predefined directives:
• encoding
• filename
• mimeHint
• certifiedMime
To set a custom directive, the Part object must also provide a method, for example:
public void setCustomDirective(string name, string value)
public void setCustomDirective(Directive directive)
public void setCustomDirective(string name, string[] values)
public void addCustomDirective(string name, string value)
Implement the document object
The Document object must provide accessors for these predefined directives:
• forcedSlice
• sameSlice
And a method to set a custom directive:
public void setCustomDirective(string name, string value)
public void setCustomDirective(Directive directive)
public void setCustomDirective(string name, string[] values)
public void addCustomDirective(string name, string value)
HTTP parameters
The update_documents parameters are described in the table below.
Note: These parameters must be repeated (with a different id) for every document you want to send.
For better performance, we recommend using multipart/form-data instead of application/x-www-form-urlencoded.
Parameter
Location
Description
PAPI_<id>:uri
[URL/FORM]
The uri parameter is the string of the document URI.
PAPI_<id>:stamp
[URL/FORM]
The optional stamp parameter is the string representing the document's Stamp.
PAPI_<id>:meta:<meta_name>
[URL/FORM]
The meta_* parameter is a string containing the value of the metadata referenced by meta_name.
Multiple values may exist for the same parameter. You must generate as many parameters as there are values.
PAPI_<id>:directive:<directive_name>
[URL/FORM]
The list of optional supported directives (at the document level):
• forcedSlice: advanced feature
PAPI_<id>:directive:fields
[URL/FORM]
Not supported for the moment.
PAPI_<id>:part_bytes:<part_name>
[URL/FORM]
The part_bytes parameter is the content of the document's part that is identified by part_name.
PAPI_<id>:part_directive:<part_name>:<directive_name>
[URL/FORM]
The list of optional supported directives (at the part level):
• filename: the document filename
• mimeHint: the MIME type hint
• mime: the forced MIME type (use very carefully)
• encoding: the document encoding
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
Deletes a set of documents (collection) specified by a rootPath. To delete only the documents at the first level of the rootPath (not recursively), use the recursive flag.
Data types
The object contains:
Types/flag
Description
rootPath
A part of the URI used to select a subset of the corpus. If the rootPath value is an empty string (""), the whole collection is deleted.
The rootPath is matched as a prefix: the beginning of the document URI must match it.
No exception or error message is returned if the rootPath refers to an empty (nonexistent) subset of the corpus.
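The rootPath selection rule can be sketched in Python as a simple prefix test. The function name is hypothetical, and the recursive flag is deliberately not modeled because this section does not define how the first level of a rootPath is delimited.

```python
def select_by_root_path(uris, root_path):
    """Return the URIs whose beginning matches root_path (prefix match)."""
    return [uri for uri in uris if uri.startswith(root_path)]
```

An empty rootPath matches every URI, and a rootPath that matches nothing simply yields an empty selection rather than an error.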
DocumentStatus getDocumentStatus(String uri) and DocumentStatus[] getDocumentStatusList(String[] uriList)
This method retrieves the status of a document within the indexed corpus specified by the URI parameters.
This status may be used by the connector to compare with the document status in the source, and then determine whether the document needs to be updated. The structure is serialized and returned in the response body.
The getDocumentStatusList method retrieves the status of a list of documents within the pushed corpus.
Data types
The DocumentStatus object contains:
Types
Description
uri
A URI is an opaque string that uniquely identifies the document from the connector point of view.
A boolean that indicates the indexing status of the document:
• true indicates that a document with the given uri has already been sent to the Indexing System. However, this does not guarantee that the document has been indexed nor that the document can be seen by the user.
• false indicates that the given uri is unknown to the Indexing System.
class DocumentStatus {
    String getUri();
    String getStamp();
    boolean isExist();
}
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/get_documents_status
HTTP parameters
The HTTP parameters are described in the table below.
Parameter
Location
Description
PAPI_uri
[URL]
The uri parameter is the string of the document URI.
To get the status of several documents, send multiple PAPI_uri parameters.
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
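As an illustration of repeating the PAPI_uri parameter, the following Python sketch builds a get_documents_status URL for several documents. The function name is hypothetical, and the host, port and connector name are placeholders.

```python
from urllib.parse import urlencode


def documents_status_url(host, port, connector_name, uris, session_id=None):
    """Build a get_documents_status URL with one PAPI_uri per document."""
    params = [("PAPI_uri", uri) for uri in uris]
    if session_id is not None:
        params.append(("PAPI_session", str(session_id)))
    return (
        f"http://{host}:{port}/papi/4/connectors/{connector_name}"
        f"/get_documents_status?{urlencode(params)}"
    )
```

The URL is meant to be requested with GET; parsing the serialized DocumentStatus structure in the response body is left out because its format is not described here.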
The setCheckpoint method sets checkpoints in the indexing system. If the optional name is specified, then the related checkpoint is changed.
Checkpoints are used when:
• The connector must process a journalized or logged data source, which can be abstractly represented as a flow of "add" and "delete" events in the corpus, and where an id can be used to refer to events on a timeline. The connector will then call the setCheckpoint command from time to time, with the id referring to the last add or delete events which have been sent to the Indexing System.
• Crash-proof synchronization is required. Upon crash, or system restart, the connector will call the getCheckpoint method to retrieve the last checkpoint saved by the Indexing System. The Indexing System guarantees that any add or delete commands called before that checkpoint were saved and will never be lost.
• You need to keep track of the synchronization.
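The journal-based use case above can be sketched as follows. This is a simplified Python model, not the Push API client: InMemoryCheckpointStore is a hypothetical stand-in for setCheckpoint and getCheckpoint, returning None when no checkpoint exists is an assumption, and the push callable represents whatever add or delete call the connector makes.

```python
class InMemoryCheckpointStore:
    """Hypothetical stand-in for the setCheckpoint/getCheckpoint calls."""

    def __init__(self):
        self._checkpoints = {}

    def set_checkpoint(self, value, name=""):
        self._checkpoints[name] = value

    def get_checkpoint(self, name=""):
        # Assumption: None means no checkpoint has been saved yet.
        return self._checkpoints.get(name)


def replay_from_checkpoint(events, store, push, checkpoint_every=2):
    """Push the events that follow the last saved checkpoint.

    events is a list of (event_id, action, uri) tuples in timeline order.
    Returns the number of events pushed.
    """
    last = store.get_checkpoint()
    start = 0
    if last is not None:
        start = [event_id for event_id, _, _ in events].index(last) + 1
    sent = 0
    for event_id, action, uri in events[start:]:
        push(action, uri)
        sent += 1
        if sent % checkpoint_every == 0:
            store.set_checkpoint(event_id)  # checkpoint from time to time
    if sent:
        store.set_checkpoint(events[-1][0])  # final checkpoint for this run
    return sent
```

After a crash or restart, calling replay_from_checkpoint again resumes after the last saved event instead of replaying the whole journal.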
The optional parameter name can be used if many checkpoints are needed for a given source. Default value is "".
The sync flag can be used to force the sync of the pending operations before returning control. Once synced, the document is pushed and securely handled by Exalead CloudView.
The setCheckpoint method returns the serial of the last pending operation before the checkpoint. It could be used to check when this document is indexed and searchable.
Note: A getCheckpoint() call made immediately after a setCheckpoint() call with the sync parameter set to false may not return the last value. getCheckpoint() always returns the last synced checkpoint.
HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/set_checkpoint
HTTP parameters
The parameters are described in the table below.
Parameter
Location
Description
PAPI_checkpoint
[URL]
The checkpoint parameter is the string of the checkpoint value.
PAPI_name
[URL]
This optional parameter can be used when you need to manage many checkpoints for a connector.
PAPI_sync
[URL]
The sync parameter is the string representation of the sync value. If true, it triggers a sync operation.
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
If successful (status = OK), then the body contains the serialized form of the serial, which is the string value of the serial.
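A set_checkpoint request could be assembled as in the Python sketch below. The function name is hypothetical; encoding the sync value as "true" or "false" and omitting PAPI_name for the default checkpoint are assumptions, since the table only states that a string representation is expected.

```python
from urllib.parse import urlencode


def set_checkpoint_url(host, port, connector_name, checkpoint,
                       name="", sync=False, session_id=None):
    """Build the URL of a set_checkpoint request (to be sent with POST)."""
    params = [("PAPI_checkpoint", checkpoint)]
    if name:
        params.append(("PAPI_name", name))
    # Assumption: the boolean is serialized as lowercase "true"/"false".
    params.append(("PAPI_sync", "true" if sync else "false"))
    if session_id is not None:
        params.append(("PAPI_session", str(session_id)))
    return (
        f"http://{host}:{port}/papi/4/connectors/{connector_name}"
        f"/set_checkpoint?{urlencode(params)}"
    )
```

On success, the response body carries the serial as a string, which can be kept for a later searchability check.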
String getCheckpoint([String name])
The getCheckpoint method retrieves checkpoints in the indexing process.
The optional parameter name can be used if many checkpoints are needed for a given source. The default value is "".
A getCheckpoint() call made immediately after a setCheckpoint() call with the sync parameter set to false may not return the last value. getCheckpoint() always returns the last synced checkpoint.
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/get_checkpoint
HTTP parameters
The parameters are described in the table below.
Parameter
Location
Description
PAPI_name
[URL]
This optional parameter can be used when you need to manage many checkpoints for a connector.
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
This getCheckpoint method retrieves checkpoints in the indexing process.
The name parameter corresponds to the checkpoint name. The default value is "".
If the showSynchronizedOnly parameter is set to false, you will see all checkpoints, even those that are not yet synchronized to disk. If set to true, you will see only synchronized checkpoints.
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/get_checkpoint_info
HTTP parameters
The parameters are described in the table below.
Parameter
Location
Description
PAPI_name
[URL]
This parameter is used when you need to manage many checkpoints for a connector.
PAPI_showSynchronizedOnly
[URL]
This parameter is used to specify if you want to retrieve synchronized checkpoints only.
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
Opens an Iterator over the list of defined checkpoints. A boolean parameter selects either synchronized checkpoints only (true) or all checkpoints (false). Iterated results are streamed and used when needed.
The default checkpoint has the name "" (empty string).
Data types
A CheckpointsInfoIterator is an abstract object used to retrieve CheckpointsInfo.
HTTP parameter
The parameter is described in the table below.
Parameter
Location
Description
PAPI_showSynchronizedOnly
[URL]
This parameter is used to specify if you want to retrieve synchronized checkpoints only.
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP method
The method used is:
GET no-cache http://<host>:<port>/papi/4/connectors/<connectorName>/enumerate_stated_checkpoints_info
• The next method returns the next CheckpointInfo of the iteration, or null if the end of the iteration has been reached.
• The nextBatch method returns up to the maximum number of CheckpointInfo objects allowed for the iteration, or fewer if the end of the iteration has been reached.
• The close method is used to close the iteration. The close method must be called to release resources dedicated to the iteration within the Indexing System and inside the Helper.
• The next method returns the next document of the iteration, or null if the end of the iteration has been reached.
• The nextBatch method returns up to the maximum number of documents allowed for the iteration, or fewer if the end of the iteration has been reached.
• The close method is used to close the iteration. The close method must be called to release resources dedicated to the iteration within the Indexing System and inside the Helper.
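The next, nextBatch and close contract can be sketched with an in-memory Python stand-in. ListBackedIterator and drain are hypothetical names, and returning an empty batch at the end of the iteration is an assumption. The point of the sketch is the try/finally pattern that guarantees close is called.

```python
class ListBackedIterator:
    """Hypothetical in-memory iterator exposing next, next_batch and close."""

    def __init__(self, items, batch_size=2):
        self._items = list(items)
        self._position = 0
        self._batch_size = batch_size
        self.closed = False

    def next(self):
        """Return the next item, or None at the end of the iteration."""
        if self._position >= len(self._items):
            return None
        item = self._items[self._position]
        self._position += 1
        return item

    def next_batch(self):
        """Return up to batch_size items, fewer near the end."""
        batch = self._items[self._position:self._position + self._batch_size]
        self._position += len(batch)
        return batch

    def close(self):
        self.closed = True


def drain(iterator):
    """Read all remaining items by batch and always close the iterator."""
    results = []
    try:
        while True:
            batch = iterator.next_batch()
            if not batch:
                break
            results.extend(batch)
    finally:
        iterator.close()  # releases resources even if reading fails
    return results
```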
Data types
The SyncedEntry object contains:
Member
Description
uri
A URI is an opaque string that uniquely identifies the document from the connector point of view.
Opens an iterator on a document and/or folder collection matching the rootPath given as parameter. It enumerates entries that have been pushed and are in synced status. It returns a stream of entries. An entry is made of a URI and a stamp.
The underlying idea of this method is to:
• Enumerate entries in the index.
• Decode the URI to find items in the data source.
• Test whether the items still exist:
◦ if all items have been removed from the data source, delete the document,
◦ otherwise, decode the stamp and check whether the items have been modified in the data source.
Iterated results are streamed and used when needed.
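The reconciliation idea described above can be sketched in Python as follows. The function name is hypothetical, entries are modeled as (uri, stamp) pairs, and the data source is reduced to a dictionary mapping each URI to its current stamp, which assumes the stamp can be compared directly.

```python
def reconcile(entries, source_stamps):
    """Compare synced entries with the data source.

    entries is an iterable of (uri, stamp) pairs from the index;
    source_stamps maps each URI still present in the source to its stamp.
    Returns (uris_to_delete, uris_to_update).
    """
    to_delete, to_update = [], []
    for uri, stamp in entries:
        if uri not in source_stamps:
            to_delete.append(uri)      # item removed from the data source
        elif source_stamps[uri] != stamp:
            to_update.append(uri)      # item modified since it was pushed
    return to_delete, to_update
```

Because entries are consumed one at a time, the function also works with a streamed iterator and does not need the whole list in memory.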
Data types
A SyncedEntriesIterator is an abstract object which can be used to retrieve document statuses.
The object contains:
Types/flag
Description
rootPath
A part of the URI used to select a subset of the corpus.
No Exception or error message is returned if the rootPath refers to an empty subset of the corpus. If status is OK, the body contains the string representation of the integer value.
Use of iterators with concurrent add and delete operations
• Add/Delete operations do not impact iterators that are already opened.
• Added/Deleted documents may not appear immediately in the iterated entries because of asynchronous processing.
void sync()
The sync method flushes to disk all operations performed since the last sync operation, to guarantee crash-proofness. It is a synchronous call that may take some time before returning control.
HTTP parameter
The parameter is described in the table below.
Parameter
Location
Description
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/sync
The triggerIndexingJob method can be used to trigger the indexing job.
Important: In V6R2014 and higher versions, the triggerIndexingJob() method may commit an indexing job if a document analysis has been started. Unlike the sync() method, this method does not block the PAPI.
HTTP parameter
The parameter is described in the table below.
Parameter
Location
Description
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/trigger_indexing_job
The areDocumentsSearchable method determines whether the documents can be found at search time. Use it with the sync method which provides the expected serial.
Note: The setCheckpoint method with the sync parameter set to true also provides the expected serial.
HTTP method
The method used is:
POST http://<host>:<port>/papi/4/connectors/<connectorName>/are_documents_searchable
HTTP parameter
The parameter is described in the table below.
Parameter
Location
Description
PAPI_serial
[URL]
The serial parameter is the string representation of the serial.
PAPI_session
[URL]
Optional. The session ID returned by a previous call to get_current_session_id.
Action: if there is a session mismatch, the Push API server refuses the command and returns an exception.
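A connector that needs to wait until pushed documents are searchable could poll with the serial, as in this Python sketch. The function name is hypothetical, and are_documents_searchable is an injected callable standing in for the HTTP call, because the response format is not described in this section.

```python
import time


def wait_until_searchable(are_documents_searchable, serial,
                          timeout_s=30.0, interval_s=1.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until the serial is reported searchable or the timeout expires.

    are_documents_searchable is a callable taking the serial and
    returning True or False (a stand-in for the real request).
    """
    deadline = clock() + timeout_s
    while True:
        if are_documents_searchable(serial):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The serial passed in would come from a previous sync call, or from setCheckpoint with the sync parameter set to true.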
notes:cn=Doe/cn=Exalead/cn=com or ~notes:cn=Doe/cn=Exalead/cn=com
file_name
String
Name of the file.
file_size
ulong
The size in bytes of the data associated to the document.
42
title
String
The title associated to the document.
To create categories in Exalead CloudView, the Indexing System considers both the original metadata and the metadata extracted from the document content. The priority rules for metadata may be configured in the Indexing System administration interface. For example:
• The Indexing System uses both the mimeHint and filename of the document master Part, and the content type detected by an analysis of the source to generate a Top/Attributes/Kind category.
• The Indexing System uses both the language meta and the detected language from the document text to generate a Top/Attributes/Language category.