Customizing Document Analysis
 
What Can a Document Processor Do?
Write Custom Document Processors Inline
Add a Custom Document Processor to Your Analysis Pipeline
Exalead CloudView ships with a large set of document processors that can alter documents in analysis pipelines. Most analysis tasks can be performed by assembling these processors. For advanced or custom operations, however, writing a custom document processor is often necessary or simply more convenient.
A custom document processor is a Java class extending the com.exalead.pdoc.analysis.CustomDocumentProcessor class. It manipulates the document as a com.exalead.pdoc.ProcessableDocument object.
Note: For functional details on document processors, see the Exalead CloudView Configuration Guide.
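The overall shape of such a class can be sketched as follows. To stay runnable without the CloudView SDK, this sketch uses local stand-in types (StubDocument, StubContext, StubProcessor) that are hypothetical and only imitate the roles of ProcessableDocument, DocumentProcessingContext, and CustomDocumentProcessor. The method names addMeta and getMetas and the signature of process() are assumptions for illustration, so check the SDK samples and Javadoc for the real ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for com.exalead.pdoc.ProcessableDocument.
// It only models named, multi-valued metas; the real class is richer.
class StubDocument {
    private final Map<String, List<String>> metas = new LinkedHashMap<>();

    void addMeta(String name, String value) {
        metas.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<String> getMetas(String name) {
        return metas.getOrDefault(name, Collections.<String>emptyList());
    }
}

// Hypothetical stand-in for DocumentProcessingContext (unused in this sketch).
class StubContext { }

// Hypothetical stand-in for com.exalead.pdoc.analysis.CustomDocumentProcessor.
abstract class StubProcessor {
    abstract void process(StubContext context, StubDocument document);
}

// Example processor: derives one "title_length" meta per "title" meta.
class TitleLengthProcessor extends StubProcessor {
    @Override
    void process(StubContext context, StubDocument document) {
        for (String title : document.getMetas("title")) {
            document.addMeta("title_length", String.valueOf(title.length()));
        }
    }
}
```

With the real SDK, the stand-ins would be replaced by the CloudView classes and only the processor subclass would remain.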
What Can a Document Processor Do?
A document processor can:
Modify, create, or remove document metas.
Modify, create, or remove document parts.
Discard a document: ignore it, or delete it from the index.
A document processor cannot:
Modify the URI or stamp of a document.
Create new documents.
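The capabilities and limits listed above can be illustrated with a small self-contained model. DocModel and CleanupStep are hypothetical names, not CloudView classes, and the meta and part names ("title", "status", "raw_dump", "preview") are invented for the example. The URI is deliberately read-only to mirror the rule that a processor cannot change it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified document model (not the CloudView API).
class DocModel {
    private final String uri;                       // read-only: no setter
    final Map<String, String> metas = new HashMap<>();
    final Map<String, byte[]> parts = new HashMap<>();
    private boolean discarded;

    DocModel(String uri) { this.uri = uri; }
    String getUri() { return uri; }
    void discard() { discarded = true; }
    boolean isDiscarded() { return discarded; }
}

class CleanupStep {
    static void apply(DocModel doc) {
        // Discard: drafts are not processed any further.
        if ("draft".equals(doc.metas.get("status"))) {
            doc.discard();
            return;
        }
        // Modify a meta: trim surrounding whitespace from the title.
        String title = doc.metas.get("title");
        if (title != null) {
            doc.metas.put("title", title.trim());
        }
        // Create a meta: record whether a "preview" part exists.
        doc.metas.put("has_preview", String.valueOf(doc.parts.containsKey("preview")));
        // Remove a part: drop a part that should not be kept.
        doc.parts.remove("raw_dump");
    }
}
```

Everything the step does happens on the one document it receives: it never creates a second DocModel and never touches the URI.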
Samples
Several samples of document processors are available in the Exalead CloudView kit, in <INSTALLDIR>/sdk/java-customcode/samples/document-processors.
You can build the samples using Apache Ant. This creates a plugin zip file that you can install in Exalead CloudView.
Debugging
The process() method of a CustomDocumentProcessor receives a DocumentProcessingContext argument. Use the methods of this DocumentProcessingContext to report any error or warning about the document, so that the full error context is captured for debugging.
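The idea of reporting through the context rather than throwing or logging ad hoc might look like the sketch below. ReportingContext and its reportWarning method are hypothetical stand-ins, and the actual reporting methods of DocumentProcessingContext are not named in this page, so treat the method name and parameters as assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DocumentProcessingContext: it simply
// collects warnings together with the URI of the affected document.
class ReportingContext {
    final List<String> warnings = new ArrayList<>();

    void reportWarning(String docUri, String message) {
        warnings.add(docUri + ": " + message);
    }
}

class PriceParser {
    // Returns the parsed price, or null after reporting a warning.
    static Double parsePrice(ReportingContext context, String docUri, String raw) {
        if (raw == null) {
            context.reportWarning(docUri, "missing price meta");
            return null;
        }
        try {
            return Double.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            // Keep the offending value in the message to ease debugging.
            context.reportWarning(docUri, "unparseable price '" + raw + "'");
            return null;
        }
    }
}
```

Attaching the document URI and the offending value to each report keeps enough context to trace a problem back to a specific document.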
Write Custom Document Processors Inline
You can write document processors directly in the Administration Console, using the integrated code editor.
1. Open the Administration Console at http://<HOSTNAME>:<BASEPORT+1>/admin.
2. Go to Index > Data Processing > Pipeline name > Document Processors.
3. Expand Custom and drag a Java Document Processor to the Current processors panel.
4. Select Inline Java, then click Edit java and write your code in the editor.
5. Click Check source code to verify that the code compiles correctly.
6. Click Accept and then Apply.
Your custom document processor is now active.
Add a Custom Document Processor to Your Analysis Pipeline
Once you have developed your custom document processor, you can add it to your document analysis pipeline in the Administration Console.
First, package and upload the plugin containing your document processor. Then:
1. Open the Administration Console at http://<HOSTNAME>:<BASEPORT+1>/admin.
2. Go to Index > Data Processing > Pipeline name > Document Processors.
3. Expand Custom and drag a Custom Document Processor to the Current processors panel.
4. Fill in the Class id (available document processors are suggested automatically).
5. If the processor takes additional configuration, fill in its configuration keys.
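As an illustration of how a processor might consume configuration keys, the sketch below reads them from a plain key/value map with defaults. This is an assumption made for the example: how CloudView actually binds configuration keys to a processor is not described here, and TruncateConfig, TruncateStep, and the keys "targetMeta" and "maxLength" are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration holder built from key/value pairs.
class TruncateConfig {
    final String targetMeta;
    final int maxLength;

    TruncateConfig(Map<String, String> keys) {
        this.targetMeta = keys.getOrDefault("targetMeta", "title");
        this.maxLength = Integer.parseInt(keys.getOrDefault("maxLength", "80"));
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
    }
}

// Hypothetical processing step driven by the configuration above.
class TruncateStep {
    private final TruncateConfig config;

    TruncateStep(TruncateConfig config) { this.config = config; }

    // Truncates the configured meta in place when it is too long.
    void apply(Map<String, String> metas) {
        String value = metas.get(config.targetMeta);
        if (value != null && value.length() > config.maxLength) {
            metas.put(config.targetMeta, value.substring(0, config.maxLength));
        }
    }
}
```

Validating the keys once in the constructor means a bad value fails when the processor is created rather than on every document.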