This connector lets you run several processors on XML documents. Processors in the pipeline can be chained, that is, the result of one processor is fed to another. Processors fall into two categories:
• Regular processors that handle one document and extract data from it.
• Split processors that extract chunks of XML data from one (possibly large) document and delegate chunk processing to other processors.
This section details the processors bundled with the connector. These are all regular processors unless stated otherwise.
1. Expand PAPI document processors, and click Add item.
2. In Processor's id, specify the identifier of the processor that must process documents.
3. Click Apply.
Configuring the XML Element Processor
This processor selects nodes or attributes by name and pushes their values as metas. The connector selects nodes and attributes according to an include list and an exclude list.
The following rules apply:
• Children of included nodes are pushed as separate metas.
• Children of excluded nodes, and their attributes, are completely ignored.
1. Expand XML element processor, and click Add item.
2. Select Concatenate text element to prevent the text from being split when the XML code contains predefined character entities (such as &amp;, &gt;, and &quot;).
For example, if you index the tag <THEME_FR>Global Procurement &amp; Supply chain</THEME_FR> without the Concatenate text element option, you get three metadata values instead of the single value THEME_FR = Global Procurement & Supply chain:
◦ THEME_FR = Global Procurement
◦ THEME_FR = &
◦ THEME_FR = Supply chain
3. In Processor's id, specify the identifier of the processor that must process documents. This id must be identical to the Entry point id.
4. In Include elements, specify element mappings:
a. Element: Name of the XML element to extract. This can also represent an attribute when specifying a string in the form node@attribute. For example: book@title.
b. Meta: Name of the meta associated with the node content. If you do not specify a meta, the connector infers the meta name from the name of the element or attribute. For example, if Element is set to book, the node text is pushed to the book meta; if Element is set to book@title, the attribute value is pushed to the title meta.
5. Optionally, in Exclude elements, you can specify a list of node names that must be skipped by the processor.
6. Click Apply.
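The include and exclude rules above can be approximated with a short Python sketch. It is not the connector's implementation: the function names (parse_mapping, extract_metas) and the sample XML are hypothetical. It covers only name matching, the node@attribute form, meta-name inference, and exclusion of a node together with its children.

```python
import xml.etree.ElementTree as ET

def parse_mapping(element_spec, meta=None):
    """Split 'node@attribute' and infer the meta name when none is given.

    Hypothetical helper: 'book' -> meta 'book', 'book@title' -> meta 'title'.
    """
    node, _, attribute = element_spec.partition("@")
    inferred = attribute if attribute else node
    return node, attribute or None, meta or inferred

def extract_metas(xml_text, include, exclude=()):
    """include: list of (element_spec, meta_or_None); exclude: node names to skip."""
    mappings = [parse_mapping(spec, meta) for spec, meta in include]
    metas = []

    def visit(node):
        if node.tag in exclude:
            return  # the excluded node, its attributes, and its children are ignored
        for name, attribute, meta in mappings:
            if node.tag != name:
                continue
            if attribute is None:
                text = (node.text or "").strip()
                if text:
                    metas.append((meta, text))
            elif attribute in node.attrib:
                metas.append((meta, node.attrib[attribute]))
        for child in node:
            visit(child)

    visit(ET.fromstring(xml_text))
    return metas
```

For instance, calling extract_metas with the include list [("book@title", None), ("author", "writer")] and the exclude list ("draft",) pushes the title attribute of each <book> to the title meta and the text of each <author> to the writer meta, while skipping everything under <draft>. Note that ElementTree always returns the text around an entity such as &amp; as one string, which corresponds to the behavior with Concatenate text element selected.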
Configuring the XPath Processor
This processor associates results of XPath selection to metas.
1. Expand XPath processors, and click Add item.
2. In Processor's id, specify the identifier of the processor that must process documents. This id must be identical to the Entry point id.
3. In XPath, specify XPath mappings:
a. XPath: XPath expression to evaluate.
b. Meta: Name of the meta associated with the XPath results.
4. Click Apply.
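As an illustration of how XPath mappings associate results with metas, here is a minimal Python sketch. The function name apply_xpath_mappings is hypothetical. ElementTree supports only a limited XPath subset and requires paths relative to the root element, so the connector's own XPath support may well differ.

```python
import xml.etree.ElementTree as ET

def apply_xpath_mappings(xml_text, mappings):
    """mappings: list of (xpath, meta) pairs; returns a list of (meta, value) pairs.

    Hypothetical sketch: paths are relative to the document's root element
    because ElementTree rejects absolute paths on an element.
    """
    root = ET.fromstring(xml_text)
    metas = []
    for xpath, meta in mappings:
        for node in root.findall(xpath):
            metas.append((meta, (node.text or "").strip()))
    return metas
```

With a <book> root element, the mapping ("./title", "title") plays the role that /book/title would play in a full XPath engine.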
Configuring the XSLT Processor
This processor performs XSL transformations on XML documents and passes the result to the processors that follow it.
1. Expand XSLT processors, and click Add item.
2. In Stylesheet, specify the file path of the stylesheet file.
3. In Processor's id, specify the identifier of the processor that must process documents. This id must be identical to the Entry point id.
4. In Next processor's id, specify the list of processor identifiers to run with the transformed XML.
5. Click Apply.
Configuring the Tee Processor
This processor relays processing events to a number of following processors. No data extraction is performed on the XML document.
1. Expand Tee processors, and click Add item.
2. In Processor's id, specify the identifier of the processor that must process documents. This id must be identical to the Entry point id.
3. In Following processor ids, specify the list of the next processor identifiers to run.
4. Click Apply.
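The relay behavior can be pictured with the following Python sketch, in which processors are looked up by identifier in a registry. Everything here is an assumption made for illustration: the registry, the identifiers (entry, titles, authors), and the helper names are hypothetical and do not reflect the connector's internals.

```python
import xml.etree.ElementTree as ET

def make_text_extractor(tag, meta):
    """Hypothetical regular processor: pushes the text of every <tag> node as a meta."""
    def processor(document, metas):
        for node in document.iter(tag):
            metas.append((meta, node.text))
    return processor

def make_tee(following_ids, registry):
    """Tee: extracts nothing itself, only relays the document to each following id."""
    def processor(document, metas):
        for processor_id in following_ids:
            registry[processor_id](document, metas)
    return processor

def build_registry():
    # Hypothetical identifiers; "entry" stands in for the Entry point id.
    registry = {}
    registry["titles"] = make_text_extractor("title", "title")
    registry["authors"] = make_text_extractor("author", "author")
    registry["entry"] = make_tee(["titles", "authors"], registry)
    return registry

def run(xml_text, entry_point_id="entry"):
    metas = []
    build_registry()[entry_point_id](ET.fromstring(xml_text), metas)
    return metas
```

The tee contributes no metas of its own. Every meta in the result comes from one of the processors listed in its following ids, in the order in which they are listed.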
Configuring the XML Attach Processor
This processor dumps the current XML document and adds it as a part or a meta. You can insert this processor anywhere in the pipeline. It relays processing events to the following processor.
1. Expand XML attach processors, and click Add item.
2. In Next processor's id, specify the identifier of the next processor in the pipeline.
3. In Part name, specify the name of the part that will include the current document.
4. In Meta name, specify the name of the meta that will include the current document.
5. In Processor's id, specify the identifier of the processor that must process documents. This id must be identical to the Entry point id.
6. Click Apply.
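A rough Python sketch of the idea follows: the current document is serialized, stored under a meta name, and then relayed to the next processor. The function attach_xml and its parameters are hypothetical, and the sketch covers only the meta case, not parts.

```python
import xml.etree.ElementTree as ET

def attach_xml(document, metas, meta_name, next_processor=None):
    """Store the serialized document as a meta, then relay it unchanged.

    Hypothetical sketch: 'metas' is a plain list of (name, value) pairs.
    """
    metas.append((meta_name, ET.tostring(document, encoding="unicode")))
    if next_processor is not None:
        next_processor(document, metas)
```

Because the document is relayed unchanged, a processor placed after the attach step sees exactly what it would have seen without it.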
Configuring the Child Split Processor
This processor splits XML documents into several chunks. Every chunk is then handled as a separate document by a child processor. Every first-level child of the XML document is extracted.
For example, to index all books from a file whose root element contains one <book/> node per book, you can use a Child split processor to extract every <book/> node, and an XPath processor to extract the title with the XPath expression /book/title.
1. Expand Child split processors, and click Add item.
2. In Next processor's id, specify the identifier of the next processor in the pipeline that must process chunks.
3. In Processor's id, specify the identifier of the processor that must process documents. This id must be identical to the Entry point id.
4. Click Apply.
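The splitting behavior can be sketched in Python as follows. The input file, the <library> root element, and the function names are all made up for this illustration. Each first-level child becomes a standalone XML string, which a child processor then parses as its own document.

```python
import xml.etree.ElementTree as ET

# Hypothetical input file, made up for this sketch.
LIBRARY = """
<library>
  <book><title>Dune</title></book>
  <book><title>Emma</title></book>
</library>
""".strip()

def child_split(xml_text):
    """Yield every first-level child of the root as a standalone XML string."""
    root = ET.fromstring(xml_text)
    for child in root:
        child.tail = None  # drop the whitespace that followed the child in the parent
        yield ET.tostring(child, encoding="unicode")

def extract_title(chunk):
    """Child processor: the chunk's root is <book>, so /book/title is './title' here."""
    book = ET.fromstring(chunk)
    return [node.text for node in book.findall("./title")]

def titles_per_chunk(xml_text):
    """Run the child processor once per chunk, keeping results separate per document."""
    return [extract_title(chunk) for chunk in child_split(xml_text)]
```

Each chunk is parsed independently, so the child processor never sees the enclosing <library> element or sibling books.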
Configuring the XPath Split Processor
This processor splits XML documents into several chunks according to a small subset of an XPath expression.
Your XPath expression must include only direct child selection. For example, /a/b/c selects the <c> node in the document <a><b><c></c></b></a>.
1. Expand XPath split processors, and click Add item.
2. In XPath, specify a simple XPath expression.
3. In Next processor's id, specify the identifier of the next processor that must process chunks.
4. In Processor's id, specify the identifier of the processor that must process documents. This id must be identical to the Entry point id.
5. Click Apply.
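To make the direct-child restriction concrete, here is a minimal Python sketch that walks a path such as /a/b/c one step at a time and returns the matching nodes as chunks. The function xpath_split is hypothetical, and as a simplification it accepts only alphanumeric element names (with underscores), which is stricter than real XML names.

```python
import xml.etree.ElementTree as ET

def xpath_split(xml_text, path):
    """Return the nodes selected by a direct-child path as standalone XML strings.

    Hypothetical sketch: only paths of the form /a/b/c are accepted;
    '//', predicates, wildcards, and attributes are rejected.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path such as /a/b/c")
    steps = path[1:].split("/")
    if any(not step or not step.replace("_", "").isalnum() for step in steps):
        raise ValueError("only direct child steps are supported")

    root = ET.fromstring(xml_text)
    if root.tag != steps[0]:
        return []
    current = [root]
    for step in steps[1:]:
        # Descend exactly one level per step: direct children only.
        current = [child for node in current for child in node if child.tag == step]

    chunks = []
    for node in current:
        node.tail = None  # drop the whitespace that followed the node in its parent
        chunks.append(ET.tostring(node, encoding="unicode"))
    return chunks
```

A <c> node that is a direct child of <a> rather than of <b> is not selected by /a/b/c, because each step descends exactly one level.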
Configuring the Custom Processor
This processor uses user-supplied Java classes to handle XML documents.